Multi-Root IO Virtualization and Sharing Specification Revision 0.7

The document is a specification for Multi-Root I/O Virtualization and Sharing (MR-IOV) by PCI-SIG, detailing its architecture, protocols, and initialization processes. It includes a comprehensive overview of MR-IOV components, transaction layers, and configuration requirements for devices and switches. The document is intended for PCI-SIG members and disclaims any warranties or liabilities regarding its use.


Multi-Root I/O Virtualization and Sharing, Rev. 0.7
mr-iov-07-2007-06-08

Multi-Root I/O Virtualization and Sharing
Revision 0.7

June 8, 2007

PCISIG Confidential

REVISION   REVISION HISTORY               DATE
0.5        Initial release                11/13/2006
0.7        Methods & Functions Complete   6/8/2007

PCI-SIG disclaims all warranties and liability for the use of this document and the information
contained herein and assumes no responsibility for any errors that may appear in this document, nor
does PCI-SIG make a commitment to update the information contained herein.
Contact the PCI-SIG office to obtain the latest revision of the specification.
Questions regarding this document or membership in PCI-SIG may be forwarded to:
Membership Services
http://www.pcisig.com
E-mail: [email protected]
Phone: 503-291-2569
Fax: 503-297-1090

Technical Support
Technical support for this specification is available to members. For information, please
visit: http://www.pcisig.com/developers/technical_support.

DISCLAIMER
This document is provided “as is” with no warranties whatsoever, including any warranty of
merchantability, non-infringement, fitness for any particular purpose, or any warranty otherwise
arising out of any proposal, specification, or sample. PCI-SIG disclaims all liability for infringement
of proprietary rights, relating to use of information in this specification. No license, express or
implied, by estoppel or otherwise, to any intellectual property rights is granted herein.

PCI Express, PCIe, PCI-X, and PCI-SIG are trademarks of PCI-SIG.


All other product names are trademarks, registered trademarks, or service marks of their respective
owners.

Copyright © 2007 PCI-SIG
All rights reserved.


Contents
1. ARCHITECTURAL OVERVIEW ................................................................................... 17
1.1. HOW DOES MR-IOV WORK?......................................................................................... 19
1.1.1. MRA Components ................................................................................................. 24
1.1.1.1. Multi-Root Aware Root Port (MRA RP) .......................................... 24
1.1.1.2. Multi-Root Aware PCIe Device (MRA PCIe Device)...................... 25
1.1.1.3. Multi-Root PCI Manager (MR-PCIM).............................................. 26
1.1.1.4. Multi-Root Aware PCIe Switch (MRA PCIe Switch) ...................... 27
1.1.2. MR Initialization Overview................................................................................... 29
1.1.3. MR Transaction Encapsulation Overview ............................................................ 29
1.1.4. MR Congestion Management Overview ............................................................... 29
1.1.5. MR Error and Event Handling Overview ............................................................. 29
1.1.6. MR-IOV and ARI (Alternative Routing Identifier)................................................ 29
1.1.7. MR-IOV Relationship to SR-IOV and ATS ........................................................... 29
1.2. OVERVIEW OF MR TRANSACTION LAYER ...................................................................... 30
2. MR PROTOCOL CHANGES ........................................................................................... 34
2.1. MR LINK PROTOCOL NEGOTIATION .............................................................................. 34
2.1.1. MR Link Protocol Negotiation.............................................................................. 38
2.1.2. MR Flow Control Initialization Protocol ............................................................. 39
2.1.2.1. MR Flow Control DLLP Encoding ................................................... 40
2.1.2.2. MR Flow Control Initialization State Machine Rules....................... 45
2.2. TLP PREFIX TAGGING.................................................................................................... 49
2.2.1. MR Switch Transaction Layer Processing............................................................ 50
2.2.2. MR Device Transaction Layer Processing ........................................................... 52
2.2.2.1. Receiving TLPs ................................................................................. 52
2.2.2.2. Transmitting TLPs............................................................................. 53
2.2.3. Global Key Processing ......................................................................................... 54
2.2.4. MR TLP Dataflow Examples ................................................................................ 55
2.3. PER-VH RESET ............................................................................................................ 56
2.3.1. Per-VH Reset Example ......................................................................................... 57
2.3.2. RESET DLLP Format ........................................................................................... 60
2.3.3. RESET DLLP Processing ..................................................................................... 60
2.3.3.1. Upstream State Machine ................................................................... 61
2.3.3.2. Downstream State Machine............................................................... 62
2.3.3.3. Reset DLLP Reliability ..................................................................... 64
2.3.3.4. Flow Control and Reset / DL_DOWN .............................................. 65
2.4. MR FLOW CONTROL ...................................................................................................... 65
2.4.1. FC Information Tracked by Transmitter .............................................................. 65
2.4.2. Information Tracked by Receiver.......................................................................... 67
2.5. MR MESSAGE PROCESSING ........................................................................................... 67
2.5.1. Interrupts............................................................................................................... 67
2.5.1.1. INTx Device Processing.................................................................... 67
2.5.1.2. INTx Switch Processing.................................................................... 68
2.5.1.3. INTx Root Port Processing................................................................ 68
2.5.2. PME Turn Off Processing..................................................................................... 68
2.5.3. PM_PME Processing............................................................................................ 68
2.6. MISCELLANEOUS CHANGES ........................................................................................... 68
2.7. MISCELLANEOUS NON-CHANGES .................................................................................. 69
3. INITIALIZATION AND RESOURCE ALLOCATION ................................................ 70
3.1. MR TOPOLOGY INITIALIZATION .................................................................................... 70
3.1.1. Initial State after Fundamental Reset ................................................................... 71
3.1.1.1. PCIM Capable Switch Ports.............................................................. 71
3.1.1.2. Non-PCIM Capable Switch Ports...................................................... 72
3.1.1.3. Non-PCIe Switch Management Ports................................................ 72
3.1.1.4. Initial State Example ......................................................................... 74
3.1.2. Initial MR-PCIM Location Policy ........................................................................ 75
3.1.2.1. Initial MR-PCIM Location Example................................................. 75
3.1.3. Topology Discovery .............................................................................................. 75
3.1.3.1. MR-PCIM Topology Discovery Example ........................................ 76
3.1.4. Component Discovery........................................................................................... 78
3.1.4.1. Component Discovery Example........................................................ 79
3.1.5. VH and VF Mapping Policy.................................................................................. 80
3.1.5.1. Example VH and VF Mapping Policy .............................................. 80
3.1.6. VH and VF Mapping Implementation................................................................... 81
3.1.6.1. Example Topology: Switch Implementation..................................... 81
3.1.6.2. Example Topology: Device Implementation .................................... 85
3.1.7. MR-PCIM Failover............................................................................................... 87
3.2. MR DEVICE INITIALIZATION .......................................................................................... 88
3.2.1. Enabling MR Operation........................................................................................ 89
3.2.2. Managing Flow Control ....................................................................................... 90
3.2.3. Managing VF Mapping......................................................................................... 91
3.2.4. Managing VF Migration ....................................................................................... 93
3.2.4.1. VF Migration Initial State ................................................................. 95
3.2.4.2. VF Migration Reinitialization ........................................................... 96
3.3. MR ROOT PORT INITIALIZATION ................................................................................... 97
3.3.1. Example MR Root Port Topology ......................................................................... 98
4. CONFIGURATION............................................................................................................ 99
4.1. CONFIGURATION FIELD SUMMARY .............................................................................. 100
4.2. DEVICE CONFIGURATION SPACE .................................................................................. 106
4.2.1. Device MR-IOV Extended Capability................................................................. 108
4.2.1.1. MR-IOV Extended Capability Header (00h) .................................. 110
4.2.1.2. MR-IOV Capabilities (04h)............................................................. 111
4.2.1.3. MR-IOV Control (08h) ................................................................... 112
4.2.1.4. MR-IOV Status (0Ch) ..................................................................... 114
4.2.1.5. MR-IOV VH Counts (10h).............................................................. 115
4.2.1.6. Function Table Offset (14h) ............................................................ 115
4.2.1.7. MVF and LVF Sizes (18h) .............................................................. 116
4.2.1.8. LVF Table Offset (1Ch) .................................................................. 117
4.2.1.9. VL Arbitration Capability and Status (20h) .................................... 118
4.2.1.10. VL Arbitration Control (24h) .......................................................... 120
4.2.1.11. VL Arbitration Table Offset (28h) .................................................. 121
4.2.1.12. MR Error Status (2Ch) .................................................................... 122
4.2.1.13. MR Error Control (2Eh) .................................................................. 122
4.2.1.14. MR Error Log (30h to 40h) ............................................................. 122
4.2.1.15. Statistics Capability and Control (44h to 50h) ................................ 122
4.2.2. Device VL Arbitration Table............................................................................... 122
4.2.3. LVF Table ........................................................................................................... 123
4.2.3.1. LVF Table Entry ............................................................................. 123
4.2.4. Function Table .................................................................................................... 124
4.2.4.1. Function Capability (00h and 04h).................................................. 126
4.2.4.2. Function Control (08h to 10h)......................................................... 127
4.2.4.3. Function Status (14h) ...................................................................... 131
4.2.4.4. Function VC to VL Map (18h and 1Ch) ......................................... 132
4.2.4.5. Function VC Resource (20h to 2Ch)............................................... 134
4.2.4.6. Function Table MFVC Resource Status (30h to 3Ch) .................... 134
4.2.4.7. Function Interrupt Bitmap (minus 20h)........................................... 135
4.2.5. Misc. Device Configuration Space Requirements .............................................. 135
4.2.5.1. BIST (Device) ................................................................................. 135
4.3. SWITCH CONFIGURATION SPACE ................................................................................. 136
4.3.1. Switch MR-IOV Extended Capability ................................................................. 138
4.3.1.1. Switch MR-IOV Extended Capability Header (00h) ...................... 140
4.3.1.2. Switch MR-IOV Capability (04h)................................................... 140
4.3.1.3. Switch MR-IOV Control (08h) ....................................................... 141
4.3.1.4. Switch MR-IOV Status (0Ch) ......................................................... 142
4.3.1.5. MR-IOV This Bridge Map (10h) .................................................... 143
4.3.1.6. Watchdog Timer Control (14h)....................................................... 143
4.3.1.7. Authorization (18h) ......................................................................... 144
4.3.1.8. Port Table Entry Size / Num Port Entries (1Ch) ............................. 145
4.3.1.9. Port Table Offset (20h).................................................................... 145
4.3.1.10. VS Table Entry Size / Num VS Table Entries (24h)....................... 146
4.3.1.11. VS Table Offset (28h) ..................................................................... 146
4.3.1.12. VS Bridge Table Entry Size / Num VS Bridge Table Entries per VS (2Ch) .... 147
4.3.1.13. VS Bridge Table Offset (30h) ......................................................... 147
4.3.1.14. Statistics Capability and Control (30h to 3Ch) ............................... 148
4.3.2. Switch VS Authorization Bitmap......................................................................... 148
4.3.3. Switch Port Table................................................................................................ 148
4.3.3.1. Port Capability (00h) ....................................................................... 150
4.3.3.2. Port Control (04h and 08h).............................................................. 151
4.3.3.3. Port Status (0Ch) ............................................................................. 154
4.3.3.4. Link Partner Training Status (10h).................................................. 155
4.3.3.5. VL Arbitration Capability and Status (14h) .................................... 157
4.3.3.6. VL Arbitration Control (18h) .......................................................... 159
4.3.3.7. VL Arbitration Table Offset (1Ch) ................................................. 160
4.3.3.8. MR Error Status (20h)..................................................................... 160
4.3.3.9. MR Error Control (22h) .................................................................. 160
4.3.3.10. MR Error Log (24h to 34h) ............................................................. 160
4.3.3.11. PCI Bridge Control (N+00h)........................................................... 161
4.3.3.12. PCIe Capability Structure (N+02h)................................................. 161
4.3.3.13. Port Interrupt Status Bitmap (minus 20h) ....................................... 164
4.3.4. Switch VL Arbitration Table ............................................................................... 164
4.3.5. Switch VS Table .................................................................................................. 165
4.3.5.1. VS Capability and Status (00h) ....................................................... 165
4.3.5.2. VS Control (04h) ............................................................................. 166
4.3.5.3. VS Bridge Interrupt Status (08h to ??h).......................................... 167
4.3.5.4. VS Interrupt Status Bitmap (minus 20h) ......................................... 168
4.3.6. Switch VS Bridge Table ...................................................................................... 168
4.3.6.1. VS Bridge Capability and Status (00h) ........................................... 171
4.3.6.2. VS Bridge Control (04h and 08h) ................................................... 173
4.3.6.3. VC ID to VL Map (0Ch and 10h) ................................................... 176
4.3.6.4. VC Resource Fields (14h to 20h) .................................................... 178
4.3.6.5. Hot-Plug Virtual Signals Interface (24h and 28h)........................... 179
4.3.7. Misc. Switch Configuration Space Requirements............................................... 182
4.3.7.1. ARI Support .................................................................................... 182
4.3.7.2. BIST (switch) .................................................................................. 182
4.4. VL ARBITRATION TABLE ............................................................................................. 183
4.5. PERFORMANCE MONITORING AND STATISTICS COLLECTION ....................................... 184
4.5.1. Configuration Space Fields ................................................................................ 185
4.5.1.1. Statistics Capability (+00h) ............................................................. 186
4.5.1.2. Statistics Block Start / Busy (+04h) ................................................ 187
4.5.1.3. Statistics Descriptor Table Offset (+08h)........................................ 188
4.5.1.4. Statistics Block Table Offset (+0Ch) .............................................. 189
4.5.2. Statistics Descriptor Table.................................................................................. 189
4.5.2.1. Standard Statistics ........................................................................... 191
4.5.2.2. Standard Filters................................................................................ 193
4.5.3. Statistics Block Table.......................................................................................... 195
4.5.3.1. Statistics Block Capability (00h)..................................................... 195
4.5.3.2. Statistics Table Offset (04h)............................................................ 196
4.5.3.3. Statistics Wait Time (08h)............................................................... 196
4.5.3.4. Statistics Count Time (0Ch) ............................................................ 197
4.5.4. Statistics Counter Table...................................................................................... 197
4.5.4.1. Statistics Capability and Control (00h) ........................................... 198
4.5.4.2. Statistics Filter Enable and Control (04h) ....................................... 199
4.5.4.3. Statistics Counter Low (08h)........................................................... 199
4.5.4.4. Statistics Counter High (0Ch) ......................................................... 199
5. ERROR HANDLING ....................................................................................................... 200
5.1. PCIE ERROR MAPPING TO MR ..................................................................................... 200
5.2. MR ERRORS ................................................................................................................. 203
6. HOT PLUG........................................................................................................................ 205
6.1. MRA SWITCH .............................................................................................................. 205
6.1.1. PCI Express Capability: Slot Capability Register.............................................. 206
6.1.2. PCI Express Capability: Slot Control Register .................................................. 208
6.1.3. PCI Express Capability: Slot Status Register..................................................... 210
6.1.4. PCI Express Capability: Device Capabilities Register ...................................... 212
6.1.5. Virtual Hot-Plug Signals Interface Registers ..................................................... 212
6.1.6. Physical Slot Registers........................................................................................ 213
6.1.7. Physical Hot-Plug Signals Interface................................................................... 214
6.2. VIRTUAL DEVICE MIGRATION ..................................................................................... 214
6.3. BASE PCI EXPRESS DEVICE MIGRATION ..................................................................... 214
7. POWER MANAGEMENT .............................................................................................. 215
7.1. OVERVIEW ................................................................................................................... 215
7.2. VIRTUAL D-STATE ....................................................................................................... 215
7.3. LINK POWER STATES ................................................................................................... 215
7.4. MULTI-ROOT ASPM.................................................................................................... 216
7.5. SLOT CLOCK AND COMMON CLOCK CONFIGURATION ................................................. 216
7.6. MULTI-ROOT WAKE-UP .............................................................................................. 216
7.6.1. PME Triggers Beacon / WAKE#........................................................................... 217
7.6.2. Beacon / WAKE# Triggers MSI ............................................................................ 217
7.6.3. Beacon / WAKE# Triggers Beacon / WAKE#..................................................... 218
7.7. MULTI-ROOT PME TURN OFF ..................................................................................... 218
7.8. MULTI-ROOT POWER CONTROLLER............................................................................. 218
7.9. MULTI-ROOT POWER BUDGETING ............................................................................... 219
8. CONGESTION MANAGEMENT .................................................................................. 220
8.1. OVERVIEW ................................................................................................................... 220
8.2. CONGESTION ISOLATION .............................................................................................. 221
8.2.1. Virtual Links........................................................................................................ 221
8.2.1.1. Virtual Link and Virtual Hierarchy Identification .......................... 222
8.2.1.2. VL and VC Configuration ............................................................... 223
8.2.1.3. VC to VL Mapping.......................................................................... 224
8.2.1.4. Arbitration ....................................................................................... 225
8.2.2. Bypass Queues .................................................................................................... 227
8.2.3. Flow Control Rules ............................................................................................. 228
8.3. PERFORMANCE MONITORING AND STATISTICS COLLECTION ....................................... 230

Figures
Figure 1-1: Generic Server Blade Configuration..................................................................................17
Figure 1-2: Example Multi-Root Topology ..........................................................................................30
Figure 1-3: Example Multi-Root Topology as viewed from Host A ................................................32
Figure 1-4: Example Multi-Root Topology as viewed from Host C ................................................33
Figure 2-1: MRInit DLLP Format .........................................................................................................34
Figure 2-2: MR to MR Initialization Sequence.....................................................................................36
Figure 2-3: MR to Base Initialization .....................................................................................................36
Figure 2-4: MR Data Link Control and Management State Machine (MR-DLCMSM).................37
Figure 2-5: MR InitFC State Machine....................................................................................................40
Figure 2-6: MRUpdateFC Header DLLP..............................................................................................41
Figure 2-7: MRInitFC1_VL Header DLLP ..........................................................................................42
Figure 2-8: MRInitFC1_VH Header DLLP .........................................................................................42
Figure 2-9: MRInitFC2_VL Header DLLP ..........................................................................................42
Figure 2-10: MRInitFC2_VH Header DLLP .........................................................................................43
Figure 2-11: MRUpdateFC Data DLLP ..................................................................................................43
Figure 2-12: MRInitFC1_VL Data DLLP ..............................................................................................43
Figure 2-13: MRInitFC1_VH Data DLLP..............................................................................................44
Figure 2-14: MRInitFC2_VL Data DLLP ..............................................................................................44
Figure 2-15: MRInitFC2_VH Data DLLP..............................................................................................44
Figure 2-16: TLP Prefix Header Location ..............................................................................................49
Figure 2-17: TLP Prefix Header Layout..................................................................................................50
Figure 2-18: MR Dataflow Examples ......................................................................................................55
Figure 2-19: Reset DLLP Example: Topology......................................................................................57
Figure 2-20: Reset DLLP Example: Host A’s View .............................................................................58
Figure 2-21: Reset DLLP...........................................................................................................................60
Figure 2-22: Upstream Link Partner RESET SM ..................................................................................62
Figure 2-23: Downstream Link Partner RESET SM.............................................................................64
Figure 3-1: Example MR Topology .......................................................................................................74
Figure 3-2: Example MR Topology with Initial Link Directions ......................................................76
Figure 3-3: Example MR Topology with Component Discovery Details........................................79
Figure 3-4: Example MR Topology: RP 0 View ..................................................................................81
Figure 3-5: Example MR Topology: RP 1 View ..................................................................................81
Figure 3-6: Example MR Topology: RP 2 View ..................................................................................82
Figure 3-7: Example MR Topology: RP 3 View ..................................................................................82
Figure 3-8: Example MR Topology: Device L PF / VF Mapping....................................................86
Figure 3-9: Example MR Topology: Device M VF Mapping ............................................................87
Figure 3-10: Example Mapping of VFs ...................................................................................................92
Figure 3-11: VF Migration State Diagram...............................................................................................94
Figure 3-12: Initial VF State ......................................................................................................................96
Figure 3-13: Example Topology with MR Root Complex ...................................................................98
Figure 4-1: MR Device Configuration Space..................................................................................... 108
Figure 4-2: Device MR-IOV Capability ............................................................................................. 110
Figure 4-3: LVF Table........................................................................................................................... 123
Figure 4-4: Device Function Table ..................................................................................................... 126
Figure 4-5: Switch Mapping Tables..................................................................................................... 137
Figure 4-6: Switch MR-IOV Capability Diagram.............................................................................. 140
Figure 4-7: Switch Port Table .............................................................................................................. 149
Figure 4-8: Port Interrupt Status Bitmap ........................................................................................... 164
Figure 4-9: VS Table.............................................................................................................................. 165
Figure 4-10: VS Interrupt Status Bitmap.............................................................................................. 168
Figure 4-11: VS Bridge Table ................................................................................................................. 169
Figure 4-12: Example VL Arbitration Table with 32 Phases ............................................................ 183
Figure 4-13: Performance Monitoring and Statistics Collection Tables .......................................... 185
Figure 6-1: Slot Capabilities Register (PCIe Figure 7-18) ................................................................ 206


Figure 6-2: Slot Control Register (PCIe Figure 7-19)....................................................................... 208


Figure 6-3: Slot Status Register (PCIe Figure 7-20) .......................................................................... 210
Figure 6-4: PCI Express Capabilities Register (PCIe Figure 7-11)................................................. 212
Figure 7-1: Multi-Root Wake-Up Scenarios....................................................................................... 217
Figure 8-1: (VH, VC) to VL Mapping ................................................................................................ 222
Figure 8-2: MRA Arbitration Model ................................................................................................... 226
Figure 8-3: Logical Queuing Structure Associated with a VL Receiver......................................... 228
Figure 8-4: Statistics Collection Process............................................................................................. 230

Tables
Table 2-1: MRInit DLLP Fields.................................................................................................................35
Table 2-2: MR Flow Control DLLP Fields ..............................................................................................45
Table 2-3: Reset DLLP Example: Events and Actions..........................................................................59
Table 3-1: Port Types – Example MR Topology ....................................................................................74
Table 3-2: Example MR Topology VH and VF Mapping Policy .........................................................80
Table 3-3: Example Topology: Switch A VS Bridge Table Contents ..................................................83
Table 3-4: Example Topology: Switch B VS Bridge Table Contents...................................................84
Table 3-5: Valid MR State Transitions for VF Migration ......................................................................94
Table 3-6: Example MR Root Topology: RP Associations ...................................................................98
Table 4-1: MR-IOV Fields....................................................................................................................... 100
Table 4-2: Device MR-IOV Extended Capability Header.................................................................. 110
Table 4-3: MR-IOV Capabilities............................................................................................................. 111
Table 4-4: Device MR-IOV Control...................................................................................................... 112
Table 4-5: Device MR-IOV Status ......................................................................................................... 114
Table 4-6: Device MR-IOV VH Counts ............................................................................................... 115
Table 4-7: Device Function Table Offset ............................................................................................. 115
Table 4-8: VF MVF Region..................................................................................................................... 116
Table 4-9: LVF Table Offset................................................................................................................... 117
Table 4-10: Device VL Arbitration Capability and Status ................................................................ 118
Table 4-11: Device VL Arbitration Control ....................................................................................... 120
Table 4-12: Device VL Arbitration Table Offset .............................................................................. 121
Table 4-13: Device MR Error Status ................................................................................................... 122
Table 4-14: Device MR Error Control ................................................................................................ 122
Table 4-15: LVF Table Entry................................................................................................................ 123
Table 4-16: Function Capability 1 (00h).............................................................................................. 126
Table 4-17: Function Capability 2 (04h).............................................................................................. 126
Table 4-18: Function Control 1 (08h) ................................................................................................. 127
Table 4-19: Function Control 2 (0Ch)................................................................................................. 129
Table 4-20: Function Control 3 (10h) ................................................................................................. 130
Table 4-21: Function Status .................................................................................................................. 131
Table 4-22: Function Table VC to VL Map 1 (VC Capability) ....................................................... 132
Table 4-23: Function Table VC to VL Map 2 (VC Capability) ....................................................... 133
Table 4-24: Function Table VC Resource State................................................................................. 134
Table 4-25: VH Table MFVC Resource State.................................................................................... 135


Table 4-26: Switch MR-IOV Extended Capability Header.............................................................. 140


Table 4-27: Switch MR-IOV Capability Bits ...................................................................................... 140
Table 4-28: Switch MR-IOV Control Bits.......................................................................................... 141
Table 4-29: Switch MR-IOV Status Bits ............................................................................................. 142
Table 4-30: Switch MR-IOV This Bridge Map .................................................................................. 143
Table 4-31: Switch MR-IOV Authorization Control ........................................................................ 144
Table 4-32: Switch MR-IOV Authorization Control ........................................................................ 144
Table 4-33: Switch Port Table Sizes .................................................................................................... 145
Table 4-34: Switch Port Table Offset.................................................................................................. 145
Table 4-35: Switch VS Table Sizes....................................................................................................... 146
Table 4-36: Switch VS Table Offset .................................................................................................... 146
Table 4-37: Switch VS Bridge Table Sizes .......................................................................................... 147
Table 4-38: Switch VS Bridge Table Offset........................................................................................ 147
Table 4-39: Switch Port Capability....................................................................................................... 150
Table 4-40: Switch Port Control1 ........................................................................................................ 151
Table 4-41: Switch Port Control2 ........................................................................................................ 152
Table 4-42: Switch Port Status.............................................................................................................. 154
Table 4-43: Switch Link Partner Training Status ............................................................................... 155
Table 4-44: Switch Link Partner Training Status – MRInit DLLP Bits ......................................... 156
Table 4-45: Switch VL Arbitration Capability and Status................................................................. 157
Table 4-46: Switch VL Arbitration Control........................................................................................ 159
Table 4-47: Switch VL Arbitration Table Offset ............................................................................... 160
Table 4-48: Switch MR Error Status.................................................................................................... 160
Table 4-49: Switch MR Error Control................................................................................................. 160
Table 4-50: PCI Bridge Control ........................................................................................................... 161
Table 4-51: Port PCIe Capability Structure ........................................................................................ 162
Table 4-52: Switch VS Capability and Status...................................................................................... 165
Table 4-53: Switch VS Capability and Status...................................................................................... 166
Table 4-54: Switch VS Bridge Capability and Status ......................................................................... 171
Table 4-55: Switch VS Bridge Control 1............................................................................................. 173
Table 4-56: Switch VS Bridge Control 2............................................................................................. 175
Table 4-57: Switch VS Bridge VC ID to VL Map 1.......................................................................... 176
Table 4-58: Switch VS Bridge VC ID to VL Map 2.......................................................................... 177
Table 4-59: VC Resource State............................................................................................................. 179
Table 4-60: Virtual Hot-Plug Signals Interface 1............................................................................... 180
Table 4-61: Hot-Plug Signals Interface 2 ............................................................................................ 181
Table 4-62: Definition of the 4-bit Entries in the VL Arbitration Table..................................... 183
Table 4-63: Length of the VL Arbitration Table ............................................................................... 184
Table 4-64: Statistics Table Sizes.......................................................................................................... 186
Table 4-65: Statistics Start / Busy ........................................................................................................ 187
Table 4-66: Statistics Descriptor Table Offset ................................................................................... 188
Table 4-67: Statistics Block Table Offset............................................................................................ 189
Table 4-68: Statistics Descriptor Table Entry .................................................................................... 190
Table 4-69: Standard Statistics.............................................................................................................. 191
Table 4-70: TLP Filters.......................................................................................................................... 193
Table 4-71: Credit Filters....................................................................................................................... 194
Table 4-72: DLLP Filters....................................................................................................................... 195


Table 4-73: Statistics Block Capability................................................................................................. 196


Table 4-74: Statistics Table Offset ....................................................................................................... 196
Table 4-75: Statistics Wait Time ........................................................................................................... 196
Table 4-76: Statistics Count Time ........................................................................................................ 197
Table 4-77: Statistics Capability and Control...................................................................................... 198
Table 4-78: Statistics Filter Enable and Control ................................................................................ 199
Table 4-79: Statistics Counter Low...................................................................................................... 199
Table 4-80: Statistics Counter High ..................................................................................................... 199
Table 5-1: Physical Layer Error List....................................................................................................... 200
Table 5-2: Data Link Layer Error List ................................................................................................... 200
Table 5-3: Transaction Layer Error List ................................................................................................ 201
Table 5-4: MR Error List ......................................................................................................................... 203
Table 6-1: Virtual Mapping: PCIe Slot Capabilities Register.............................................................. 206
Table 6-2: Virtual Mapping: PCIe Slot Control Register .................................................................... 208
Table 6-3: Virtual Mapping: PCIe Slot Control Register .................................................................... 210
Table 6-4: Virtual Mapping: PCIe Capabilities Register...................................................................... 212


Objective of the Specification


The purpose of this document is to specify PCI I/O virtualization and sharing technology. The
specification is focused on multi-root topologies, e.g. a server blade enclosure that uses a PCI
Express Switch-based topology to connect server blades to PCI Express Devices or PCI Express-to-
PCI Bridges and enable the leaf Devices to be serially or simultaneously shared by one or more
server blades.
This document is to be used in conjunction with, and does not supersede, the terms and conditions
specified in the PCI-SIG™ Trademark and Logo Usage Guidelines document.

Document Organization
Chapter 1 specifies.
Chapter 2 specifies.

Documentation Conventions
Capitalization
Some terms are capitalized to distinguish their definition in the context of this document from their
common English meaning. Words not capitalized have their common English meaning. When terms
such as “memory write” or “memory read” appear completely in lower case, they include all
transactions of that type.
Register names and the names of fields and bits in registers and headers are presented with the first
letter capitalized and the remainder in lower case.

Numbers and Number Bases


Hexadecimal numbers are written with a lower case “h” suffix, e.g., 0FFFFh and 80h. Hexadecimal
numbers larger than four digits are represented with a space dividing each group of four digits, as in
1E FFFF FFFFh. Binary numbers are written with a lower case “b” suffix, e.g., 1001b and 10b.
Binary numbers larger than four digits are written with a space dividing each group of four digits, as
in 1000 0101 0010b.
All other numbers are decimal.
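
The following Python sketch is illustrative only and is not part of this specification. It shows one way a tool could read numbers written in the notation described above. The function name parse_spec_number is hypothetical.

```python
def parse_spec_number(text: str) -> int:
    """Parse a number written in this document's notation (illustrative helper).

    - trailing "h": hexadecimal, e.g. "80h" or "1E FFFF FFFFh"
    - trailing "b": binary, e.g. "10b" or "1000 0101 0010b"
    - anything else: decimal
    Spaces that separate groups of four digits are ignored.
    """
    compact = text.replace(" ", "")
    # Check the hexadecimal suffix first: "B" can be a hex digit, but a
    # hexadecimal number always ends in "h".
    if compact.endswith("h"):
        return int(compact[:-1], 16)
    if compact.endswith("b"):
        return int(compact[:-1], 2)
    return int(compact)
```

For example, parse_spec_number("1E FFFF FFFFh") and parse_spec_number("1EFFFFFFFFh") yield the same integer.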

Reference Information

Reference information is provided in various places to assist the reader and does not represent a
requirement of this document. Such references are indicated by the abbreviation “(ref).” For
example, in some places, a clock that is specified to have a minimum period of 400 ps also includes
the reference information maximum clock frequency of “2.5 GHz (ref).” Requirements of other
specifications also appear in various places throughout this document and are marked as reference
information. Every effort has been made to guarantee that this information accurately reflects the
referenced document; however, in case of a discrepancy, the original document takes precedence.


Implementation Notes
Implementation Notes should not be considered to be part of this specification. They are included
for clarification and illustration only. Implementation Notes within this document are enclosed in a
box and set apart from other text.

Terms and Abbreviations


Base Function (BF)
    Function used to manage the MR features of an MR Device.
IO Virtualization (IOV)
    The capability for a single PCIe component to be used by more than one SI.
Switch Management Port
    A connection to an MR Switch that can be used to manage an MR topology. Switch Management
    Ports may be actual PCIe Ports or may be Vendor Specific non-PCIe Ports.
MR - Multi-Root Topology
    A PCIe topology that interconnects more than one Root Port through multi-root aware Switches.
MR PCIM
    Multi-Root PCIM.
MRA
    Multi-Root Aware – A PCIe component that supports the multi-root extensions defined in this
    specification.
MRA VF
    Virtual Function (VF) in an MRA Device.
MR Enabled Link
    PCIe link using the Multi-Root Encapsulated Link Protocol.
MR Egress
    Point in a PCIe fabric at which a TLP exits the MR Fabric.
MR Fabric
    Subset of a PCIe fabric containing MR Enabled Links and connected MRA components.
MR Ingress
    Point in a PCIe fabric at which a TLP enters the MR Fabric.
PCI Bus Memory Address
    Refers to the address portion of a PCI Memory Transaction.
PCI Manager (PCIM)
    Software that enumerates and configures an IOV topology.
PCIe
    PCI Express.
Physical Address
    The address used by the system memory controller to access system memory.
Physical Function (PF)
    An IOV-capable Function per the Single Root IOV Specification. In MR, PFs exist within Virtual
    Hierarchies (VHs).
RC
    Root Complex per the PCI Express Base Specification.
RP
    Root Port per the PCI Express Base Specification.
System Image (SI)
    A software component running on a Virtual System to which specific virtual and physical
    Devices can be assigned. Specification of the behavior and architecture of an SI is outside
    the scope of this specification. Examples of SIs include guest operating systems and
    shared/non-shared protected domain Device drivers.
SR PCIM
    Single Root PCIM.
Virtual Device
    Collection of PFs and VFs that operate as a Device within a VH.
Virtual Function (VF)
    An IOV-capable Function per the Single Root IOV Specification. In MR, VFs exist within Virtual
    Hierarchies (VHs).
Virtual Hierarchy (VH)
    Portion of an MR Topology assigned to a single PCIe Domain Hierarchy.
Virtual Hierarchy Number (VHN)
    Link Local Number designating a VH.
Virtual Intermediary (VI)
    A software component supporting one or more SIs – colloquially known as a Hypervisor or
    Virtual Machine Monitor. Specification of the behavior and architecture of a VI is outside
    the scope of this specification.
Virtual Switch (VS)
    A logical PCIe Switch associated with a single VH implemented in an MR Switch.

1. Architectural Overview
Within the industry, significant effort has been expended to increase the effective hardware resource
utilization through the use of virtualization technology. The Multi-Root I/O Virtualization (MR-
IOV) specification defines the extensions to the PCI Express (PCIe) specification suite to enable
multiple non-coherent Root Complexes (RCs) to share PCI hardware resources.
To illustrate how this technology can be used to increase effective resource utilization, consider the
generic server blade configuration illustrated in Figure 1-1 below.
□ The server blade configuration contains four server blades and two external fabric switches.
In a high availability configuration, nominally there would be two external fabric switches of
each type to avoid any single point of failure (SPOF), for a total of four switches.
□ In this example, each switch provides two external connectivity ports, though more can be
configured to deliver high availability solutions as well as increased aggregate performance.
□ Each server blade is provisioned with two PCIe Endpoint Devices: an Ethernet device and a
storage area network (SAN) device. This translates to a total of eight PCIe Endpoint Devices.
Each PCIe Device is connected point-to-point to a Root Port (RP) [not shown] provided by
either a chipset or a processor.
□ Each I/O device and switch port is typically provisioned to enable any I/O device to operate
at full bandwidth.
Depending upon workload, the example configuration’s I/O resource capacity may be excessive,
resulting in under-utilized hardware.

[Figure not reproduced. Recoverable labels: Blade Enclosure; four Server Blades, each with a PCIe
Ethernet Device and a PCIe Storage Device; Ethernet Switch; Storage Area Network Switch; External
Connectivity.]

Figure 1-1: Generic Server Blade Configuration


Through the application of MR-IOV technology, the prior example server blade configuration can
be transformed as illustrated in Figure 1-2.

Figure 1-2: Example Server Blade Configuration using MR-IOV Technology


In contrast to Figure 1-1, the following can be observed:
□ The server blade configuration contains four server blades. The server blades do not contain
PCIe Endpoint Devices but instead connect a Root Port (RP) to a Multi-Root Aware (MRA)
PCIe Switch.
□ The two external fabric switches are replaced by two MRA PCIe Switches. While the details
will be described in a subsequent section, the following should be noted:

• Unlike a PCIe Switch, which contains a single upstream Port and can only be claimed by
a single RP, an MRA PCIe Switch contains multiple upstream Ports to enable it to
connect to multiple RPs. This enables the MRA PCIe Switch to be a shared component
within the configuration.
• Multiple MRA PCIe Switches can be interconnected in a variety of topologies to create
high availability solutions as well as provide increased I/O fan-out capacity.
□ In place of eight PCIe Endpoint Devices – four of each type – the example MR-IOV
configuration contains four MRA PCIe Endpoint Devices – two of each type. Each MRA
PCIe Endpoint Device is attached to an MRA PCIe Switch downstream Port, enabling each to
be accessed, and thus shared, by any of the server blades.
□ Unlike the prior example configuration, where I/O is dedicated to each server blade, an MR-IOV
based configuration enables the I/O to be dynamically assigned. A fraction of an I/O Device, or
an entire I/O Device, can be assigned to each server blade based on its workload requirements.


As noted above, an MR-IOV configuration reduces the component count and changes the component
composition. This specification covers the elements involved in delivering an MR-IOV
configuration.

1.1. How does MR-IOV Work?


To understand how MR-IOV works, first examine an example platform configuration without any
Single Root IOV (SR-IOV) technology (see the Single Root I/O Virtualization Specification) or
MR-IOV technology, as illustrated in Figure 1-3.

[Figure not reproduced. Recoverable labels: six SIs; Virtualization Intermediary; Processor; Memory;
Translation Agent (TA); Address Translation and Protection Table (ATPT); Root Complex (RC); two
Root Ports (RP); Switch; three PCIe Devices, each with an ATC.]

Figure 1-3: Example Platform Configuration without SR-IOV or MR-IOV Technology


The above example platform is composed of the following:
□ A processor capable of running any operating system (OS). In virtualized environments, a
processor can execute a Virtual Intermediary (VI) which abstracts or virtualizes all hardware
from one or more System Images (SI). Each SI contains a virtual I/O device driver while the
VI will contain the I/O-specific device drivers and perform all I/O hardware accesses.
□ A memory controller and associated memory.

□ A Translation Agent (TA) and Address Translation and Protection Table (ATPT).


• A TA parses the contents of a PCIe DMA request transaction (TLP) to index an ATPT
to derive the physical address translation and access rights. The purposes for having
DMA address translation vary and include:
♦ Limiting the destructiveness of a ‘broken’ or misprogrammed DMA I/O Function
♦ Providing for scatter/gather
♦ Ability to redirect message-signaled interrupts (e.g., MSI or MSI-X) to different
address ranges without requiring coordination with the underlying I/O Function
♦ Address space conversion (32-bit I/O Function to larger system address space)
♦ Virtualization support
• A PCIe Endpoint may contain an Address Translation Cache (ATC) in support of the
PCI-SIG Address Translation Services Specification.
□ A PCIe Root Complex (RC) containing one or more Root Ports (RP) with direct-attached or
PCIe Switch-attached PCIe Devices or PCI / PCI-X Bridges. Each RP defines a unique
hierarchy domain (see PCI Express Base Specification).
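
The TA lookup described above can be sketched in Python as follows. This is an illustrative model only, not part of this specification. It assumes that the ATPT is indexed by a requester identifier and a DMA page number, and that translation is done in 4096-byte pages. Neither assumption is stated in the text above, and the names TranslationAgent, AtptEntry, and PAGE_SIZE are hypothetical.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096  # assumed translation granularity; not defined by the text above


@dataclass(frozen=True)
class AtptEntry:
    """One hypothetical ATPT entry: a physical page plus access rights."""
    physical_page: int
    can_read: bool
    can_write: bool


class TranslationAgent:
    """Toy TA: indexes an ATPT using fields taken from a DMA request."""

    def __init__(self):
        # (requester_id, dma_page) -> AtptEntry; the key choice is an assumption
        self._atpt = {}

    def map_page(self, requester_id, dma_page, entry):
        self._atpt[(requester_id, dma_page)] = entry

    def translate(self, requester_id, dma_address, is_write):
        """Return the physical address, or raise if the access is not permitted."""
        dma_page, offset = divmod(dma_address, PAGE_SIZE)
        entry = self._atpt.get((requester_id, dma_page))
        if entry is None:
            # A broken or misprogrammed Function cannot reach unmapped memory.
            raise PermissionError("no translation for this request")
        allowed = entry.can_write if is_write else entry.can_read
        if not allowed:
            raise PermissionError("access right not granted")
        return entry.physical_page * PAGE_SIZE + offset
```

In this model, a request from a Function with no matching ATPT entry is rejected rather than translated, which corresponds to the first purpose listed above (limiting the destructiveness of a misprogrammed DMA I/O Function).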
Now examine a platform that supports SR-IOV technology as illustrated in Figure 1-4 below.


[Figure not reproduced. Recoverable labels: six SIs; Virtualization Intermediary; SR-PCIM; Processor;
Memory; Translation Agent (TA); Address Translation and Protection Table (ATPT); Root Complex
(RC); two Root Ports (RP); Switch; three PCIe Devices, each with an ATC and VF0 ... VFN.]

Figure 1-4: Example Platform Configuration with SR-IOV Technology


The differences between the platforms in Figure 1-3 and Figure 1-4 are:
□ The PCIe Devices support the SR-IOV capability as defined in the Single Root I/O Virtualization
Specification. SR-IOV enables a PCIe Device to support multiple Virtual Functions (VFs).
□ SR-PCIM (Single Root PCI Manager) is responsible for SR-IOV hardware configuration and
management. SR-PCIM is nominally integrated within a VI, though a variety of
implementation options exist and remain outside the scope of the PCI-SIG specifications.
From the prior two figures, the following key semantics should be kept in mind:
1. The OS or VI has exclusive control over each PCIe Component – RC, RP, Switch, Link,
PCIe-to-PCI/PCI-X Bridge, Device, and Function.
a. All PCI enumeration, configuration operations, reset, power management, and event
handling (e.g., errors) must only be initiated or processed by the OS or VI.
Operations initiated by an SI are trapped by the VI and processed on its behalf.
2. Each RP acts as the terminus for all upstream targeted operations, e.g., error event
notification.


In order to support either example platform while preserving these semantics, the PCI components
underneath each RP must be virtualized and logically overlaid on the MRA PCIe Switches and
Devices as illustrated in Figure 1-5. The virtualized PCI components are referred to as a Virtual
Hierarchy (VH). A VH has the following attributes:
□ Each VH must contain at least one PCIe Switch.

• The PCIe Switch will be a virtualized component implemented over an MRA Switch.
• The PCIe Switch functionality and semantics are per the PCI Express Base Specification.
□ Each VH may contain any mix of PCIe Devices, MRA PCIe Devices, or PCIe to PCI / PCI-X
Bridges as illustrated in Figure 1-6.

• A PCIe Device is a device that does not support the MR-IOV Capability. Such a device
must only be visible in a single VH at a time. A PCIe Device can be serially shared
among a set of accessible VHs within the MR-IOV topology. The PCIe Device is serially
deleted from the current source VH and added to the destination VH.
• A PCIe Device may support the SR-IOV Capability, which enables it to be shared by
multiple SIs executing above a single RP.
• A PCIe to PCI / PCI-X Bridge can only be visible in a single VH at a time. As with a
PCIe Device, a PCIe to PCI / PCI-X Bridge can be serially shared among a set of
accessible VHs within the MR-IOV topology using a conceptually similar deletion /
addition process as a PCIe Device.
♦ The SR-IOV Capability does not apply to the PCIe to PCI / PCI-X Bridge. The
bridge and all associated PCI / PCI-X devices can only be configured in a single OS,
VI, or SI at a time.
• An MRA PCIe Device is a device which supports the MR-IOV Capability. Such a device
can be visible in multiple VHs at a time depending upon the MR-IOV resources
provisioned. An MRA PCIe Device can be added to or deleted from any accessible VH
within an MR-IOV topology.
• An MRA PCIe Device must support the SR-IOV Capability. This enables it to be shared
by multiple SIs executing above each RP.
□ The MR-IOV topology must contain at least one MRA PCIe Switch.

• Multiple MRA PCIe Switches can be provisioned and interconnected in a variety of
topologies – tree, fat-tree, star, mesh, etc.
• An MRA PCIe Switch must contain two or more upstream Ports. Each upstream Port
must connect to an RP which acts as the root of the VH.

Figure 1-5: Two Virtual Hierarchies (VH) Implemented over Shared Physical
Components

Figure 1-6: Physical Components that can be supported in a MR-IOV Topology

1.1.1. MRA Components


The prior section illustrated that MR-IOV is primarily the overlaying of multiple VH over a shared
physical set of MRA and non-MRA components. To further understand how MR-IOV works, let’s
examine the MRA components in more detail.

1.1.1.1. Multi-Root Aware Root Port (MRA RP)

As illustrated in Figure 1-6, a PCIe RP supporting either a single OS or a VI with multiple SI can be
connected to a MRA Switch and access multiple downstream devices and bridges. The PCIe RP,
though, is restricted to a single VH. To enable multiple VH to be accessed, a MRA PCIe
RP is required. A MRA PCIe RP differs from a PCIe RP in the following ways:
‰ A MRA PCIe RP maintains state to delineate each VH. At a high level, this amounts to a set
of resource mapping tables to translate the I/O function associated with each SI into a VH
and MR I/O function identifier.
‰ A MRA PCIe RP participates in the MR transaction encapsulation protocol (see subsequent
section for details) to enable a MRA PCIe Switch to derive the VH and associated routing
information.

‰ A MRA PCIe RP emits a MRA Link. A MRA Link is identical at the physical layer to a PCIe
Link as defined in the PCI Express Base Specification. A MRA Link differs at the data link layer,
where a new set of DLLPs is defined to support the MR-IOV protocol.
‰ A MRA PCIe RP may implement MRA congestion management (see subsequent chapter for
details).

Figure 1-7: PCIe RP and MRA PCIe RP Functional Block Comparison

1.1.1.2. Multi-Root Aware PCIe Device (MRA PCIe Device)

A MR-IOV platform may contain any mix of PCIe Devices, SR-IOV PCIe Devices, or MR-IOV
PCIe Devices. Figure 1-8 illustrates a functional block comparison between these three types of
devices.

Figure 1-8: PCIe Device, SR-IOV, and MRA PCIe Device Functional Block Comparison

A MRA IOV PCIe Device differs from a PCIe Device and a SR-IOV PCIe Device in the following
ways:
‰ The MRA IOV PCIe Device must support the new MR DLLP protocol.

• A PCIe Device or a SR-IOV PCIe Device does not support the MR-IOV capability and
therefore is unable to participate in this protocol. A MRA PCIe Switch must assume
all responsibility for forwarding transactions and event handling on behalf of these
devices through the MR-IOV topology. The MRA PCIe Switch will perform all
encapsulation or de-encapsulation as appropriate.
‰ The MRA IOV PCIe Device must support the MR-IOV transaction encapsulation protocol.

• The MR-IOV encapsulation protocol provides VH identification information to the
MRA PCIe Switch to enable the transaction to be transparently forwarded through the
MR-IOV topology without requiring modification to the PCI Express Base Specification
TLP protocol or contents.
‰ The MRA IOV PCIe Device is composed of a set of Physical Functions (PF).

• Each PF supports a full PCI Configuration and PCIe Extended Configuration Space.
• Each PF supports a full BAR.
• Each PF must only be assigned to a single accessible VH at a given time.
• One or more PF may be assigned to any accessible VH.
• A PF and its associated resources may be migrated from one VH to another.
• Each PF may support zero or more Virtual Functions (VF).
♦ VF share resources including some portions of the configuration space with the
associated PF.
♦ A VF exists only within the VH associated with the PF.
♦ A MRA PCIe Device with multiple PF and zero VF per PF is conceptually
equivalent to a single function PCIe Device configured per VH.
• The number of VF provisioned per PF may vary on a per PF basis.
• Each PF represents a single device-specific functionality, e.g. an Ethernet controller, a
SATA controller, etc. Consequently, each VF must represent the same device-specific
functionality. This enables the existing device driver models to be supported.
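
The PF assignment rules above can be sketched as a small data model. This is only an illustration under stated assumptions: the class names, the method names, and the 1-based VF numbering are hypothetical and are not defined by this specification.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class PhysicalFunction:
    """One PF of a hypothetical MRA PCIe Device."""
    number: int
    num_vfs: int = 0            # the VF count may differ from PF to PF
    vh: Optional[int] = None    # VH currently holding this PF, if any


@dataclass
class MraDevice:
    pfs: Dict[int, PhysicalFunction] = field(default_factory=dict)

    def assign(self, pf_number: int, vh: int) -> None:
        """Assign a PF to a VH; a PF may belong to only one VH at a time."""
        pf = self.pfs[pf_number]
        if pf.vh is not None:
            raise ValueError(f"PF {pf_number} is already assigned to VH {pf.vh}")
        pf.vh = vh

    def migrate(self, pf_number: int, new_vh: int) -> None:
        """Move a PF, and implicitly its VFs, from one VH to another."""
        self.pfs[pf_number].vh = None
        self.assign(pf_number, new_vh)

    def functions_in_vh(self, vh: int) -> List[Tuple[str, int, Optional[int]]]:
        """List the functions visible in a VH; VFs exist only in their PF's VH."""
        visible: List[Tuple[str, int, Optional[int]]] = []
        for pf in sorted(self.pfs.values(), key=lambda p: p.number):
            if pf.vh == vh:
                visible.append(("PF", pf.number, None))
                # Assumption: VFs are numbered from 1 within their PF.
                visible.extend(("VF", pf.number, i) for i in range(1, pf.num_vfs + 1))
        return visible
```

Several PFs may share a VH in this model, but assigning an already assigned PF raises an error, which mirrors the single-VH-per-PF rule.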

1.1.1.3. Multi-Root PCI Manager (MR-PCIM)

Each MRA component must support a corresponding MR-IOV capability. This capability is
accessed and configured by the Multi-Root PCI Manager (MR-PCIM). MR-PCIM can be
implemented anywhere within the MR-IOV topology, for example above a RP as illustrated in
Figure 1-9, or through a private interface provided by a MRA PCIe Switch.
MR-PCIM responsibilities include:

‰ Enumeration of the physical components within the MR-IOV topology. MR-PCIM must
determine which components are or are not MR-IOV capable, how components are
interconnected, and what PCIe and MR-IOV resources they provide.
‰ MR-PCIM configures the components and resources that comprise each VH. The policies to
determine this are outside the scope of this specification.
‰ Given the physical hardware is shared among a set of VH, MR-PCIM configures all PCIe and
MR-IOV attributes including: link signaling rate, VC arbitration, Port arbitration, Access
Control Services (ACS), etc.
‰ Given the physical hardware is shared among a set of VH, MR-PCIM processes or controls
various events, e.g. RESET, physical hardware failure, surprise add / remove, error handling,
etc.
‰ <continue to build up list of responsibilities>

Figure 1-9: MR-PCIM in a MR-IOV Topology

1.1.1.4. Multi-Root Aware PCIe Switch (MRA PCIe Switch)

In a prior section it was noted that a MRA Switch is conceptually the overlay of multiple PCIe
Switches onto a single physical package. This is illustrated in more detail in Figure 1-9.

Figure 1-9 PCIe Switch and MRA PCIe Switch Functional Block Comparison
The PCIe Switch is composed of a set of logical P2P bridges with a single upstream Port attached to
a PCIe RP and one or more downstream Ports attached to either a PCIe Device or a PCIe to PCI /
PCI-X Bridge. A PCIe Switch also operates using a single address space.
In contrast to a PCIe Switch, a MRA Switch has the following attributes:
‰ A MRA Switch is composed of one or more upstream Ports attached to a PCIe RP, a
MRA PCIe RP, or the downstream Port of a MRA Switch.

• If the upstream Port is attached to a PCIe RP, the MRA Switch must transparently
provide all MRA related services on behalf of the PCIe RP.
‰ A MRA Switch is composed of one or more downstream Ports attached to PCIe Devices,
MRA PCIe Devices, PCIe Switch upstream Ports, MRA Switch upstream Ports, or PCIe to
PCI / PCI-X Bridges.
‰ A MRA Switch is composed of a set of logical P2P bridges that constitute a VH. A MRA
Switch must support two or more VH.
‰ Each VH represents a separate address space. The combination of a VH identifier and the
address contained within the PCIe TLP enables the MRA Switch to forward the TLP to the
appropriate egress Port, and enables a MRA RP or MRA PCIe Device to delineate which PF or
VF is the source or sink of the PCIe TLP.
‰ <fill in additional attributes / operational semantics to flesh out this section>
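
Because each VH is a separate address space, the same address can resolve to different egress Ports in different VHs. The sketch below illustrates that idea only. The table layout, the function name `select_egress_port`, and the Port numbers are hypothetical; this specification does not define how a MRA Switch stores its routing state.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical routing state: for each VH identifier, a list of
# (base address, limit address, egress Port number) windows.
RoutingTable = Dict[int, List[Tuple[int, int, int]]]


def select_egress_port(table: RoutingTable, vh: int, address: int) -> Optional[int]:
    """Return the egress Port for (vh, address), or None if no window claims it."""
    for base, limit, port in table.get(vh, []):
        if base <= address <= limit:
            return port
    return None


# Two VHs reuse the same address window but route it to different Ports.
example_table: RoutingTable = {
    1: [(0x1000_0000, 0x1FFF_FFFF, 3)],
    2: [(0x1000_0000, 0x1FFF_FFFF, 5)],
}
```

Keying the lookup on the VH identifier first is what keeps one VH's address decoding from affecting another's.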

1.1.2. MR Initialization Overview


<tbd>

1.1.3. MR Transaction Encapsulation Overview


<likely migrate sections within chapter 2 here>

1.1.4. MR Congestion Management Overview


<tbd>

1.1.5. MR Error and Event Handling Overview


<tbd>

1.1.6. MR-IOV and ARI (Alternative Routing Identifier)


<tbd>

1.1.7. MR-IOV Relationship to SR-IOV and ATS


Briefly:
‰ A MRA PCIe Device must support the SR-IOV Capability per the Single Root I/O Virtualization
Specification.

• A SR-IOV PCIe Device must support the PCI Express Base Specification.
• By requiring these specifications to be supported, the number of permutations is reduced,
further enhancing the ability to deploy and interoperate across a wide range of solution
options.
‰ A MRA PCIe Device may support the Address Translation Services Specification.

1.2. Overview of MR Transaction Layer


Figure 1-2 shows an example Multi-Root Topology.

Figure 1-2: Example Multi-Root Topology


The solid links are using the Multi-Root Wire Protocol described in this specification. The dashed
links are using the PCIe Protocol. Functions on SR Devices are designated PF f and VF f,s where
f represents the Function Number of the PF and s indicates which VF Slot belonging to the PF is
involved. Functions on MR Devices are designated PF h:f, VF h:f,s or F h:f where h is added to
indicate which Virtual Hierarchy (VH) is involved.
This example shows a single Multi-Root Topology. TLPs inside the MR Topology are labeled with
the VH they belong to. TLPs outside the MR Topology belong to a single VH and no label is
needed. The MR Ingress point is the point where a TLP first encounters an MR Topology. The MR
Egress point is where a TLP exits an MR Topology. These points exist inside some MR component
(in this example, the Switches and Devices X and Y).

In this example, all Root Ports use the PCIe protocol. Each Root Port is the root of a PCIe Hierarchy.
There are three VHs (A, B, C), each associated with one of the Root Ports. Hosts A and C are
running a Virtual Intermediary that supports SR sharing. Host B is running an Operating System
directly (no Virtual Intermediary is involved).
In this example, there are four Devices, one of each variety.
‰ Device W is a Single-Root IOV Device. It is assigned to VH A. Two System Images (SIs) are
in use on Host A. The Virtual Intermediary on Host A has further assigned VF 0,1 to SI 1 and
VF 0,2 to SI 2.
‰ Device X is using both Single Root and Multi-Root sharing. PF 1:0 is assigned to VH B.
PF 0:1 is assigned to VH C. The Virtual Intermediary running on Host C has further assigned
VF 0:1,1 to SI 4 and VF 0:1,2 to SI 5. The MR features of the device are managed through the
Base Function which is assigned to VH C.
‰ Device Y is using only Multi-Root sharing. F 1:0 is assigned to VH A and F 0:0 is assigned to
VH C. In VH A, the Virtual Intermediary has further assigned F 1:0 to SI 2. In VH C, the
Virtual Intermediary has further assigned F 0:0 to SI 4. The MR features of the device are
managed through the Base Function which is assigned to VH C.
‰ Device Z is a 3 Function PCIe Device. It is assigned to VH C. Virtual Intermediary software
on VH C has further assigned F 0 and F 1 to SI 4 and F 2 to SI 5.
All Switches shown in this example are Multi-Root Aware (MRA). Non-MRA Switches are also
possible; however, such Switches and all components below them will be associated with a single
Root Port in a single VH. Note that this non-MR sub-tree can be a mixture of SR aware and non-SR
aware PCIe components.
Multi-Root Aware Components enforce separation between VHs. Software running in one VH is
not allowed to affect other VHs. For example, every VH has a complete and independent address
space.
Figure 1-3 shows the same example as Figure 1-2 but only shows the components visible to Host A.
The MRA Switches and Devices appear to software as Single Root equivalents.

Similarly, Figure 1-4 also shows the example from Figure 1-2 but shows only components visible to
Host C. Note that the link between Switch 0 and Switch 1 changes direction between these views of
the topology. In Multi-Root systems, the logical upstream / downstream direction of a link is a
per-VH concept and is distinct from the physical link direction that was established during link bring up.
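
The per-host views can be derived mechanically from the function assignments in the example. The sketch below transcribes those assignments and filters them by VH. The list layout and the helper names `devices_visible` and `functions_visible` are illustrative assumptions, not part of this specification.

```python
# (device, function label, VH) triples transcribed from the example topology.
ASSIGNMENTS = [
    ("W", "VF 0,1", "A"), ("W", "VF 0,2", "A"),
    ("X", "PF 1:0", "B"),
    ("X", "PF 0:1", "C"), ("X", "VF 0:1,1", "C"), ("X", "VF 0:1,2", "C"),
    ("Y", "F 1:0", "A"), ("Y", "F 0:0", "C"),
    ("Z", "F 0", "C"), ("Z", "F 1", "C"), ("Z", "F 2", "C"),
]


def devices_visible(vh):
    """Devices with at least one function assigned to the given VH."""
    return sorted({device for device, _, owner in ASSIGNMENTS if owner == vh})


def functions_visible(vh):
    """(device, function) pairs that software in the given VH can see."""
    return [(device, function) for device, function, owner in ASSIGNMENTS if owner == vh]


print(devices_visible("A"))  # Devices in the Host A view
print(devices_visible("C"))  # Devices in the Host C view
```

Under these assignments Host A sees Devices W and Y, and Host C sees Devices X, Y, and Z, which matches the two per-host figures.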

Figure 1-3: Example Multi-Root Topology as viewed from Host A

Figure 1-4: Example Multi-Root Topology as viewed from Host C

2. MR Protocol Changes
There are five parts of the PCIe Protocol that are changed to support Multi-Root operation.
‰ Negotiating use of the MR link protocol

‰ Tagging TLPs with a TLP Prefix Header

‰ Supporting per-VH equivalent of Hot Reset

‰ Supporting enhanced, per-VH flow control

‰ Processing of certain messages (e.g. INTx, PME)

These will be discussed in the following sections.

2.1. MR Link Protocol Negotiation


MR Links use an enhanced link protocol. For each link, the use of this enhanced link protocol is
determined by a negotiation between link partners. This negotiation occurs after the Physical Layer’s
Link training but before PCIe Flow Control negotiation.
During this negotiation, MR components determine whether their Link Partner agrees to use the
MR link protocol and which version of the link protocol to use. They also communicate certain MR
parameters (MaxVH and MaxVL).
This negotiation occurs by using the new MRInit DLLP as shown in Figure 2-1 and Table 2-1.
Figure 2-1: MRInit DLLP Format

Table 2-1: MRInit DLLP Fields

Location            Description
Byte 0, Bits 7:0    Type – The value 0000 0001b indicates an MRInit DLLP.
Byte 1, Bit 7       Phase – Indicates the phase of the MR Negotiation protocol.
Byte 1, Bit 6       VH FC – If Set, indicates that the sender supports per-VH, VL flow
                    control. If Clear, indicates that the sender only supports per-VL flow
                    control. Must be set for Switches.
Byte 1, Bits 5:4    Reserved
Byte 1, Bits 3:0    Device / Port Type – Device / Port Type of the sender. Encoding is
                    identical to the Device / Port Type field in the PCI Express Capability
                    (Offset 02, Bits 7:4).
Byte 2, Bit 7       Authorized – If Device / Port Type indicates a Switch, indicates that
                    the sender is an Authorized port on an MR Capable Switch. Must be
                    0b if Device / Port Type does not indicate a Switch.
Byte 2, Bits 6:4    Protocol Version – must be 001b for this version of the specification.
Byte 2, Bit 3       Reserved – Transmit 0b.
Byte 2, Bits 2:0    MaxVL – Maximum number of Virtual Links supported by the sender.
Byte 3              MaxVH – Maximum number of Virtual Hierarchies supported by the
                    sender.
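
The field layout in Table 2-1 can be illustrated with a small pack / unpack helper. This is a sketch under stated assumptions: the class and method names are hypothetical, only the four bytes listed in the table are modeled, and the DLLP CRC that follows them on the wire is omitted.

```python
from dataclasses import dataclass

MRINIT_TYPE = 0x01  # Byte 0: 0000 0001b identifies an MRInit DLLP


@dataclass(frozen=True)
class MRInit:
    phase: int             # Byte 1, bit 7
    vh_fc: bool            # Byte 1, bit 6
    port_type: int         # Byte 1, bits 3:0
    authorized: bool       # Byte 2, bit 7
    protocol_version: int  # Byte 2, bits 6:4
    max_vl: int            # Byte 2, bits 2:0
    max_vh: int            # Byte 3

    def pack(self) -> bytes:
        """Encode the four table bytes; reserved bits are transmitted as 0."""
        byte1 = ((self.phase & 0x1) << 7) | (int(self.vh_fc) << 6) | (self.port_type & 0xF)
        byte2 = ((int(self.authorized) << 7)
                 | ((self.protocol_version & 0x7) << 4)
                 | (self.max_vl & 0x7))
        return bytes([MRINIT_TYPE, byte1, byte2, self.max_vh & 0xFF])

    @classmethod
    def unpack(cls, raw: bytes) -> "MRInit":
        """Decode four bytes, ignoring the reserved bits."""
        if len(raw) != 4 or raw[0] != MRINIT_TYPE:
            raise ValueError("not an MRInit DLLP")
        return cls(
            phase=raw[1] >> 7,
            vh_fc=bool(raw[1] & 0x40),
            port_type=raw[1] & 0xF,
            authorized=bool(raw[2] & 0x80),
            protocol_version=(raw[2] >> 4) & 0x7,
            max_vl=raw[2] & 0x7,
            max_vh=raw[3],
        )
```

A round trip through `pack` and `unpack` returns an equal object, which is a quick way to check that the bit positions agree with the table.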

The MRInit DLLP is a new encoding, not defined in PCI Express. PCI Express components are
required to ignore DLLPs not defined in the Base PCI Express Specification (see section 3.5.2.2 of
the PCI Express 2.0 Specification, or section 3.5.2.1 of the PCI Express 1.1 Specification).
MR Devices will always negotiate to use the MR link protocol. MRA Switches and Root Ports will
only negotiate if enabled to do so.
The negotiation sequence between two MR components that are enabled to use the link in MR
mode is shown in Figure 2-2.

[Sequence diagram between MR Component 1 and MR Component 2: MRInit Phase 0, then MRInit
Phase 1 (MR Negotiated); MR InitFC1 VL0, then MR InitFC2 VL0 (MR VL0 Negotiated);
MR InitFC1 VH0 / VL0, then MR InitFC2 VH0 / VL0 (MR VH0 / VL0 Negotiated).]

Figure 2-2: MR to MR Initialization Sequence


The negotiation sequence between an MR component and a Base component (or an MR component
that is not enabled to use MR on the link) is shown in Figure 2-3.

Figure 2-3: MR to Base Initialization


The Data Link Control and Management State Machine (DLCMSM) is modified for MR operation.
The states for this machine are described below, and are shown in Figure 2-4. Italic text and green
dashed lines represent Base PCI Express.
States:
‰ DL_Inactive – Physical Layer reporting Link is non-operational or nothing is connected to the Port

‰ DL_Init – Physical Layer reporting Link is operational, initialize Base Flow Control for the default Virtual
Channel

‰ DL_NegotiateMR – Physical Layer reporting Link is operational, negotiate the use of the MR
Link Protocol
‰ DL_InitMR – Physical Layer reporting Link is operational, initialize MR Flow Control for VL0
and for VH0 / VL0
‰ DL_Active – Normal operation mode

Figure 2-4: MR Data Link Control and Management State Machine (MR-DLCMSM)
The DL_Inactive state rules are modified as follows:
‰ DL_Inactive


• Exit to DL_Init if:
♦ Indication from the Transaction Layer that the Link is not disabled by software, the link is not
enabled for MR operation and the Physical Layer reports Physical LinkUp = 1b
• Exit to DL_NegotiateMR if:
♦ Indication from the Transaction Layer that the Link is not disabled by software, the
link is enabled for MR operation and the Physical Layer reports Physical LinkUp =
1b

Rules are added for the new DL_NegotiateMR and DL_InitMR states:
‰ DL_NegotiateMR

• While in DL_NegotiateMR:
♦ Negotiate MR Link Protocol usage following the MR Link Protocol Negotiation
described in Section 2.1.1
♦ Report DL_Down status
♦ The Data Link Layer of a Port with DL_Down status is permitted to discard any
received TLPs provided that it does not acknowledge those TLPs by sending one or
more Ack DLLPs
• Exit to DL_Init if:
♦ MR Link Protocol negotiation completes indicating PCIe Link Mode and the
Physical Layer continues to report Physical LinkUp = 1b
• Exit to DL_InitMR if:
♦ MR Link Protocol negotiation completes indicating MR Link Mode and the Physical
Layer continues to report Physical LinkUp = 1b
• Terminate attempt to negotiate MR Link Protocol and Exit to DL_Inactive if:
♦ Physical Layer reports Physical LinkUp = 0b
‰ DL_InitMR

• While in DL_InitMR:
♦ Initialize Flow Control for the default Virtual Link, VL0, and default Virtual
Hierarchy on the default Virtual Link, VH0/VL0, following the Flow Control
initialization protocol described in Section 2.1.2
♦ Report DL_Down status while in state MRFC_INIT1_VL, MRFC_INIT2_VL or
MRFC_INIT1_VH; DL_Up status in state MRFC_INIT2_VH
♦ The Data Link Layer of a Port with DL_Down status is permitted to discard any
received TLPs provided that it does not acknowledge those TLPs by sending one or
more Ack DLLPs
• Exit to DL_Active if:
♦ Flow Control initialization completes successfully, and the Physical Layer continues
to report Physical LinkUp = 1b
• Terminate attempt to initialize Flow Control and Exit to DL_Inactive if:
♦ Physical Layer reports Physical LinkUp = 0b
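
The exit rules for DL_Inactive, DL_NegotiateMR, and DL_InitMR can be summarized as a transition function. This is a minimal sketch: the enum, the function `next_state`, and its keyword arguments are hypothetical names chosen for illustration, and the DL_Init and DL_Active states, whose rules come from the Base specification, are not modeled.

```python
from enum import Enum, auto


class DLState(Enum):
    DL_INACTIVE = auto()
    DL_INIT = auto()
    DL_NEGOTIATE_MR = auto()
    DL_INIT_MR = auto()
    DL_ACTIVE = auto()


def next_state(state, *, link_up, link_disabled=False, mr_enabled=False,
               negotiated_mode=None, fc_init_done=False):
    """Return the next DLCMSM state for the three states with MR-specific rules.

    negotiated_mode is None while negotiation is in progress, otherwise
    "PCIe" or "MR" (hypothetical encoding of the negotiation result).
    """
    if state is DLState.DL_INACTIVE:
        if link_up and not link_disabled:
            return DLState.DL_NEGOTIATE_MR if mr_enabled else DLState.DL_INIT
        return state
    if state is DLState.DL_NEGOTIATE_MR:
        if not link_up:
            return DLState.DL_INACTIVE       # negotiation attempt is terminated
        if negotiated_mode == "PCIe":
            return DLState.DL_INIT
        if negotiated_mode == "MR":
            return DLState.DL_INIT_MR
        return state
    if state is DLState.DL_INIT_MR:
        if not link_up:
            return DLState.DL_INACTIVE       # flow control initialization is terminated
        if fc_init_done:
            return DLState.DL_ACTIVE
        return state
    return state  # DL_Init and DL_Active follow Base rules, not modeled here
```

Checking the Physical LinkUp condition first in each state reflects that a link-down indication always returns the machine to DL_Inactive.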

2.1.1. MR Link Protocol Negotiation


The MR Link Protocol Negotiation involves two phases.

‰ Phase 0 is entered when MR Link Protocol Negotiation is required

• Entrance to DL_NegotiateMR state


‰ While in Phase 0 of the MR Link Protocol Negotiation:

• Transaction Layer must block transmission of TLPs and DLLPs other than the MRInit
DLLP
• Continuously transmit an MRInit DLLP as shown in Figure 2-1. MaxVL, MaxVH, Auth
and Device / Port Type reflect the sender’s values. Protocol Version is 1h. Phase is 0b.
♦ This does not block Physical Layer initiated transmissions (for example, Ordered
Sets)
• Process received MRInit DLLPs:
♦ Record the MaxVL, MaxVH, VH FC, Device / Port Type and Authorized values
• Exit to Phase 1 of the MR Link Protocol Negotiation if:
♦ MRInit DLLP was received with Protocol Version 1h (with either Phase).
• Exit indicating PCIe Link Mode if:
♦ InitFC1 DLLP was received.
‰ While in Phase 1 of the MR Link Protocol Negotiation:

• Transaction Layer must block transmission of TLPs and DLLPs other than the MRInit
DLLP
• Continuously transmit an MRInit DLLP as shown in Figure 2-1. MaxVL, MaxVH, Auth
and Device / Port Type reflect the sender’s values. Protocol Version is 1h. Phase is 1b.
♦ This does not block Physical Layer initiated transmissions (for example, Ordered
Sets)
• Process received MRInit DLLPs:
♦ Ignore the MaxVL, MaxVH, VH FC, Device / Port Type and Authorized values
• Exit indicating MR Link Mode if either:
♦ MRInit DLLP was received with Protocol Version 1h and Phase 1b
♦ Any MRInitFC1_VL DLLP was received
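
The receive side of the two negotiation phases can be sketched as a loop over received DLLPs. This is an illustration only: the function name `negotiate` and the dictionary representation of a DLLP (a `kind` key plus field names) are assumptions made for the example, and the transmit side (continuously sending MRInit DLLPs) is not modeled.

```python
RECORDED_FIELDS = ("max_vl", "max_vh", "vh_fc", "port_type", "authorized")


def negotiate(received):
    """Process received DLLPs in order and return (result, recorded).

    result is "PCIe Link Mode", "MR Link Mode", or None if the input ends
    before negotiation completes. recorded holds the link partner values
    captured in Phase 0, or None if no MRInit DLLP was seen.
    """
    phase = 0
    recorded = None
    for dllp in received:
        kind = dllp["kind"]
        if phase == 0:
            if kind == "MRInit" and dllp["version"] == 1:
                # Phase 0 records the partner's values; either Phase bit moves us on.
                recorded = {name: dllp.get(name) for name in RECORDED_FIELDS}
                phase = 1
            elif kind == "InitFC1":
                return "PCIe Link Mode", recorded
        else:
            # Phase 1 ignores the partner's values and only looks for an exit condition.
            if kind == "MRInit" and dllp["version"] == 1 and dllp["phase"] == 1:
                return "MR Link Mode", recorded
            if kind == "MRInitFC1_VL":
                return "MR Link Mode", recorded
    return None, recorded
```

A Base PCIe link partner never sends an MRInit DLLP, so in this sketch its first InitFC1 DLLP resolves the negotiation to PCIe Link Mode.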

2.1.2. MR Flow Control Initialization Protocol


Before starting normal operation following power-up or interconnect Reset, it is necessary to
initialize Flow Control for the default Virtual Link, VL0, and the default Virtual Hierarchy, VH0. In
addition, when additional Virtual Links (VLs) and Virtual Hierarchies (VHs) are enabled, the Flow
Control initialization process must be completed for each newly enabled VL or VH before it can be
used. This section describes the initialization process that is used for all VLs and all VHs. Note that
since VL0 and VH0 are enabled before all other VLs and VHs, no TLP traffic of any kind will be
active prior to initialization of VL0 and VH0. However, when additional VLs or VHs are being
initialized there will typically be TLP traffic flowing on other, already enabled, VLs and VHs. Such
traffic has no direct effect on the initialization process for the additional VL(s) and VH(s).
There are four states in the MR VL/VH initialization process. These states are shown in Figure 2-5.

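[State diagram: "VL Enabled" enters MRFC_INIT1_VL, followed by MRFC_INIT2_VL;
MRFC_INIT1_VH is entered from MRFC_INIT2_VL or on "Non-Zero VH Mapped to a VC and
VC Enabled", followed by MRFC_INIT2_VH and then Finished.]

Figure 2-5: MR InitFC State Machine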


There is a distinct instance of this state machine for each enabled VH and VL. Initialization between
different enabled VLs and VHs proceeds in parallel.
State MRFC_INIT1_VL is entered when a VL is enabled either automatically (VL0) or explicitly
(VL1-7). State MRFC_INIT1_VH is entered either from MRFC_INIT2_VL or because some VH
was mapped by MR-PCIM and enabled by software in the VH, resulting in a VH for which flow
control negotiation has not yet occurred.
The rules for this process are given in Section 2.1.2.2.
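
The receive-side progression through these states, using the flags FI1, FI2, and FI3 named in the rules of Section 2.1.2.2, can be sketched as follows. This is an illustration under stated assumptions: the class name, the string state names, and the `receive` signature are hypothetical, transmit behavior and timers are omitted, and MRFC_INIT2_VH is treated as the final modeled state because its exit rule is not covered here.

```python
CREDIT_TYPES = ("P_Hdr", "P_Data", "NP_Hdr", "NP_Data", "Cpl_Hdr", "Cpl_Data")


class MRFlowControlInit:
    """Receive-side sketch of one MR InitFC state machine instance."""

    def __init__(self, start="MRFC_INIT1_VL"):
        # A non-zero VH would start at "MRFC_INIT1_VH" instead.
        self.state = start
        self.vl_credits = {}  # FC unit values recorded while in MRFC_INIT1_VL
        self.vh_credits = {}  # FC unit values recorded while in MRFC_INIT1_VH

    def receive(self, dllp, credit_type=None, value=None):
        """Apply one received DLLP, named by a string such as 'MRInitFC1_VL'."""
        if self.state == "MRFC_INIT1_VL":
            if dllp in ("MRInitFC1_VL", "MRInitFC2_VL") and credit_type in CREDIT_TYPES:
                self.vl_credits[credit_type] = value
                if all(c in self.vl_credits for c in CREDIT_TYPES):  # flag FI1
                    self.state = "MRFC_INIT2_VL"
        elif self.state == "MRFC_INIT2_VL":
            # FC unit values are ignored here; only the DLLP kind matters.
            if dllp in ("MRInitFC2_VL", "MRInitFC1_VH"):             # flag FI2
                self.state = "MRFC_INIT1_VH"
        elif self.state == "MRFC_INIT1_VH":
            if dllp in ("MRInitFC1_VH", "MRInitFC2_VH") and credit_type in CREDIT_TYPES:
                self.vh_credits[credit_type] = value
                if all(c in self.vh_credits for c in CREDIT_TYPES):  # flag FI3
                    self.state = "MRFC_INIT2_VH"
        return self.state
```

One instance of this object would exist per enabled VH and VL, so initialization of different VLs and VHs can proceed independently.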

2.1.2.1. MR Flow Control DLLP Encoding

MR Flow Control DLLPs need to communicate the VH number in addition to the Base PCIe
information. It is no longer possible to fit this information in a single DLLP. Consequently, for
MR-IOV, Header and Data credits are communicated using different DLLPs. The formats for the
various MR Flow Control DLLPs are shown in Figure 2-6 through Figure 2-15. The DLLP fields are
described in Table 2-2. If a Receiver advertises infinite VH credits, then the Receiver must transmit
PCIe Base UpdateFC DLLPs for that VL instead of the MRUpdateFC DLLPs described below.
‰ The VL Credit Type is implicitly determined from the DLLP Type Encoding used by the
UpdateFC DLLP
‰ The VC ID field in the UpdateFC DLLP contains the VL number.

‰ The HdrFC field in the UpdateFC DLLP contains VL header credit value for the indicated
type (P, NP, or Cpl)
‰ The DataFC field in the UpdateFC DLLP contains the VL data credit value for the indicated
type (P, NP, or Cpl)

Figure 2-6: MRUpdateFC Header DLLP

Figure 2-7: MRInitFC1_VL Header DLLP

Figure 2-8: MRInitFC1_VH Header DLLP

Figure 2-9: MRInitFC2_VL Header DLLP

Figure 2-10: MRInitFC2_VH Header DLLP

Figure 2-11: MRUpdateFC Data DLLP

Figure 2-12: MRInitFC1_VL Data DLLP

Figure 2-13: MRInitFC1_VH Data DLLP

Figure 2-14: MRInitFC2_VL Data DLLP

Figure 2-15: MRInitFC2_VH Data DLLP

Table 2-2: MR Flow Control DLLP Fields

Location            Description
Byte 0, Bits 7:4    DLLP Type –
                      1011b indicates an MRUpdateFC DLLP
                      0111b indicates an MRInitFC1_VL or MRInitFC1_VH DLLP
                      1111b indicates an MRInitFC2_VL or MRInitFC2_VH DLLP
Byte 0, Bits 2:0    VL Number – Indicates the Virtual Link
Byte 1              VH Number – Indicates the Virtual Hierarchy. This field is reserved if
                    VHO is Set.
Byte 2, Bit 7       VH Omitted – Indicates whether the VH Number field is present in the
                    DLLP. If Set, this indicates the DLLP is MRInitFC1_VL or MRInitFC2_VL
                    and the VH Number is omitted.
Byte 2, Bits 6:5    TT – TLP Type
                      00b indicates Posted credit
                      01b indicates Non-Posted credit
                      10b indicates Completion credit
                      11b is Reserved
Byte 3, Bit 4       Credit Type –
                      0 indicates Header Credit
                      1 indicates Data Credit
Byte 3, Bits 3:0    Data Credit Value – If Credit Type is Set. PCIe encoding applies (i.e. during
& Byte 4            initialization, zero means infinite)
Byte 4              Header Credit Value – If Credit Type is Clear. PCIe encoding applies (i.e. during
                    initialization, zero means infinite)
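
The identifying fields in Table 2-2 can be decoded as shown in the sketch below. It covers only Bytes 0 through 2 (DLLP Type, VL Number, VH Number, VH Omitted, and TT); the credit type and credit value fields are deliberately left out. The function name `decode_fc_prefix` and the returned dictionary keys are hypothetical.

```python
DLLP_TYPES = {0b1011: "MRUpdateFC", 0b0111: "MRInitFC1", 0b1111: "MRInitFC2"}
TLP_TYPES = {0b00: "P", 0b01: "NP", 0b10: "Cpl"}  # 11b is Reserved


def decode_fc_prefix(raw: bytes) -> dict:
    """Decode Bytes 0-2 of an MR Flow Control DLLP (credit fields not decoded)."""
    if len(raw) < 3:
        raise ValueError("need at least three bytes")
    name = DLLP_TYPES.get(raw[0] >> 4)
    if name is None:
        raise ValueError("not an MR Flow Control DLLP type")
    vh_omitted = bool(raw[2] & 0x80)
    if name != "MRUpdateFC":
        # VH Omitted distinguishes the _VL and _VH forms of the InitFC DLLPs.
        name += "_VL" if vh_omitted else "_VH"
    tt = (raw[2] >> 5) & 0x3
    if tt not in TLP_TYPES:
        raise ValueError("reserved TT encoding")
    return {
        "dllp": name,
        "vl": raw[0] & 0x7,
        "vh": None if vh_omitted else raw[1],  # VH Number is reserved when VHO is Set
        "credit_class": TLP_TYPES[tt],
    }
```

Returning `None` for the VH when VH Omitted is Set avoids treating a reserved field as a real VH number.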

2.1.2.2. MR Flow Control Initialization State Machine Rules

‰ If at any time during initialization for VLs 1-7 a VL is disabled, any flow control initialization
process involving that VL is terminated
‰ Rules for state MRFC_INIT1_VL:

• Entered when initialization of a VL (VLx) is required


♦ Entrance to DL_InitMR state (VLx = VL0)
♦ When a VL (VLx = VL1-7) is enabled by software (see Sections 4.2.1.3 and 4.3.3.2)
• While in MRFC_INIT1_VL:
♦ Transaction Layer must block transmission of TLPs using VLx
♦ Transmit the following six MRInitFC1_VL DLLPs for VLx in the following relative
order:
■ MRInitFC1_VL – P – Header (first)
■ MRInitFC1_VL – P – Data (second)
■ MRInitFC1_VL – NP – Header (third)
■ MRInitFC1_VL – NP – Data (fourth)
■ MRInitFC1_VL – Cpl – Header (fifth)
■ MRInitFC1_VL – Cpl – Data (sixth)
♦ The six MRInitFC1_VL DLLPs must be transmitted at least once every 34 μs.
■ Time spent in the Recovery LTSSM state does not contribute to this limit.
■ It is strongly encouraged that the MRInitFC1 DLLP transmissions are repeated
frequently, particularly when there are no other TLPs or DLLPs available for
transmission.
♦ Except as needed to ensure at least the required frequency of MRInitFC1 DLLP
transmission, the Data Link Layer must not block other transmissions
■ Note that this includes all Physical Layer initiated transmissions (for example,
Ordered Sets), Ack and Nak DLLPs (when applicable), and TLPs using VLs and
VHs that have previously completed initialization (when applicable)
♦ Process received MRInitFC1_VL and MRInitFC2_VL DLLPs for VLx:
■ Record the indicated FC unit values
■ Set Flag FI1 once FC unit values have been recorded for each of P_Hdr,
P_Data, NP_Hdr, NP_Data, Cpl_Hdr, and Cpl_Data of VLx
• Exit to MRFC_INIT2_VL if:
♦ Flag FI1 has been set indicating that FC unit values have been recorded for each of
P_Hdr, P_Data, NP_Hdr, NP_Data, Cpl_Hdr, and Cpl_Data of VLx
‰ Rules for state MRFC_INIT2_VL:

• While in MRFC_INIT2_VL:
♦ Transaction Layer must block transmission of TLPs using VLx
♦ Transmit the following six MRInitFC2 DLLPs for VLx in the following relative
order:
■ MRInitFC2_VL – P – Header (first)
■ MRInitFC2_VL – P – Data (second)
■ MRInitFC2_VL – NP – Header (third)
■ MRInitFC2_VL – NP – Data (fourth)
■ MRInitFC2_VL – Cpl – Header (fifth)
■ MRInitFC2_VL – Cpl – Data (sixth)
♦ The six MRInitFC2_VL DLLPs must be transmitted at least once every 34 μs.
■ Time spent in the Recovery LTSSM state does not contribute to this limit.
■ It is strongly encouraged that the MRInitFC2_VL DLLP transmissions are
repeated frequently, particularly when there are no other TLPs or DLLPs
available for transmission.

♦ Except as needed to ensure at least the required frequency of MRInitFC2 DLLP
transmission, the Data Link Layer must not block other transmissions
■ Note that this includes all Physical Layer initiated transmissions (for example,
Ordered Sets), Ack and Nak DLLPs (when applicable), and TLPs using VLs and
VHs that have previously completed initialization (when applicable)
♦ Process received MRInitFC1_VL and MRInitFC2_VL DLLPs for VLx:
■ Ignore the indicated FC unit values
■ Set flag FI2 on receipt of any MRInitFC2_VL DLLP for VLx
♦ Set flag FI2 on receipt of any MRInitFC1_VH DLLP for any VH on VLx
• Exit to MRFC_INIT1_VH if:
♦ Flag FI2 has been set, indicating that an MRInitFC2_VL DLLP for VLx or an
MRInitFC1_VH DLLP for any VH on VLx has been received
‰ Rules for state MRFC_INIT1_VH:

• Entered when initialization of a VH/VL (VHx VLy) is required


♦ After completing MRFC_INIT2_VL (VHx VLy = VH0 VL0-7)
♦ When some VC from a VH is mapped to this VL and the indicated VC is enabled by
software in the VH, resulting in a VH for which flow control has not yet been negotiated
(VHx VLy = VH1-255 VL1-7, i.e. any non-zero VH on any VL)
• If the link partner indicated no support for per-VH, VL flow control during
DL_NegotiateMR (i.e., the VH FC bit was Clear in the MRInit DLLPs sent by the link
partner), then infinite VH credits must be advertised for all VH Credit Types.
• While in MRFC_INIT1_VH:
♦ Transaction Layer must block transmission of TLPs using VHx VLy
♦ Transmit the following six MRInitFC1 DLLPs for VHx VLy in the following relative
order:
■ MRInitFC1_VH – P – Header (first)
■ MRInitFC1_VH – P – Data (second)
■ MRInitFC1_VH – NP – Header (third)
■ MRInitFC1_VH – NP – Data (fourth)
■ MRInitFC1_VH – Cpl – Header (fifth)
■ MRInitFC1_VH – Cpl – Data (sixth)
♦ The six MRInitFC1_VH DLLPs must be transmitted at least once every 34 μs.
■ Time spent in the Recovery LTSSM state does not contribute to this limit.
■ It is strongly encouraged that the MRInitFC1 DLLP transmissions are repeated
frequently, particularly when there are no other TLPs or DLLPs available for
transmission.


♦ Except as needed to ensure at least the required frequency of MRInitFC1 DLLP
transmission, the Data Link Layer must not block other transmissions
■ Note that this includes all Physical Layer initiated transmissions (for example,
Ordered Sets), Ack and Nak DLLPs (when applicable), and TLPs using VLs and
VHs that have previously completed initialization (when applicable)
♦ Process received MRInitFC1_VH and MRInitFC2_VH DLLPs for VHx VLy:
■ Record the indicated FC unit values
■ Set Flag FI3 once FC unit values have been recorded for each of P_Hdr,
P_Data, NP_Hdr, NP_Data, Cpl_Hdr, and Cpl_Data of VHx VLy
• Exit to MRFC_INIT2_VH if:
♦ Flag FI3 has been set indicating that FC unit values have been recorded for each of
P_Hdr, P_Data, NP_Hdr, NP_Data, Cpl_Hdr, and Cpl_Data of VHx VLy
‰ Rules for state MRFC_INIT2_VH:

• While in MRFC_INIT2_VH:
♦ Transaction Layer must block transmission of TLPs using VHx VLy
♦ If the link partner indicated no support for per-VH, VL flow control during
DL_NegotiateMR (i.e., the VH VL bit was cleared in sent MRInit DLLPs), then
infinite VH credits must be advertised for all VH Credit Types.
♦ Transmit the following six MRInitFC2 DLLPs for VHx VLy in the following relative
order:
■ MRInitFC2_VH – P – Header (first)
■ MRInitFC2_VH – P – Data (second)
■ MRInitFC2_VH – NP – Header (third)
■ MRInitFC2_VH – NP – Data (fourth)
■ MRInitFC2_VH – Cpl – Header (fifth)
■ MRInitFC2_VH – Cpl – Data (sixth)
♦ The six MRInitFC2_VH DLLPs must be transmitted at least once every 34 μs.
■ Time spent in the Recovery LTSSM state does not contribute to this limit.
■ It is strongly encouraged that the MRInitFC2_VH DLLP transmissions are
repeated frequently, particularly when there are no other TLPs or DLLPs
available for transmission.
♦ Except as needed to ensure at least the required frequency of MRInitFC2 DLLP
transmission, the Data Link Layer must not block other transmissions
■ Note that this includes all Physical Layer initiated transmissions (for example,
Ordered Sets), Ack and Nak DLLPs (when applicable), and TLPs using VLs and
VHs that have previously completed initialization (when applicable)
♦ Process received MRInitFC1_VH and MRInitFC2_VH DLLPs for VHx VLy:


■ Ignore the indicated FC unit values


■ Set flag FI4 on receipt of any MRInitFC2_VH DLLP for VHx VLy
♦ Set flag FI4 on receipt of any TLP on VHx VLy, or any UpdateFC DLLP for VHx
VLy
♦ Signal completion and exit if:
■ Flag FI4 has been set
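The MRFC_INIT1_VH and MRFC_INIT2_VH rules above can be sketched as a small receive-side state machine. This is an illustrative Python sketch only, not part of the specification: the class name `VhFcInit`, its method names, and the way DLLPs are passed in as arguments are assumptions made for the example. Only the FI3 and FI4 flag behavior is taken from the rules above, and the periodic transmission of MRInitFC DLLPs is not modeled.

```python
CREDIT_TYPES = ("P_Hdr", "P_Data", "NP_Hdr", "NP_Data", "Cpl_Hdr", "Cpl_Data")


class VhFcInit:
    """Receive-side view of MRFC_INIT1_VH / MRFC_INIT2_VH for one (VHx, VLy)."""

    def __init__(self):
        self.state = "MRFC_INIT1_VH"
        self.credits = {}  # credit type -> recorded FC unit value

    def on_init_dllp(self, kind, credit_type, value):
        """kind is 'MRInitFC1_VH' or 'MRInitFC2_VH' (hypothetical encoding)."""
        if self.state == "MRFC_INIT1_VH":
            # Values from either DLLP kind are recorded in INIT1.
            self.credits[credit_type] = value
            # FI3: all six credit types recorded, so move to INIT2.
            if all(t in self.credits for t in CREDIT_TYPES):
                self.state = "MRFC_INIT2_VH"
        elif self.state == "MRFC_INIT2_VH":
            # Values are ignored in INIT2; any MRInitFC2_VH sets FI4.
            if kind == "MRInitFC2_VH":
                self.state = "DONE"

    def on_tlp_or_updatefc(self):
        """A TLP or UpdateFC DLLP for this (VH, VL) also sets FI4 in INIT2."""
        if self.state == "MRFC_INIT2_VH":
            self.state = "DONE"

    def may_transmit_tlps(self):
        # The Transaction Layer blocks TLPs on this (VH, VL) in both INIT states.
        return self.state == "DONE"
```

In this sketch a link partner that is still sending MRInitFC1_VH DLLPs keeps the receiver in MRFC_INIT2_VH; the first MRInitFC2_VH DLLP, TLP, or UpdateFC DLLP for the same (VH, VL) completes initialization.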

2.2. TLP Prefix Tagging


After successful MR Link Protocol negotiation, links use the MR Enhanced Link Protocol. TLPs on
such links contain a TLP Prefix as shown in Figure 2-16. This prefix is located between the
Sequence Number and the PCIe TLP Header.

(Figure not reproduced. Fields shown, top to bottom: STP, Reserved, Sequence Number, TLP Tag,
TLP Header (unchanged from PCIe), TLP Data (optional), ECRC (optional), LCRC, END.
Bracket labels in the figure: Covered by LCRC, Covered by ECRC, Base PCIe TLP.)

Figure 2-16: TLP Prefix Header Location


The TLP Prefix is part of the wire packet. It is covered by LCRC and is retransmitted with the rest
of the TLP under the same rules. Sequence Numbers and the Ack/Nak protocol remain a per-link
notion and are not affected by Multi-Root operation.
The TLP Header, Data and ECRC are not changed from PCIe TLP usage.


Figure 2-17: TLP Prefix Header Layout


The layout of the TLP Prefix Header is shown in Figure 2-17. The first byte is a fixed Prefix
Identifying Tag value. This value is not used by any TLP (now or in the future) so that the presence
of the TLP Prefix can be determined by examining the TLP in isolation without needing additional
link-state information. This allows test equipment to decode TLPs without needing
to observe the initialization sequence.
Comment [sdg1]: Coordination with the Protocol Working Group will occur to define the value, and
appropriate steps will be taken to ensure that the selected value remains “unused” forever.

Virtual Hierarchy Numbers (VHN) are link-local. On a link that supports n VHs, valid VHNs are
[0 .. n-1]. A given TLP may take on different VHN values on each Multi-Root link that it traverses.
The VL # field contains the Virtual Link Number (VL #). It contains information used to support
Congestion Management and Isolation. TLPs are assigned to VLs using a combination of PCIe TC
to VC mapping rules and new MR VC to VL mapping rules. See Chapter 8 Congestion Management
for details.
The Global Key field is used to guard against “VH Hopping”. VH Hopping is when a TLP in one
VH inadvertently ends up on a different VH (either due to MR-PCIM table configuration errors or
due to a hardware error inside a MR Switch). The Global Key is added to the TLP when the TLP
Prefix is attached at the MR Ingress point. The Global Key value selected is based on which VH is
being used. This Global Key value is preserved, unchanged, through subsequent MR Switches. The
Global Key value is validated against the expected value at various points in the MR topology. The
Global Key is always validated at MR Egress. The Global Key value is optionally validated at either
(or both) MR Switch Input or MR Switch Output. Global Key checking is similar to PCIe ECRC
checking. Failure of any Global Key validation is an unrecoverable error.
MR-PCIM software configures the tables used to generate and check the Global Key values. To
avoid Global Key mismatch errors, MR-PCIM must configure tables such that all TLPs in a given
VH have the same Global Key value. To provide maximum protection, MR-PCIM should configure
tables so that TLPs in different VHs have different Global Keys (this protection is not provided
between VHs with duplicate Global Key values).

2.2.1. MR Switch Transaction Layer Processing


MR Switches implement a set of Virtual Switches (VS). Each VS is assigned by software to a single
Virtual Hierarchy that, in turn, is associated with a single Root Port. Within a Switch, TLPs are
associated with a VS and are routed within that VS using PCIe routing and ordering rules (e.g.
address routed, ID routed, broadcast to/from root). TLPs for unrelated VHs are unordered.
Comment [m2]: A VP properties list should be used so that all attributes are maintained in one
place. This may be part of chapter 1.

TLPs are processed by a switch as follows:

1. TLPs arrive on an Input Port of the switch with a link-local Input VH Number.
a. For a link operating in MR mode, the Input VHN is contained in the TLP Prefix
Header.


b. For a link operating in Base PCIe mode the Input VHN is 0.


2. Switch mapping tables are used to map the incoming TLP to a VS and a specific PCI-to-PCI
Bridge of that VS.
f(Input Port, Input VHN) → {VS, Input Bridge}
3. Global Key input processing occurs.
a. If the input link was operating in MR mode, the Global Key from the TLP is
optionally validated against the global key associated with the VS. This is the
“entering check” as described in Section 4.3.5.2.
b. If the input link was operating in Base PCIe mode, there was no Global Key to
check against. This is the MR Ingress point for this TLP and the TLP is assigned the
Global Key associated with the VS.
4. The TLP is routed within the VS. This mapping uses conventional PCIe rules using mapping
tables controlled by software operating in the VH. If the TLP is consumed by the VS, these
rules indicate routing within the VS (i.e. which Type 1 Header, etc.). If the TLP is being
forwarded onward, these rules select an Output Bridge and an output Virtual Circuit (VC).
g(Input Bridge, Base TLP Header) → {Type 1 Header …} if TLP is local to the VS
→ {Output Bridge, VC} if TLP is being forwarded
5. For TLPs being consumed by the VS or being forwarded to an Output Port operating in
Base PCIe mode, this switch is the MR Egress point for the TLP and Global Key checking
occurs. The Global Key associated with the TLP is optionally validated against the Global
Key associated with the VS. This is the “terminating check” as described in Section 4.3.5.2.
6. For TLPs being forwarded, the switch mapping tables are used to map the outgoing TLP to
an Output Port, an Output VHN of that Port and an Output Virtual Link (VL) of that Port.
h(VS, Output Bridge, VC) → {Output Port, Output VHN, Output VL}
7. The VL and (VH, VL) Flow Control gates are checked using the Output Port’s values to
verify that there are sufficient credits to forward the TLP.
8. Port Arbitration is performed using PCIe rules within the VS. Each Downstream Bridge is
considered a distinct Port for this purpose.
9. For an output link operating in MR mode, VH Arbitration occurs within the Output Port,
Output VL. Arbitration scheme used is fixed round robin.
10. For an output link operating in MR mode, VL Arbitration occurs within the Output Port.
Arbitration scheme is controlled by the VL Arbitration information in the Port Table.
11. For an output link operating in Base PCIe mode, VC Arbitration occurs within the Output
Port. Arbitration scheme is controlled by software using VC Arbitration information
associated with the Type 1 Header within the VS. Programmable VC Arbitration is optional
in PCIe and remains optional for links operating in Base PCIe mode. VC Arbitration is not
supported for links operating in MR mode.
12. The Output Port forwards the TLP.


a. For an output link operating in Base PCIe mode, the Output VHN is zero. The
Virtual Circuit (VC) used to transmit the TLP was determined in step 4 above. The
Output VL value is not used.
b. For an output link operating in MR mode new Output VHN and Output VL values
are placed in the TLP Prefix Header overwriting the input values (if any). The
Output VL is used to transmit the TLP.
13. Global Key output processing occurs. If the output link is operating in MR mode, the
Global Key from the TLP is optionally validated against the Global Key associated with the
VS. This is the “exiting check” as described in Section 4.3.5.2. If the output link is operating
in Base PCIe mode, the “terminating check” in step 5 is used instead.
14. When receive buffer space is made available, Flow Control is returned to the VL and
(VH, VL) contained in the TLP Prefix. The TC to VC maps and VC to VL maps on the
receiver are not used (PCIe does not transmit the VC so the TC to VC map is used for this
purpose).
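The mapping functions f and h in the steps above can be pictured as table lookups. The following Python sketch is illustrative only: the table contents, the names `F_TABLE`, `H_TABLE`, `VS_GLOBAL_KEY` and `forward`, and the Global Key value are made up for the example. The PCIe routing of step 4 (function g) is not modeled; the caller supplies the Output Bridge and VC that g would select. The optional Global Key checks, Flow Control gates and arbitration steps are also omitted.

```python
from typing import NamedTuple, Optional


class Tlp(NamedTuple):
    vhn: int                   # link-local VH Number (0 on a Base PCIe link)
    global_key: Optional[int]  # None until assigned at MR Ingress


# f(Input Port, Input VHN) -> (VS, Input Bridge); contents are hypothetical
F_TABLE = {("port_n", 0): ("vs_c", "bridge_up")}
# h(VS, Output Bridge, VC) -> (Output Port, Output VHN, Output VL)
H_TABLE = {("vs_c", "bridge_down", 0): ("port_o", 3, 1)}
# Global Key programmed for each VS (hypothetical value)
VS_GLOBAL_KEY = {"vs_c": 0x5A5}


def forward(tlp, in_port, in_is_mr, out_bridge, vc):
    """Steps 1-3, 6 and 12b for a TLP forwarded to an MR-mode output link."""
    in_vhn = tlp.vhn if in_is_mr else 0           # step 1: Base PCIe input uses VHN 0
    vs, _in_bridge = F_TABLE[(in_port, in_vhn)]   # step 2: f()
    key = tlp.global_key
    if not in_is_mr:                              # step 3b: MR Ingress assigns the key
        key = VS_GLOBAL_KEY[vs]
    out_port, out_vhn, out_vl = H_TABLE[(vs, out_bridge, vc)]  # step 6: h()
    # step 12b: the Output VHN overwrites the input value in the TLP Prefix
    return out_port, out_vl, Tlp(vhn=out_vhn, global_key=key)
```

For example, `forward(Tlp(0, None), "port_n", False, "bridge_down", 0)` returns the output port and VL from the h table together with a TLP relabeled with the Output VHN and carrying the Global Key of the VS.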

2.2.2. MR Device Transaction Layer Processing


MR Devices implement a collection of Functions in each VH. VH0 is used to manage the device.
Within a Device, TLPs are associated with a VH and are routed within that VH using PCIe routing
and ordering rules. TLPs for unrelated VHs are unordered.
Comment [m3]: A VP properties list should be used so that all attributes are maintained in one
place. This may be part of chapter 1.

2.2.2.1. Receiving TLPs

TLPs received by a MR Device are processed as follows:


1. TLPs arrive from the MR Switch with a link-local Input VH Number contained in the TLP
Prefix Header.
2. The Global Key from the TLP is validated against the global key associated with the VH.
3. The addressed Function is determined using PCIe rules applied within the VH.
a. If the TLP is ID routed, the captured Bus # and the TLP’s ID are used.
b. If the TLP is Address routed, the various BARs and the TLP’s address are used.
4. The addressed Function is one of PF, VF, Function or BF. These are designated as PF h:f,
VF h:f,s, F h:f or BF bf respectively (using the nomenclature described in Section 1.2).
5. PCIe TC checking occurs. For VFs, the PF capabilities are used.
a. If the Function has a VC Capability, the TC of the TLP is checked to verify that the
TC is enabled in the VC Capability.
b. If the Function does not have a VC Capability, the TC must be 0.
c. If Function 0 in the VH has an MFVC Capability, the TC of the TLP is checked to
verify that the TC is enabled in the MFVC Capability.


6. The VH number h and PF/VF/Function number f determine the associated BF and
corresponding Function Table Entry of that BF. For unmanaged Functions in VH0, there
will not be an entry.
7. For Functions, BFs and PFs, a Vendor Specific mechanism determines the underlying
Function.
8. For VFs, VF Mapping is used to determine the underlying Function
a. If the BF located in step 6 supports VF Mapping, the MVF number is contained in
the LVF Table Entry located using the VF # together with the TotalVFs and
BaseLVF values from the Function Table Entry. This MVF number describes the
underlying Function.
b. In addition if VF Migration is supported and enabled, the VF State in the LVF Table
Entry must also be checked to ensure the mapping is in either of the
Active.Available or Active.MigrateOut states.
c. If the BF located in step 6 does not support VF Mapping, a Vendor Specific
Mechanism determines the underlying Function.
9. The TLP is handed to the underlying Function for processing.
10. When receive buffer space is made available, Flow Control is returned to the VL and
(VH, VL) contained in the TLP Prefix. The TC to VC maps and VC to VL maps on the
receiver are not used (PCIe does not transmit the VC so the TC to VC map is used for this
purpose).

2.2.2.2. Transmitting TLPs

TLPs transmitted by a MR Device are processed as follows:


1. The underlying Function, TC, TLP size and TLP type (P, NP, Cpl) are determined.
2. For Functions, BFs and PFs a Vendor Specific mechanism determines the VH and
Function #.
3. For VFs, if VF Mapping is used, a reverse VF map is used to determine the VH and
Function #. Otherwise, a Vendor Specific mechanism determines the VH and Function #.
4. Initial TC to VCID mapping and VC Arbitration occurs. The Mapped Function’s VC
Capability is used to convert TC to VCID. For VFs, the VC Capability of the PF is used.
5. If Function 0 of the VH supports the MFVC Capability, MFVC Arbitration and another
round of TC to VCID mapping occurs.
6. VCID to VL mapping occurs using the map contained in the Function Table Entry
associated with the Function or PF.
7. Now that the VL is known, VL and (VH, VL) Flow Control gate checking occurs.
8. VH Arbitration occurs within the VL. The arbitration scheme is fixed round-robin.
9. VL Arbitration occurs. The VL Arbitration tables are contained in one of the BFs.
10. The TLP is sent on the wire. CREDITS_CONSUMED is updated to reflect this.


2.2.3. Global Key Processing


Global Keys are processed at various points within an MR topology:
‰ Global Keys are assigned at the MR Ingress point of the TLP. For TLPs originated by an MR
Component, the MR Ingress point is inside that originating MR Component. For TLPs
originated by a Base PCIe Component, the MR Ingress point is the input port of an MR
Switch.
‰ Global Keys are checked at the MR Egress point of the TLP. For TLPs destined for an MR
Component, the MR Egress point is inside the destination MR Component. For TLPs
destined for a Base PCIe Component, the MR Egress point is the output port of an MR
Switch.
‰ MR Switches can optionally check Global Keys as TLPs enter the switch from an MR link.

‰ MR Switches can optionally check Global Keys as TLPs exit the switch on an MR link.

Hardware support for the check as TLPs enter and exit MR Switches is optional. Hardware support
for the MR Egress check is mandatory.
The Global Key value is a 12-bit value, assigned by software to each VH. To achieve maximum
protection, software should assign each VH a distinct Global Key value.
A Global Key check passes if the value in the TLP and the expected value match. A Global Key
check also passes if either the expected value is 000h or the TLP value is 000h (the wild card value).
In other words, when MR Ingress points assign a TLP the Global Key value of 000h, Global Key
checks will always pass for that TLP. When a Global Key is programmed with an expected value of
000h, any checks that use that value will always pass.
Global Key checking is disabled by default. This allows software time to program the Global Key
registers before enabling checking.
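The pass/fail rule for a Global Key check, including the 000h wild card, can be written as a short predicate. This Python sketch is illustrative; the function and parameter names are assumptions, and treating a disabled check as a pass (because no comparison is performed) is an interpretation of the "disabled by default" statement above rather than wording from the specification.

```python
WILDCARD = 0x000
KEY_MASK = 0xFFF  # the Global Key is a 12-bit value


def global_key_check(tlp_key, expected_key, checking_enabled=True):
    """Return True if the Global Key check passes."""
    if not checking_enabled:
        # Assumption: with checking disabled, no comparison is made.
        return True
    tlp_key &= KEY_MASK
    expected_key &= KEY_MASK
    # Either side holding the wild card value makes the check pass.
    if tlp_key == WILDCARD or expected_key == WILDCARD:
        return True
    return tlp_key == expected_key
```

With this rule, two VHs that are both programmed with 000h (or with the same non-zero value) are not protected from each other, which is why distinct non-zero keys per VH give the most protection.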


2.2.4. MR TLP Dataflow Examples

(Figure not reproduced. Elements shown: Host A, Host B, Host C; SI 1 to SI 5 and a VI; an MR Fabric;
SR Dev W, SR/MR Dev X, MR Dev Y, PCIe Dev Z. Legend: PCIe Link, MR Enabled Link.)

Figure 2-18: MR Dataflow Examples


Consider a Memory Read TLP initiated by SI 4 targeting PF 1:0 in EP Y:
1. The Host C to MRA Switch 0 link operates in PCIe mode. There is no TLP Prefix.
2. MRA Switch 0 has been programmed so that all TLPs arriving at Port n are associated with the
VS for VH C.
3. The VS inside MRA Switch 0 address routes the TLP to a Virtual Downstream Port associated
with physical Port o (the link headed towards MRA Switch 2). MR-PCIM has assigned VH 0 to
this Virtual Downstream Port. The TLP exits MRA Switch 0 with a TLP Prefix having VH
Number 0.


4. The TLP arrives at MRA Switch 2 Port p labeled as belonging to VH 0. MRA Switch 2 has
been programmed so that all TLPs arriving at Port p labeled VH 0 are associated with the VS
for VH C.
5. The VS inside MRA Switch 2 address routes the TLP to a Virtual Downstream Port associated
with physical Port q (the link headed towards Device Y). MR-PCIM has assigned VH 1 to this
Virtual Downstream Port. The TLP exits MRA Switch 2 with a TLP Prefix having VH
Number 1.
6. The TLP arrives at Device Y labeled as belonging to VH 1. The Device hands the transaction to
PF 1:0 for execution.
7. PF 1:0 completes the transaction and emits a completion TLP. Device Y sends this TLP out
Port r labeled with VH 1.
8. The TLP arrives at MRA Switch 2 Port q labeled as belonging to VH 1. MRA Switch 2 has
been programmed so that all TLPs arriving at Port q labeled VH 1 are associated with the VS
for VH C.
9. The VS inside MRA Switch 2 ID routes the TLP to a Virtual Upstream Port associated with
physical Port p (the link headed towards MRA Switch 0). MR-PCIM has assigned VH 0 to this
Virtual Upstream Port. The TLP exits MRA Switch 2 with a TLP Prefix having VH Number 0.
10. The TLP arrives at MRA Switch 0 Port o labeled as belonging to VH 0. MRA Switch 0 has
been programmed so that all TLPs arriving at Port o labeled VH 0 are associated with the VS
for VH C.
11. The VS inside MRA Switch 0 ID routes the TLP to a Virtual Upstream Port associated with
physical Port n (the link headed towards Host C). MR-PCIM has designated this link a PCIe
link and has assigned this Virtual Upstream Port to it. The TLP exits MRA Switch 0 with no TLP
Prefix.
12. SI 4 in Host C sees a completion for the Memory Read Transaction.
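The request half of the walk-through (steps 1 to 6) can be traced with two small per-switch tables, which makes the link-local nature of the VH Number visible. The port letters and VH Numbers below come from the example; the table layout and the name `trace_request` are assumptions for this sketch, and the address-routing decision inside each VS is collapsed into a fixed table entry.

```python
# (switch, input port, input VHN) -> VS; a VHN of None means a PCIe-mode link
INGRESS = {
    ("MRA Switch 0", "n", None): "VS for VH C",
    ("MRA Switch 2", "p", 0): "VS for VH C",
}
# (switch, VS) -> (output port, output VHN) for the downstream route taken
EGRESS = {
    ("MRA Switch 0", "VS for VH C"): ("o", 0),
    ("MRA Switch 2", "VS for VH C"): ("q", 1),
}


def trace_request():
    """Follow the Memory Read from Host C toward Device Y, hop by hop."""
    hops = []
    vhn = None  # step 1: no TLP Prefix on the PCIe link from Host C
    for switch, in_port in (("MRA Switch 0", "n"), ("MRA Switch 2", "p")):
        vs = INGRESS[(switch, in_port, vhn)]   # steps 2 and 4
        out_port, vhn = EGRESS[(switch, vs)]   # steps 3 and 5
        hops.append((switch, vs, out_port, vhn))
    return hops
```

The same TLP stays in the VS for VH C at every switch, while its VH Number is 0 on the link between the switches and 1 on the link to Device Y.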

2.3. Per-VH RESET


In Multi-Root, Reset DLLPs are used to implement a per-VH equivalent of PCIe Hot Reset. DLLPs
are used to ensure “quick” Reset propagation by avoiding delays introduced by TLP ordering and
flow control rules.
Reset of a VH requires discarding TLPs associated with that VH. TLPs associated with VHs not in
reset must not be affected. This discard process may take more time than the PCIe equivalent. For
example, only some of the TLPs in the retry buffer might be affected.
Reset DLLPs support an acknowledgment protocol to mimic the TS1/TS2 ACK behavior of PCIe.
This allows an upstream component to know when a downstream component has entered reset.


2.3.1. Per-VH Reset Example


Figure 2-19 shows an example MR Topology. Figure 2-20 shows Host A’s view of the same
topology. Table 2-3 describes the events and associated actions involved in propagating a Reset from
Host A.

(Figure not reproduced. Elements shown: Host A, Host B, Host C; SI 1 to SI 5 and VIs;
port labels X0, Y0, X1, Y1, X2, Z1, Y2; SR Dev W, SR/MR Dev X, MR Dev Y, PCIe Dev Z.
Legend: PCIe Link, MR Enabled Link.)

Figure 2-19: Reset DLLP Example: Topology


(Figure not reproduced. Elements shown: Host A; SI 1, SI 2 and a VI;
port labels X0, Y0, X1, Y1, X2, Z1, Y2; SR Dev W, MR Dev Y.
Legend: PCIe Link, MR Enabled Link.)

Figure 2-20: Reset DLLP Example: Host A’s View


Table 2-3: Reset DLLP Example: Events and Actions

Component | Event | Action
Host A | | Sends Hot Reset on RPA to MRA Switch 1
MRA Switch 1 | Sees Hot Reset on Port X1, which is the Upstream Port of VH A | 1.1 Discards all TLPs headed out Port X1
 | | 1.2 Acks Hot Reset on Port X1 (PCIe TS1/2)
 | | 1.3 Sends Hot Reset out Port Z1 to EP W
 | | 1.4 Sends Reset DLLP Request for VH A out Port Y1 to MRA Switch 0
 | | Unlabeled Ports not affected
 | All TLPs for VH A are discarded or marked for discard and Port X1 Retry Buffer empty | 1.5 Nothing – VH A Upstream Port is not an MR Link
 | Sees Reset Ack DLLP on VH A from Port Y1 | 1.6 Knows MRA Switch 0 will not send any more TLPs for VH A through Port Y1
 | | 1.7 Stops resending Reset DLLP for VH A out Port Y1
MRA Switch 0 | Sees Reset DLLP Request for VH A on Port X0, which is the Upstream Port of VH A | 0.1 Starts discarding TLPs for VH A
 | | 0.2 Sends Reset DLLP Request for VH A out Port Y0 to MRA Switch 2
 | | Unlabeled Ports not affected
 | All TLPs for VH A are discarded or marked for discard and Port X0 Retry Buffer contains no TLPs for VH A | 0.3 Sends Reset Ack DLLP out Port X0 to MRA Switch 1
 | Sees Reset Ack DLLP on VH A from Port Y0 | 0.4 Knows MRA Switch 2 will not send any more TLPs for VH A through Port Y0
 | | 0.5 Stops resending Reset DLLP for VH A out Port Y0
MRA Switch 2 | Sees Reset DLLP Request for VH A on Port X2, which is the Upstream Port of VH A | 2.1 Starts discarding TLPs for VH A
 | | 2.2 Sends Reset DLLP Request for VH A out Port Y2 to EP Y
 | | Unlabeled Ports not affected
 | All TLPs for VH A are discarded or marked for discard and Port X2 Retry Buffer contains no TLPs for VH A | 2.3 Sends Reset Ack DLLP out Port X2 to MRA Switch 0
 | Sees Reset Ack DLLP on VH A from Port Y2 | 2.4 Knows EP Y will not send any more TLPs for VH A through Port Y2
 | | 2.5 Stops resending Reset DLLP for VH A out Port Y2
Device W | Sees PCIe Hot Reset | W.1 Discards all TLPs, enters Reset
 | | W.2 Acks Hot Reset (PCIe TS1/2)
Device Y | Sees Reset DLLP Request on VH A | Y.1 Starts discarding TLPs for VH A
 | All TLPs for VH A are discarded or marked for discard and no TLPs for VH A are in the Retry Buffer | Y.2 Sends Reset Ack DLLP
Device X and Device Z | | Nothing, not part of VH A

2.3.2. RESET DLLP Format


Figure 2-21 shows the bit encoding of the Reset DLLP.
The A bit is 0 for a Reset Request (propagating downstream) and is 1 for a Reset Ack (propagating
upstream).
The VH Group contains the upper bits of the VHN.
The Assert field contains one bit for each of 16 VHs within a VH Group. An Assert bit is 1 to
indicate that the associated VH is in Reset, and a 0 to indicate that the associated VH is not in Reset.

Figure 2-21: Reset DLLP
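The relationship between a VH Number, its VH Group and its Assert bit can be sketched as follows. This is a hedged Python illustration: it assumes that VHN = VH Group × 16 + bit index and that Assert bit i corresponds to VH i within the group. The text above states only that the VH Group holds the upper bits of the VHN and that the Assert field has one bit per VH, so the bit ordering and the helper names are assumptions, and the byte layout of Figure 2-21 is not modeled.

```python
def reset_dllp_fields(vhns_in_reset, vh_group):
    """Return (vh_group, assert_mask) describing one VH Group.

    Assumes VHN = vh_group * 16 + bit index (bit ordering is an assumption).
    """
    mask = 0
    for vhn in vhns_in_reset:
        if vhn >> 4 == vh_group:        # upper bits select the VH Group
            mask |= 1 << (vhn & 0xF)    # lower 4 bits select the Assert bit
    return vh_group, mask


def vhs_asserted(vh_group, assert_mask):
    """Inverse mapping: list the VHNs whose Assert bit is 1."""
    return [vh_group * 16 + i for i in range(16) if assert_mask & (1 << i)]
```

Because each DLLP carries the state of 16 VHs, a component with VHs in Reset in several groups sends one Reset DLLP per affected VH Group.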


Reset DLLPs can be sent at any time a link is in DL_Active. Reset Request DLLPs request a
downstream link partner enter or exit Reset state on one or more VHs. Reset Ack DLLPs indicate
that the downstream link partner has seen and processed the reset request.
Sending a Reset DLLP with the assert bit set is making a promise about flushing of stale TLPs. In
particular, a component sending a Reset Request or Ack DLLP with Assert == 1 is claiming that all
TLPs from before the VH entered Reset have been flushed (either discarded or marked for later
discard). This claim includes TLPs that are sitting in the Retry buffer. Consequently, typical
implementations will delay sending the Reset DLLP until the affected TLPs are retired from the
Retry Buffer using normal Ack protocol (TLPs transmitted during this time will be discarded at the
remote end of the link).

2.3.3. RESET DLLP Processing


The following sections describe Reset DLLP processing steps at the upstream and downstream ends
of a link. Recall that upstream and downstream are relative to a VH. In particular, one end of a link
may be using the upstream state machine for some VHs and the downstream state machine for
other VHs.
The upstream and downstream state machines run in parallel on every VH.

2.3.3.1. Upstream State Machine

The following steps are used to enter Reset. This is triggered by a request to send a Reset DLLP on
a link. This request can occur for a variety of reasons (e.g. DL_DOWN, Reset DLLP Request, or
Hot Reset on an upstream Switch link, setting Secondary Bus Reset bit in some Type 1
configuration header, etc.)
1. Reset Requested.
2. Start discarding new TLPs received for this VH from this link.
3. Start discarding new TLPs to be transmitted for this VH on this link.
4. Discard or mark for discard any TLPs waiting to be sent for this VH on this link.
5. Wait until the Retry Buffer contains no TLPs for this VH (i.e. they have been acknowledged
and thus removed from the Retry Buffer).1
6. Schedule a Reset Request DLLP to be sent with the Assert bit = 1
7. Schedule resending the Reset Request DLLP approximately every 30 μsec until a Reset Ack
DLLP with the Assert bit = 1 is received (see Section 2.3.3.3 for details).
8. If a Reset Ack DLLP is received with Assert bit = 1, the remote end has entered Reset, stop
scheduling Reset Request DLLPs for this VH (Reset Request DLLPs may continue to be
sent if needed by other VHs).
The following steps are used to exit Reset. This is triggered by the condition causing the entry into
Reset going away (e.g. clearing Secondary Bus Reset, Physical LinkUp transitions from 0 to 1 etc.).
1. Schedule a Reset Request DLLP to be sent with the Assert bit = 0
2. Schedule resending the Reset Request DLLP approximately every 30 μsec until Reset Ack
DLLP with the Assert bit = 0 is received (see Section 2.3.3.3 for details).
3. If a Reset Ack DLLP has been received with Assert bit = 0, the remote end has exited Reset,
stop scheduling Reset Request DLLPs for this VH (Reset Request DLLPs may continue to
be sent if needed by other VHs).
Reset Request DLLPs may be coalesced so that multiple schedule events result in a single DLLP
being transmitted.
Timely propagation of Reset is important. Components should send Reset Request DLLPs as soon
as possible after starting to enter or exit Reset.

1 This could be implemented by waiting until all TLPs, for any VH, that were in the Retry Buffer
prior to Step 3 have been flushed from the Retry Buffer (Step 3 ensures that no new TLPs for this VH
will enter the Retry Buffer).


(Figure not reproduced. States shown: US VH Up, US Assert Wait 1, US Assert Wait 2, US Assert Wait 3,
US VH Down, US Deassert Wait 1, US Deassert Wait 2.
Labels shown: Init; Reset Wanted (on this VH); Tx Queue Empty, Retry Empty (for this VH);
Sent Reset Req DLLP (with bit == 1); Received Reset Ack DLLP (with bit == 1);
Reset No Longer Wanted (on this VH); Sent Reset Req DLLP (with bit == 0);
Received Reset Ack DLLP (with bit == 0); Sent Reset Ack DLLP (with bit == 0).
Annotations: Send 1 when sending a Reset Req DLLP; Send 0 when sending a Reset Req DLLP.)

Figure 2-22: Upstream Link Partner RESET SM
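The per-VH upstream state machine can be sketched as a transition table. The state names are those of Figure 2-22, but the wiring between states below is an assumption reconstructed from the numbered enter and exit steps in this section; it is not a transcription of the figure. Event names such as `reset_wanted` are hypothetical, and retransmission of Reset Request DLLPs is not modeled.

```python
# (current state, event) -> next state; wiring assumed from the enter/exit steps
US_TRANSITIONS = {
    ("US VH Up", "reset_wanted"): "US Assert Wait 1",
    ("US Assert Wait 1", "tx_and_retry_empty"): "US Assert Wait 2",
    ("US Assert Wait 2", "sent_req_1"): "US Assert Wait 3",
    ("US Assert Wait 3", "recv_ack_1"): "US VH Down",
    ("US VH Down", "reset_no_longer_wanted"): "US Deassert Wait 1",
    ("US Deassert Wait 1", "sent_req_0"): "US Deassert Wait 2",
    ("US Deassert Wait 2", "recv_ack_0"): "US VH Up",
}

# States in which the VH is waiting for a Reset Ack
WAITING_FOR_ACK = {"US Assert Wait 3", "US Deassert Wait 2"}


def step(state, event):
    """Return the next state; events that do not apply leave the state unchanged."""
    return US_TRANSITIONS.get((state, event), state)


def run(events, state="US VH Up"):
    """Apply a sequence of events to one VH, starting from US VH Up."""
    for event in events:
        state = step(state, event)
    return state
```

One such state variable exists per VH, so a link can have some VHs in US VH Down while others remain in US VH Up.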

2.3.3.2. Downstream State Machine

The following steps are used to enter Reset. This is triggered by receiving a Reset DLLP on a link with
the Assert bit = 1.
1. Reset Request DLLP is received with the Assert bit = 1
2. Start discarding new TLPs received for this VH from this link.
3. Start discarding new TLPs to be transmitted for this VH on this link.


4. Discard or mark for discard any TLPs waiting to be sent for this VH on this link.
5. All Functions in the VH enter Reset.
6. Wait until the Retry Buffer contains no TLPs for this VH (i.e. they have been acknowledged
and thus removed from the Retry Buffer).
7. Schedule a Reset Ack DLLP to be sent with the Assert bit = 1
8. Schedule another Reset Ack DLLP to be sent whenever a Reset Request DLLP is received
and the Assert bits in the Reset Request match what would be transmitted in the Reset Ack
(i.e. retransmit the Ack in case it got lost).
The following steps are used to exit Reset. This is triggered by receiving a Reset DLLP on a link
with the Assert bit = 0.
1. Reset Request DLLP is received with the Assert bit = 0
2. If all Functions are ready to exit Reset, schedule a Reset Ack DLLP to be sent with the
Assert bit = 0
3. Schedule another Reset Ack DLLP to be sent whenever a Reset Request DLLP is received.
Reset Ack DLLPs may be coalesced so that multiple schedule events result in one DLLP being
transmitted.
Timely acknowledgement of Reset is important. Components shall respond with Reset Ack within
1.9 ms (+0%, -100%) and are strongly encouraged to respond much more quickly. The 1.9 ms value is
chosen to avoid inadvertent link retraining caused by the Reset DLLP Forward Progress Timer (see
Section 2.3.3.3).


Figure 2-23: Downstream Link Partner RESET SM

2.3.3.3. Reset DLLP Reliability

A VH is considered to be waiting for Reset Ack when it is in state US Assert Wait 3 or US Deassert
Wait 2.
The Upstream component retransmits Reset Requests approximately every 30 µsec as controlled by
the Reset Request Retransmit timer. This timer is enabled whenever any VH is waiting for Reset
Ack. It restarts when a Reset Request DLLP is transmitted for any VH group (either initial DLLP or
a resend). It expires 30 µsec after being started (+50%, -0%). When the timer expires, Reset Request
DLLPs are scheduled to be sent for all VH groups that have some VH waiting for a Reset Ack.
The Upstream component also includes a Reset DLLP Forward Progress timer. This timer is
enabled when any VH is waiting for Reset Ack. It restarts when some VH enters a waiting for
Reset Ack state or when a Reset Ack is received for any VH. It expires 2 ms after being started
(+50%, -0%). When the timer expires, a link retrain is requested.
The Upstream component also includes a 2 bit Reset Retrain counter. This counter is incremented
when a link retrain is requested due to the expiration of the Reset DLLP Forward Progress timer.
This counter resets to 00b whenever the Reset DLLP Forward Progress timer is restarted. If this
counter rolls over from 11b to 00b, the link shall enter Detect.


Reset state machines are not affected by link retraining (either initiated by the Reset DLLP Forward
Progress Timer or through other means). DLLPs that were scheduled during the link retrain shall be
sent when the link retrain completes.
Requests to retrain the link initiated by this mechanism may be coalesced with requests to retrain the
link initiated by other mechanisms. For example, two “simultaneous” requests for a link retrain due
to (1) a REPLAY_NUM rollover and to (2) the Reset Forward Progress timer may result in either
one or two retrain sequences.
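The interaction between the Reset DLLP Forward Progress timer and the 2 bit Reset Retrain counter can be sketched as an event-driven model. This Python sketch is illustrative: the class and method names are assumptions, real timers are replaced by explicit "expired" and "progress" events, and how the timer is re-armed after an expiry is not modeled because the text above does not describe it. The `timer_window` helper only restates the (+50%, -0%) tolerance as a range.

```python
def timer_window(nominal, plus_pct=50, minus_pct=0):
    """Return the (earliest, latest) expiry allowed by a (+plus%, -minus%) tolerance."""
    return (nominal * (100 - minus_pct) / 100, nominal * (100 + plus_pct) / 100)


class ResetRetrainCounter:
    """2 bit Reset Retrain counter driven by the Forward Progress timer."""

    def __init__(self):
        self.count = 0               # 00b
        self.retrains_requested = 0
        self.enter_detect = False

    def on_progress(self):
        """Forward Progress timer restarted: a VH starts waiting, or a Reset Ack arrives."""
        self.count = 0

    def on_forward_progress_expired(self):
        """Timer expired: request a link retrain and increment the counter."""
        self.retrains_requested += 1
        self.count = (self.count + 1) & 0b11
        if self.count == 0:          # rolled over from 11b to 00b
            self.enter_detect = True
```

Under these assumptions the Reset Request Retransmit timer expires between 30 and 45 µsec after being started, the Forward Progress timer between 2 and 3 ms, and four consecutive Forward Progress expirations without any progress send the link to Detect.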

2.3.3.4. Flow Control and Reset / DL_DOWN

In Multi-Root systems, Flow Control DLLPs can affect more than one VH. It is critical that a VH
entering or exiting the Reset state does not disrupt other VHs. Flow Control credits on MR Enabled
Links must be returned to the originator even when the VH that originated the discarded TLPs is in
or is entering Reset.
For example, suppose TLPs enter a Switch on the MR Link associated with Port 1 destined for
Port 2. If Port 2 enters Reset or DL_DOWN and thus discards these TLPs, credits must be
returned in the appropriate VH of Port 1. This must occur even if the associated VH at Port 1 is
also in or is entering reset (there may be other VHs not in Reset sharing Port 1). This must occur
whether or not Port 2 is an MR Link.
Reset propagation must not be affected by Flow Control. Reset DLLPs must be sent independently of
any Flow Control state.
Reset propagation is affected by the TLP Ack / Nak protocol. Allowing this avoids the complexity
of editing the Retry Buffer to remove TLPs for VHs that are now in Reset.

2.4. MR Flow Control

2.4.1. FC Information Tracked by Transmitter


A transmitter shall track the following two quantities for each supported (VH, VL) and VL. As in
PCIe, Header and Data credits are tracked independently.
‰ CREDITS_CONSUMED

• Computed and used in the same manner as in PCIe protocol.


‰ CREDIT_LIMIT

• Used in the same manner as in PCIe protocol.


• Undefined at interface initialization.
• Set to the value indicated during MR Flow Control initialization
• Updated based on MRUpdateFC DLLPs as described below.
In MR, transmitting a TLP requires passing 4 gates instead of the 2 gates used in PCIe.


‰ One PH, NPH or CPLH header VL credit is required.

‰ One PH, NPH or CPLH header (VH, VL) credit is required.

‰ PD, NPD or CPLD data VL credits may be required. The number of credits needed depends
on the TLP size and uses the same rules as PCIe.
‰ PD, NPD or CPLD data (VH, VL) credits may be required. The number of credits needed
depends on the TLP size and uses the same rules as PCIe.
The transmitter gating function for a TLP is said to pass if the transmitter gating function passes for
all four gates. If any gate fails, the transmitter must block transmission of the TLP. If
CREDIT_LIMIT was specified as “infinite” during Flow Control initialization, then the
corresponding gating function is unconditionally satisfied for that type of credit.
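The four-gate check can be sketched as below. This is a minimal illustration rather than a normative algorithm. It assumes the modular gating comparison and the 8-bit header / 12-bit data credit field sizes of the PCI Express Base Specification. The names `CreditCounter` and `may_transmit` are hypothetical.

```python
from dataclasses import dataclass

HDR_FIELD_SIZE = 8    # assumed header credit field width (PCIe)
DATA_FIELD_SIZE = 12  # assumed data credit field width (PCIe)


@dataclass
class CreditCounter:
    """One credit type tracked for either a VL or a (VH, VL)."""
    field_size: int
    credit_limit: int = 0
    credits_consumed: int = 0
    infinite: bool = False

    def gate_passes(self, needed: int) -> bool:
        # An infinite CREDIT_LIMIT unconditionally satisfies the gate.
        if self.infinite:
            return True
        modulus = 1 << self.field_size
        cumulative = (self.credits_consumed + needed) % modulus
        return (self.credit_limit - cumulative) % modulus <= modulus // 2

    def consume(self, count: int) -> None:
        modulus = 1 << self.field_size
        self.credits_consumed = (self.credits_consumed + count) % modulus


def may_transmit(vl_hdr, vhvl_hdr, vl_data, vhvl_data, data_credits):
    """A TLP may be sent only if all four gates pass: header and data
    credits, each checked for the VL and for the (VH, VL)."""
    return (vl_hdr.gate_passes(1)
            and vhvl_hdr.gate_passes(1)
            and vl_data.gate_passes(data_credits)
            and vhvl_data.gate_passes(data_credits))
```

A TLP with no payload passes the two data gates trivially when `data_credits` is 0, which matches the "may be required" wording of the two data credit rules.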
The Transmitter must follow the same ordering and deadlock avoidance rules as specified in the
PCIe protocol. TLPs mapped to different VLs have no ordering relationship, and must not block
each other.
The transmitter gating function rules for (VH, VL)s and VLs are the same as that specified in the
PCIe protocol.
TLPs associated with different VHs represent different flows and have no ordering relationship.
TLPs blocked due to failure of the (VH, VL) gating function should not block TLPs associated with
other VHs mapped to the same VL.
If any CREDIT_LIMIT_VHVL for a Credit Type is infinite, then all CREDIT_LIMIT_VHVL
Credit Type values for that VL must also be infinite.
If the CREDIT_LIMIT_VL and CREDIT_LIMIT_VHVL are both infinite, no UpdateFC or
MRUpdateFC DLLPs are sent. A transmitter may optionally raise a Receiver Error if one is received
(as in PCIe).
Comment [po4]: PCIe still allows UpdateFCs to be sent. Need to resolve how to handle this.
2.6.1. Flow Control Rules: If an Infinite Credit advertisement (value of 00h or 000h) has been
made during initialization, no Flow Control updates are required following initialization.
• If UpdateFC DLLPs are sent, the credit value fields must be set to zero and must be ignored
by the Receiver. The Receiver may optionally check for non-zero update values (in violation
of this rule). If a component implementing this check determines a violation of this rule, the
violation is a Flow Control Protocol Error (FCPE).
Otherwise, if CREDIT_LIMIT_VHVL is non-infinite, then MRUpdateFC DLLPs are sent.
‰ MRUpdateFC DLLPs update the values of VL and (VH, VL) CREDIT_LIMIT as follows:
♦ Credits_Received = (Update Value – CREDIT_LIMIT_VHVL) mod 2^Field_Size
♦ CREDIT_LIMIT_VL = (CREDIT_LIMIT_VL + Credits_Received) mod 2^Field_Size
♦ CREDIT_LIMIT_VHVL = Update Value
♦ The CREDIT_LIMIT_VL value is updated using the CREDIT_LIMIT_VHVL
value before it is updated by this DLLP.
‰ The transmitter computes but otherwise ignores the CREDIT_LIMIT values associated with
infinite CREDIT_LIMIT_VL credits.
Otherwise, if CREDIT_LIMIT_VHVL is infinite, then PCIe Base UpdateFC DLLPs are sent. A
transmitter may optionally raise a Receiver Error if an MRUpdateFC DLLP is received.
‰ UpdateFC DLLPs update the value of VL CREDIT_LIMIT as follows:

♦ CREDIT_LIMIT_VL = Update ValueFC
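As a worked illustration of the MRUpdateFC arithmetic, the following sketch applies one update to a VL limit and a (VH, VL) limit. The function name and argument names are hypothetical. The Field Size is passed in by the caller, and the sketch does not model infinite credits.

```python
def apply_mr_update_fc(credit_limit_vl, credit_limit_vhvl, update_value, field_size):
    """Return the new (CREDIT_LIMIT_VL, CREDIT_LIMIT_VHVL) after one MRUpdateFC.

    The VL limit advances by the number of credits newly granted to the
    (VH, VL), computed from the old (VH, VL) limit before it is overwritten.
    """
    modulus = 1 << field_size
    credits_received = (update_value - credit_limit_vhvl) % modulus
    new_limit_vl = (credit_limit_vl + credits_received) % modulus
    new_limit_vhvl = update_value
    return new_limit_vl, new_limit_vhvl
```

For example, with an 8-bit field, a VL limit of 10, a (VH, VL) limit of 250 and an Update Value of 4, the counter has wrapped: Credits_Received is (4 - 250) mod 256 = 10, so the VL limit becomes 20 and the (VH, VL) limit becomes 4.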


2.4.2. Information Tracked by Receiver


A receiver shall track the following two quantities for each supported (VH, VL) and VL.
‰ CREDITS_ALLOCATED

• The initial value is Vendor Specific. This value was communicated to the transmitter
using the MR Flow Control Initialization Protocol
• This value is incremented as processing (or discarding) of received TLPs makes
additional receiver buffer space available.
• Changes to this value are communicated to the transmitter using flow control update
DLLPs.
‰ CREDITS_RECEIVED

• Computed in the same manner as in PCIe protocol.


It is a goal to reduce the amount of FCP traffic associated with inactive flows. After an
MRUpdateFC has been sent 4 times with the same value, subsequent MR UpdateFC FCPs with the
same value must be scheduled for transmission once every 100 ms (-0%/+50%).
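One possible way to model the reduced update rate for inactive flows is sketched below. The class and its method names are hypothetical, time is represented as a caller-supplied microsecond count, and the sketch only covers the repeat-count rule above. Normal update scheduling before the fourth repeat is left to the caller.

```python
class UpdateThrottle:
    """Hypothetical per-flow tracker for repeated MRUpdateFC values."""

    REPEAT_LIMIT = 4               # sends with the same value before slowing down
    SLOW_INTERVAL_US = 100_000     # 100 ms (-0%/+50%)

    def __init__(self):
        self.last_value = None
        self.repeat_count = 0
        self.last_sent_us = None

    def record_sent(self, value, now_us):
        """Note that an MRUpdateFC carrying `value` was transmitted."""
        if value == self.last_value:
            self.repeat_count += 1
        else:
            self.last_value = value
            self.repeat_count = 1
        self.last_sent_us = now_us

    def may_schedule(self, value, now_us):
        """Return True if an MRUpdateFC with `value` may be scheduled now."""
        if value != self.last_value or self.repeat_count < self.REPEAT_LIMIT:
            return True
        return now_us - self.last_sent_us >= self.SLOW_INTERVAL_US
```

A changed value always passes, so an active flow is never delayed by this check.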
If a Receiver advertises non-infinite VL credits, then it must send MRUpdateFC DLLPs whenever
an FCP for that VL must be scheduled for transmission.
If a Receiver advertises infinite VH credits, then it must send UpdateFC DLLPs whenever an FCP
for that VL must be scheduled for transmission.

2.5. MR Message Processing

2.5.1. Interrupts
Interrupt processing occurs within a VH. The associated TLPs contain a TLP Prefix allowing all
components to route them appropriately.
MSI and MSI-X Interrupts are indistinguishable from other Memory Write TLPs.
INTx Interrupts are represented using ASSERT_INTx / DEASSERT_INTx messages per the PCI
Express Base Specification. In MR, these messages are queued and ordered within a VH.

2.5.1.1. INTx Device Processing

In PCIe, INTx messages are emitted by Devices. An ASSERT_INTx message is emitted when an
interrupt condition is signaled. A DEASSERT_INTx message is emitted when the interrupt
condition has been satisfied.


In MR, INTx messages are emitted within a VH. If a function within a VH signals or satisfies an
interrupt, ASSERT_INTx / DEASSERT_INTx messages are emitted within that VH. These
messages are unrelated to INTx messages issued in any other VH.

2.5.1.2. INTx Switch Processing

In PCIe, INTx messages are processed by Switches. Each downstream Switch Port tracks an internal
INTx wire for each of INTA/B/C/D. These INTx wires are combined into four INTx wires at the
upstream Switch Port. Transitions of the combined INTx wires trigger sending of INTx messages
out the upstream Switch Port.
Similarly, in MR, INTx messages are processed by Virtual Switches. Each virtual downstream Switch
Port tracks an internal INTx wire for each of INTA/B/C/D. The INTx wires are combined into
four INTx wires at the virtual upstream Switch Port. Transitions of the combined INTx wires
trigger sending of INTx messages out the virtual upstream Switch Port.
Switch reconfiguration will also affect INTx. For example, when a Virtual Device is unmapped from
a Virtual Switch, the virtual downstream Switch Port sees DL_Down. This virtual downstream
Switch Port follows PCIe rules by deasserting the internal INTx wires. If this results in a transition
of the combined INTx wires, a DEASSERT_INTx message is sent out the virtual upstream Switch
Port.
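The wire combining described above can be illustrated with a small model of one Virtual Switch. This is a simplified sketch: the class name and message strings are hypothetical, and the INTx mapping by Device Number that PCIe applies at bridges is deliberately omitted, so each INTx wire is simply ORed across the virtual downstream Switch Ports.

```python
class VirtualSwitchIntx:
    """Hypothetical model of INTx aggregation within one Virtual Switch."""

    def __init__(self, num_downstream_ports):
        # One internal wire per downstream port for each of INTA/B/C/D.
        self.wires = [[False] * 4 for _ in range(num_downstream_ports)]
        self.sent = []  # messages sent out the virtual upstream Switch Port

    def _combined(self, intx):
        return any(port[intx] for port in self.wires)

    def set_wire(self, port, intx, asserted):
        """Update one internal wire and emit a message if the combined
        wire at the virtual upstream Switch Port changes state."""
        before = self._combined(intx)
        self.wires[port][intx] = asserted
        after = self._combined(intx)
        if before != after:
            kind = "ASSERT" if after else "DEASSERT"
            self.sent.append(f"{kind}_INT{'ABCD'[intx]}")

    def dl_down(self, port):
        """The virtual downstream Switch Port sees DL_Down (for example,
        its Virtual Device was unmapped): deassert all of its wires."""
        for intx in range(4):
            self.set_wire(port, intx, False)
```

In this model, unmapping a Virtual Device produces a DEASSERT_INTx message only if no other virtual downstream Switch Port is still holding the same wire asserted.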

2.5.1.3. INTx Root Port Processing

In PCIe, the Host Processor tracks the INTx assert/deassert state. If INTx is asserted, and software
has not masked processing, an interrupt is sent to the host processor.
In MR with a PCIe Root Port, this is unchanged. By the time the INTx message is seen by the Root
Port, the TLP Prefix has been dropped and the message is indistinguishable from non-MR usage.
In MR within an MR Root Port, the INTx messages are tagged with a VH and the RP must track
independent INTx assert/deassert state for each VH.

2.5.2. PME Turn Off Processing


PCIe switches must perform a PME Turn Off scoreboard function. MR switches must do so as well
within each VS. See Section 7.7 for details.

2.5.3. PM_PME Processing


MR Switches must convert PM_PME messages into Beacon / WAKE# indications in certain
situations. See Section 7.6.1 for details.

2.6. Miscellaneous Changes


There are certain secondary effects caused by other MR Protocol changes.


‰ TLPs in a retry buffer are slightly bigger since each includes a TLP Prefix. This has a minor ripple
effect (e.g. Ack/Nak timer values may need minor adjustments).
‰ The TLP Prefix is queued with the TLP. This increases the amount of buffering required for
the TLP Header.

2.7. Miscellaneous Non-Changes


The following items are not affected by Multi-Root operations.
‰ Locked transactions are processed by MR Switches within each VS. Locked transactions in
one VS will not affect another VS.
‰ Port arbitration occurs within each VS. Software in a VH may control Port arbitration using
the optional VC Extended Capability in each bridge of the VS.
‰ Function arbitration occurs within each VH. Software in a VH may control Function
arbitration using the optional MFVC Extended Capability in Function 0 of each VH.
‰ Values reported by the Power Budgeting Capability reflect the entire component. No attempt
is made to split reported power consumption between VHs.


3. Initialization and Resource Allocation

3.1. MR Topology Initialization


MR Fabrics consist of a collection of the following components interconnected in a mesh topology:
‰ one or more MRA Switches

‰ one or more PCIM enabled Switch Management Ports

‰ zero or more non-PCIM enabled RPs

‰ zero or more Devices – MRA, SR, PCIe Device (see footnote 2)

Unlike conventional PCI Express, MR topologies need not be a tree. MR Topologies can be an
arbitrary mesh that may contain loops. Each VH sees a tree structure consisting of a subset of the
overall MR Topology.
Because MR Topologies are not a tree, they have no single notion of link upstream and downstream
directions. Links have a Physical direction (used in the Physical layer Link training process). In
addition, each VH using a Link has a Logical direction which may differ from the Link’s Physical
direction.
There are a number of steps or phases used to initialize and configure the above components into a
MR Topology. These steps are:
1. Initial State after Reset
2. PCIM Location Policy Decision
3. Topology Discovery
4. Component Discovery
5. Mapping Policy Decision
6. Mapping Implementation
7. Virtual Hierarchy Enumeration
Each step is described in more detail below.

2 Useful MR Topologies will eventually have at least one device, but this is not strictly required. In particular, no
devices might initially be present with devices being added later using Hot-Add operations.


3.1.1. Initial State after Fundamental Reset


MRA Switch Ports are configured into one of two Initial Port Types: PCIM Capable Switch Port and
Non-PCIM Capable Switch Port. This configuration is set after Fundamental Reset using a Vendor
Specific mechanism. The configuration mechanism can change Port type assignments but the results
of such a change are not visible until the next Fundamental Reset.
The initial configuration of every MR Switch must have at least one Switch Management Port. This
port can be either a PCIe PCIM Capable Switch Port or a Vendor Specific non-PCIe Switch
Management Port.
The initial configuration of every MR Switch may have any number of Non-PCIM Ports (including
zero).
Initial Port Type is indicated by the setting of a few key characteristics of a Port. These
characteristics are configurable so that MR-PCIM software can change PCIe Port behavior from the
initial settings. Port Type configurability for non-PCIe Switch Management Ports is Vendor Specific.
Implementation Note: Vendor specific settings could be used to configure an MR Switch so it
operates as an SR Switch. The SR upstream Port would be a PCIM Capable Switch Port. SR
downstream Ports would be Non-PCIM Ports with their Link_Direction set as downstream and
their Port VH 0 mapped to some downstream P2P Bridge of the VS associated with the upstream
Port.

3.1.1.1. PCIM Capable Switch Ports

A PCIM Capable Switch Port i is defined as a Switch Port with the following characteristics:
‰ Port[i].Link_Direction indicates an upstream Port.

‰ Some VS[j] has mapped {Port[i], Port VH 0} to its upstream bridge.

‰ The upstream P2P Bridge Configuration Header for VH 0, VS[j] has a full MR-IOV
Capability.
‰ The bit j of the VS Authorization Bitmap is Set.

‰ The Management VS value is Vendor Specific. It could be j or it could be some other VS.

‰ VS[j] contains sufficient Enabled Downstream P2P Bridges to access all Ports of the MR
Switch.
These settings ensure that MR-PCIM could manage the Switch using this Port. MR-PCIM is not
required to be present on this Port. The RP running MR-PCIM could be directly attached to this Port
or could be connected indirectly using one or more MR or SR Switches.
This Port is authorized. In particular, any software using it will be allowed to manage the Switch.
Software can later de-authorize the Port if desired.
Note: Since a VS has a single upstream bridge, these rules imply that every Potential PCIM port will
be associated with a distinct VS.
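The characteristics listed above can be read as a predicate over a Switch's initial configuration. The sketch below is one hypothetical way to express that check. None of the class or field names come from this specification, and "sufficient Enabled Downstream P2P Bridges" is approximated here as at least one enabled downstream bridge per remaining Port, which is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class VirtualSwitch:
    index: int                                    # j in VS[j]
    upstream_mapping: Optional[Tuple[int, int]]   # (Port index, Port VH) or None
    has_full_mr_iov_capability: bool
    enabled_downstream_bridges: int


@dataclass
class MrSwitch:
    port_is_upstream: Dict[int, bool]             # Port index -> Link_Direction is upstream
    vs_authorization_bitmap: int
    virtual_switches: List[VirtualSwitch]


def is_pcim_capable(switch: MrSwitch, port: int) -> bool:
    """Return True if Port[port] meets the PCIM Capable Switch Port criteria."""
    if not switch.port_is_upstream.get(port, False):
        return False
    other_ports = len(switch.port_is_upstream) - 1
    for vs in switch.virtual_switches:
        if vs.upstream_mapping != (port, 0):       # must map {Port[i], Port VH 0}
            continue
        authorized = bool((switch.vs_authorization_bitmap >> vs.index) & 1)
        if (vs.has_full_mr_iov_capability and authorized
                and vs.enabled_downstream_bridges >= other_ports):
            return True
    return False
```

The Management VS value is left out of the predicate because the text above treats it as Vendor Specific.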
Implementation Note:


PCIM Capable Switch Ports need not be full width “expensive” Ports. They could be x1 ports
intended just to support management.

3.1.1.2. Non-PCIM Capable Switch Ports

A Non-PCIM Capable Switch Port j is defined as a Switch Port with the following characteristics:
‰ Port[j].Link_Direction is Vendor Specific.

‰ No Authorized VS has mapped {Port[j], any Port VH} to its upstream bridge

‰ Mapping of Port[j] into bridges in VSs is otherwise Vendor Specific.

• Any Port VH of Port[j] may be mapped to any downstream bridge of any VS.
• Any Port VH of Port[j] may be mapped to the upstream bridge of any non-authorized
VS. This upstream bridge may or may not have an MR-IOV capability in its Type 1
Configuration header. If it has one, only selected fields in the Type 1 header are
operational and none of the MR-IOV tables located in memory are visible. See <REF>
for details.
• Port VHs of Port[j] need not be mapped into any VS.
These settings ensure that Initial MR-PCIM will never be present on this Port. This Port can be
directly or indirectly via Switches connected to Devices, Root Ports or Bridges.
This Port is not authorized. Attempts by software attached to this Port to configure the MR Switch
will fail unless the Port is later authorized.

3.1.1.3. Non-PCIe Switch Management Ports

A non-PCIe port may also be used to manage MR Topologies. These ports consist of Vendor
Specific hardware that appears to a MR Switch as an upstream Port. Such Ports have a Port Table
entry with the Non-PCIe bit set.
This Vendor Specific hardware allows MR-PCIM to issue and respond to the subset of PCIe
transactions needed to manage the MR Topology.
‰ Such hardware must allow MR-PCIM to issue:

• Configuration Read and Write Requests (Type 0 and Type 1) of sizes 1, 2 or 4 bytes,
naturally aligned.
• Memory Space Read and Write Requests (32 bit and 64 bit addressing) of sizes 1, 2, 4
and 8 bytes, naturally aligned.
• Message Transactions normally issued by Root Ports (e.g. PME Turn Off messages).
‰ Such hardware must allow MR-PCIM to respond to:


• Completions related to the above (including completion status and support for
Completion Timeout).
• Posted Memory Write transactions for MSI Interrupts
• Message Transactions (e.g. INTx, PME_TO_Ack, PM_PME, Errors, …)
This is the minimum support needed to configure the MR Topology. Additional support may be
required if MR-PCIM uses the MR Topology for general I/O (e.g. Logging). Additional support may
also be required if Vendor Specific device management software also needs to use this port.
A Non-PCIe Switch Management Port i is defined as a Port with the following characteristics:
‰ Port[i].Link_Direction indicates an upstream Port.

‰ The Port operates in Base PCIe mode (the link cannot be MR Enabled).

‰ Some VS[j] has mapped {Port[i], Port VH 0} to its upstream bridge. This mapping may be
fixed.
‰ The upstream P2P Bridge Configuration Header for VH 0, VS[j] has a full MR-IOV
Capability.
‰ The bit j of the VS Authorization Bitmap is Set. This bit may be read only.

‰ The Management VS value is Vendor Specific. It could be j or it could be some other VS.

‰ VS[j] contains enough Enabled Downstream P2P Bridges to access all Ports of the MR
Switch.
These settings ensure that MR-PCIM could manage the Switch using this Port. MR-PCIM is not
required to be present on this Port.
This Port is authorized. In particular, any software using it will be allowed to manage the Switch.
Note: Since a VS has a single upstream bridge, these rules imply that every Non-PCIe Switch
Management Port will be associated with a distinct VS.


3.1.1.4. Initial State Example

Figure 3-1 shows an example MR Topology. Both Switches are MRA Switches. The three green
Root Ports labeled RP 0, RP 1 and RP 2 are capable of running MR-PCIM software and configuring
the topology. The blue RP 3 is not permitted to run MR-PCIM. Devices are a mixture of Base PCIe,
SR Aware and MR Aware Devices.

Figure 3-1: Example MR Topology


In this example, assignments are shown in Table 3-1.
Table 3-1: Port Types – Example MR Topology

Port         Initial Port Type              Reason

A, B, C      PCIM Capable Switch Port       After Reset, RP 0, RP 1 or RP 2 are allowed to run MR-PCIM
D            Non-PCIM Capable Switch Port   After Reset, RP 3 is not allowed to run MR-PCIM
E, X         PCIM Capable Switch Port       If RP 2 runs MR-PCIM, this setting allows it to manage Switch A after Reset
F, Y         PCIM Capable Switch Port       If either RP 0 or RP 1 runs MR-PCIM, this setting allows it to manage Switch B after Reset
G, H, I, J   Non-PCIM Capable Switch Port   Devices never run MR-PCIM
K, L, M, N   Non-PCIM Capable Switch Port   Devices never run MR-PCIM

3.1.2. Initial MR-PCIM Location Policy


After Fundamental Reset, a Vendor Specific mechanism selects the Initial MR-PCIM location.
Only the selected system is allowed to access the MR topology; a Vendor Specific mechanism
prevents other RPs and Non-PCIe Switch Management Ports from accessing the MR topology.
The Initial MR-PCIM software must be able to manage the entire MR Topology. To do so, it must
be connected to a PCIM Capable Switch Port on every MR Switch. This connection can be direct or
indirect using additional Switches.

3.1.2.1. Initial MR-PCIM Location Example

Continuing with the example from Section 3.1.1.4, assume that RP 0 is chosen for the Initial MR-
PCIM. Vendor Specific mechanisms are used to prevent RP 1 and RP 2 from accessing the MR
Topology. For example, the affected processors might be powered off or held in reset.

3.1.3. Topology Discovery


The selected Initial MR-PCIM can connect to the first MR Switch in a variety of ways:
‰ Base PCIe RP directly connected to a PCIM Capable Switch Port on the MR Switch. The link
trains in Base PCIe mode. Only VH 0 of this Port will be used.
‰ MR enabled RP directly connected to a PCIM Capable Switch Port on the MR Switch. The
link trains in MR Enabled mode. VH 0 of this Port will be used by MR-PCIM to manage this
switch.
‰ Vendor Specific non-PCIe interface connected to a Non-PCIe Switch Management Port on
the MR Switch.
In each of these cases, the upstream P2P Bridge first seen by MR-PCIM contains the MR-IOV
capability and is mapped to VH 0 of an authorized VS. MR-PCIM uses this capability to configure
the Switch.
MR-PCIM invents a unique number for the Switch and writes it to the MR Switch Number field.
This number is used by MR-PCIM to detect topology loops during the enumeration process.
MR-PCIM then uses the Port table to enable additional links and to manage their Link_Direction.
MR-PCIM might use Vendor Specific knowledge of the topology to aid in this process if, for
example, a system is using Link_Direction to prevent link training and thus hold off some RPs from
seeing the hierarchy until it is configured.


For each Port that trains as a downstream Port, MR-PCIM will examine the Link Partner Training
Status fields in the Port Table. For each Link Partner that is an Authorized Port on an MR Switch,
MR-PCIM will determine whether the Port is mapped into the MR-PCIM VS and if necessary map
the Port into an unused downstream bridge. It will then establish a Bus Number range for the
downstream bridge so that PCIe Configuration transactions can be issued to the Link Partner.
MR-PCIM then probes the configuration header of the MR Switch attached to the Port.
‰ If the component is a “new” MR Switch because the Switch’s MR Switch Number field has a
number not assigned by MR-PCIM in this enumeration cycle, the enumeration process repeats
to configure the new Switch.
‰ If the component is an “old” MR Switch that MR-PCIM has seen before, the MR Switch
Number and connection information is noted but further enumeration is not needed via this
connection. Note that enumeration of the “old” Switch may not be complete, but it will be
completed using other links into the Switch.
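The loop detection described above amounts to a graph traversal in which the MR Switch Number field marks Switches already visited. The sketch below models this under simplifying assumptions: the function and argument names are hypothetical, the Port Table is reduced to a list of (Port, Link Partner) pairs for Ports that trained as downstream, and stale values left in the MR Switch Number field are assumed not to collide with numbers assigned in the current cycle.

```python
def discover(first_switch, downstream_links, switch_number_reg, fresh_numbers):
    """Enumerate MR Switches reachable from `first_switch`.

    downstream_links:  switch -> list of (port, partner switch) pairs
    switch_number_reg: switch -> current MR Switch Number field (mutated)
    fresh_numbers:     iterator yielding unique numbers invented by MR-PCIM

    Returns (assigned, second_paths), where `assigned` maps each number
    assigned in this cycle to its Switch and `second_paths` lists links that
    lead to an "old" Switch.
    """
    assigned = {}
    second_paths = []

    def configure(switch):
        number = next(fresh_numbers)
        switch_number_reg[switch] = number   # write MR Switch Number field
        assigned[number] = switch

    configure(first_switch)
    pending = [first_switch]
    while pending:
        switch = pending.pop()
        for port, partner in downstream_links.get(switch, []):
            if switch_number_reg.get(partner) in assigned:
                # "Old" Switch: note the connection, do not enumerate again.
                second_paths.append((switch, port, partner))
            else:
                # "New" Switch: repeat the enumeration process for it.
                configure(partner)
                pending.append(partner)
    return assigned, second_paths
```

With two Switches joined by two links, the second link examined is reported as an additional path rather than triggering a second enumeration of the same Switch.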

3.1.3.1. MR-PCIM Topology Discovery Example


Figure 3-2 expands on the example shown in Figure 3-1 adding possible initial link directions.

Figure 3-2: Example MR Topology with Initial Link Directions


The Links between Ports E and F and between Ports X and Y do not train since each Link consists
of two upstream Ports. Since RP 0 was chosen as the location of the Initial MR-PCIM the link


between RP 0 and Switch A trains. Vendor Specific mechanisms are used to prevent RP 1 and RP 2
from starting (links may train but no transactions will be generated from these RPs). The link
between RP 3 and Port D can’t train since it consists of two downstream Ports.
MR-PCIM software operating in RP 0 could proceed as follows:
1. Reads Type 1 Configuration Header at Port A, detects MR-IOV capability indicating an MR
Switch.
2. Configures BAR registers in Switch A so that the Port Table can be examined.
3. Assigns MR Switch Number 42 to Switch A.
4. Detects that Port A, VH 0 was mapped to VS n.
5. Notices link attached to Port E detected something present but did not train. Switches
Link_Direction for link E to downstream thus allowing the E to F link to train.
6. Notices link attached to Port X detected something present but did not train. Switches
Link_Direction for link X to downstream thus allowing the X to Y link to train.
7. Notices that Links G, H, I and J trained as downstream. Each Port’s Link Partner Training
Status indicates no MR Switch is connected so no additional enumeration is needed at this
time.
8. Notices that Port E is connected to an Authorized MR Switch. If needed, maps Port E to
some downstream bridge of VS n of Switch A (it might already be mapped). Using this
downstream bridge Switch B is enumerated.
9. Reads the Type 1 Configuration Header of Port F, detects MR-IOV capability indicating an
MR Switch.
10. Configures BAR registers in Switch B so that the Port Table can be examined.
11. Assigns MR Switch Number 86 to Switch B.
12. Detects that Port F, VH 0 was mapped to VS m.
13. Notices that Links K, L, M and N trained as downstream. Each Port’s Link Partner Training
Status indicates no MR Switch is connected so no additional enumeration is needed at this
time.
14. Notices link attached to Port D detected something present but did not train. Switches
Link_Direction for link D to upstream thus allowing the D to RP 3 link to train. This step
might be delayed until later if preventing link training was the Vendor Specific mechanism
used to hold off RP 3 from enumerating the topology.
15. Notices that Port X is connected to an Authorized MR Switch. If needed, maps Port X to
some downstream bridge of VS n of Switch A (it might already be mapped). Using this
downstream bridge Switch B is enumerated. Reads the Type 1 Configuration Header of
Port Y, detects MR-IOV capability indicating an MR Switch. The previously assigned MR
Switch Number of 86 indicates that Port X is a second path to Switch B. Since Switch B
has been (or is still being) enumerated using Port E, no further enumeration is needed using
Port X.
This is an example; other enumeration orderings are also valid. Note that the link directions of the
E to F and X to Y links depend on the chosen ordering. If, for example, instead of changing the


Link_Direction of Port X in Step 6 the Link_Direction of Port Y were changed later during the
enumeration of Switch B, the X to Y link would train in the opposite direction.
Note: MR-PCIM assigns MR Switch register locations in PCIe Memory Space during this discovery
process. Subsequent software may change these assignments.

3.1.4. Component Discovery


Topology Discovery allows MR-PCIM to know what MR Switches exist and how they are
interconnected. Component Discovery is now used to locate MR Devices and non-MR
Components.
This process examines config space of every Component. This information is gathered to drive
Switch and Device Configuration Policy Decisions.
Information gathered from MR Devices includes:
‰ MaxVH

‰ Function number(s) of all BFs

‰ VF Mapping Supported / VF Migration Supported / VF MVF Region

Information gathered from MR Switches includes:


‰ MaxVH for each Port

‰ Number of VSs / Number of Bridges for each VS

Information gathered from every Link includes:


‰ Link Width and Speed

Information may not be available for Root Ports (MR or PCIe). The Vendor Specific mechanisms
used to hold off transactions might also prevent the link from training so the Link Partner Training
Status may not yet be meaningful.


3.1.4.1. Component Discovery Example


Figure 3-3 expands on the example shown in Figure 3-2 adding Device characteristics.

Figure 3-3: Example MR Topology with Component Discovery Details


This example assumes:
‰ Single Function MR Devices attached to Ports G, H and I

• Each Device supports 8 VHs


• None of the Devices support VF Mapping or VF Migration
‰ Base PCIe Devices attached to Ports J and K

‰ Single-Function MR Device attached to Port L

• The Device supports 8 VHs


• VF Mapping is supported with 32 LVFs and 32 MVFs


• VF Migration is not supported


‰ Single Function MR Device attached to Port M

• The Device supports 8 VHs


• Both VF Mapping and VF Migration are supported with 32 LVFs and 30 MVFs
• VF and Function MVF Regions fully overlap
‰ Single Root Device attached to Port N

3.1.5. VH and VF Mapping Policy


Out-of-Scope mechanisms are used to decide what portions of the MR Topology should be assigned
to each VH.

3.1.5.1. Example VH and VF Mapping Policy

Table 3-2 expands on the example shown in Figure 3-3 adding VH and VF Mapping Policy Decision
information.

Table 3-2: Example MR Topology VH and VF Mapping Policy

VH                      Authorized   VH Mapping                         VF Mapping

RP 0 (MR-PCIM)          Yes          VS in Switch A and B               Device L: 16 VFs
                                     Uses E to F Inter-Switch Link      Device M: 8 VFs (2 unpopulated)
                                     VH in Devices G, H, I, L and M
RP 1                    No           VS in Switch A
                                     VH in Devices G and H
                                     Device J
RP 2 (Backup MR-PCIM)   Yes          VS in Switch A and B               Device L: 8 VFs
                                     Uses F to E Inter-Switch Link      Device M: 8 VFs (2 unpopulated)
                                     VH in Devices G, H, I, L and M
                                     Device K
RP 3                    No           VS in Switch A and B               Device L: 8 VFs
                                     Uses Y to X Inter-Switch Link      Device M: 8 VFs (2 unpopulated)
                                     VH in Devices G, H, L and M
                                     Two VHs in Device I
                                     Device N


3.1.6. VH and VF Mapping Implementation


Once the assignment policy is decided, MR-PCIM configures the Components to implement it. This
involves writing the various tables in MR Switches and Devices. These tables are described in
Sections 4.2 and 4.3.
All configuration activity occurs in the MR-PCIM VH. The order software uses for these initial
configuration writes is unspecified. Configuration and memory requests originate exclusively from
MR-PCIM.

3.1.6.1. Example Topology: Switch Implementation

The view from each Root Port is shown in Figure 3-4 through Figure 3-7 below. Switch VS Table
Programming to implement this is shown in Table 3-3 and Table 3-4.

Figure 3-4: Example MR Topology: RP 0 View

Figure 3-5: Example MR Topology: RP 1 View

Figure 3-6: Example MR Topology: RP 2 View

Figure 3-7: Example MR Topology: RP 3 View


Table 3-3: Example Topology: Switch A VS Bridge Table Contents

Slot Root VS Bridge Enabled Mapped Port Port VHN


0 RP 0 VS 0 Upstream Yes Yes A VH 0
1 RP 0 VS 0 Downstream 0 Yes Yes G VH 0
2 RP 0 VS 0 Downstream 1 Yes Yes H VH 0
3 RP 0 VS 0 Downstream 2 Yes Yes I VH 0
4 RP 0 VS 0 Downstream 3 Yes Yes E VH 0

5 VS 0 Downstream 4 No
6 VS 0 Downstream 5 No
7 VS 0 Downstream 6 No
8 VS 0 Downstream 7 No

9 RP 1 VS 1 Upstream Yes Yes B VH 0 (PCIe)


10 RP 1 VS 1 Downstream 0 Yes Yes G VH 2
11 RP 1 VS 1 Downstream 1 Yes No
12 RP 1 VS 1 Downstream 2 Yes Yes H VH 1
13 RP 1 VS 1 Downstream 3 Yes Yes J VH 0 (PCIe)

14 RP 1 VS 1 Downstream 4 Yes No
15 RP 1 VS 1 Downstream 5 Yes No
16 RP 1 VS 1 Downstream 6 Yes No
17 RP 1 VS 1 Downstream 7 Yes No

18 RP 2 VS 2 Upstream Yes Yes D VH 0 (PCIe)


19 RP 2 VS 2 Downstream 0 Yes Yes L VH 2
20 RP 2 VS 2 Downstream 1 Yes Yes M VH 2
21 RP 2 VS 2 Downstream 2 Yes Yes N VH 0 (PCIe)
22 RP 2 VS 2 Downstream 3 Yes Yes Y VH 0

23 RP 2 VS 2 Downstream 4 Yes No
24 RP 2 VS 2 Downstream 5 Yes
25 VS 2 Downstream 6 No
26 VS 2 Downstream 7 No

27 VS 3 Upstream No
28 VS 3 Downstream 0 No
29 VS 3 Downstream 1 No
30 VS 3 Downstream 2 No
31 VS 3 Downstream 3 No

32 VS 3 Downstream 4 No
33 VS 3 Downstream 5 No
34 VS 3 Downstream 6 No
35 VS 3 Downstream 7 No


Table 3-4: Example Topology: Switch B VS Bridge Table Contents

Slot Root VS Bridge Enabled Mapped Port Port VHN


0 RP 2 VS 0 Upstream Yes Yes C VH 0 (PCIe)
1 RP 2 VS 0 Downstream 0 Yes Yes F VH 1
2 RP 2 VS 0 Downstream 1 Yes Yes L VH 0
3 RP 2 VS 0 Downstream 2 Yes Yes M VH 0
4 RP 2 VS 0 Downstream 3 Yes Yes K VH 0 (PCIe)

5 VS 0 Downstream 4 No
6 VS 0 Downstream 5 No
7 VS 0 Downstream 6 No
8 VS 0 Downstream 7 No

9 RP 0 VS 1 Upstream Yes Yes F VH 0 (PCIe)


10 RP 0 VS 1 Downstream 0 Yes Yes L VH 1
11 RP 0 VS 1 Downstream 1 Yes Yes M VH 1
12 RP 0 VS 1 Downstream 2 Yes No
13 RP 0 VS 1 Downstream 3 Yes No

14 RP 0 VS 1 Downstream 4 Yes No
15 RP 0 VS 1 Downstream 5 Yes No
16 RP 0 VS 1 Downstream 6 Yes No
17 RP 0 VS 1 Downstream 7 Yes No

18 RP 2 VS 2 Upstream Yes Yes E VH 1


19 RP 2 VS 2 Downstream 0 Yes Yes G VH 1
20 RP 2 VS 2 Downstream 1 Yes No
21 RP 2 VS 2 Downstream 2 Yes Yes H VH 2
22 RP 2 VS 2 Downstream 3 Yes Yes I VH 1

23 VS 2 Downstream 4 No
24 VS 2 Downstream 5 No
25 VS 2 Downstream 6 No
26 VS 2 Downstream 7 No

27 RP 3 VS 3 Upstream Yes Yes X VH 0


28 RP 3 VS 3 Downstream 0 Yes Yes G VH 3
29 RP 3 VS 3 Downstream 1 Yes Yes H VH 3
30 RP 3 VS 3 Downstream 2 Yes Yes I VH 3 (note 3)
31 RP 3 VS 3 Downstream 3 Yes Yes I VH 2 (note 4)

32 VS 3 Downstream 4 No
33 VS 3 Downstream 5 No
34 VS 3 Downstream 6 No
35 VS 3 Downstream 7 No

Note 3: Port I1 in Figure 3-7. Two PFs of Device I are assigned to RP 3. As far as RP 3 is concerned,
these are independent Devices.
Note 4: Port I2 in Figure 3-7.

3.1.6.2. Example Topology: Device Implementation

In addition to Switch mapping, MR Devices need configuration.


Devices G and H are both assigned NumVHs of 4. No additional configuration is necessary. Switch
configuration established the mapping between VH numbers and RPs.
Device I is assigned NumVHs of 6. No additional configuration is necessary. VH 4 and VH 5 of the
Device could be dynamically assigned to unmapped switch ports associated with RP 0, RP 1 or RP 2
(RP 3 has no unmapped switch ports on Switch A).
Device L is assigned NumVHs of 3. VF Mapping is shown in Figure 3-8.


PF 0:0: BaseLVF = 0, InitialVFs = 8, TotalVFs = 8
PF 1:0: BaseLVF = 8, InitialVFs = 16, TotalVFs = 16
PF 2:0: BaseLVF = 24, InitialVFs = 8, TotalVFs = 8
MVF Contexts shown: MVF 0,0 through MVF 0,31

LVF # | VF State | VF | VF Map
LVF 0,0 | A.A | VF 0:0,1 | MVF 0,0
LVF 0,1 | A.A | VF 0:0,2 | MVF 0,1
LVF 0,2 | A.A | VF 0:0,3 | MVF 0,2
LVF 0,3 | A.A | VF 0:0,4 | MVF 0,3
LVF 0,4 | A.A | VF 0:0,5 | MVF 0,4
LVF 0,5 | A.A | VF 0:0,6 | MVF 0,5
LVF 0,6 | A.A | VF 0:0,7 | MVF 0,6
LVF 0,7 | A.A | VF 0:0,8 | MVF 0,7
LVF 0,8 | A.A | VF 1:0,1 | MVF 0,8
LVF 0,9 | A.A | VF 1:0,2 | MVF 0,9
LVF 0,10 | A.A | VF 1:0,3 | MVF 0,10
LVF 0,11 | A.A | VF 1:0,4 | MVF 0,11
LVF 0,12 | A.A | VF 1:0,5 | MVF 0,12
LVF 0,13 | A.A | VF 1:0,6 | MVF 0,13
LVF 0,14 | A.A | VF 1:0,7 | MVF 0,14
LVF 0,15 | A.A | VF 1:0,8 | MVF 0,15
LVF 0,16 | A.A | VF 1:0,9 | MVF 0,16
LVF 0,17 | A.A | VF 1:0,10 | MVF 0,17
LVF 0,18 | A.A | VF 1:0,11 | MVF 0,18
LVF 0,19 | A.A | VF 1:0,12 | MVF 0,19
LVF 0,20 | A.A | VF 1:0,13 | MVF 0,20
LVF 0,21 | A.A | VF 1:0,14 | MVF 0,21
LVF 0,22 | A.A | VF 1:0,15 | MVF 0,22
LVF 0,23 | A.A | VF 1:0,16 | MVF 0,24
LVF 0,24 | A.A | VF 2:0,1 | MVF 0,23
LVF 0,25 | A.A | VF 2:0,2 | MVF 0,26
LVF 0,26 | A.A | VF 2:0,3 | MVF 0,25
LVF 0,27 | A.A | VF 2:0,4 | MVF 0,27
LVF 0,28 | A.A | VF 2:0,5 | MVF 0,28
LVF 0,29 | A.A | VF 2:0,6 | MVF 0,29
LVF 0,30 | A.A | VF 2:0,7 | MVF 0,30
LVF 0,31 | A.A | VF 2:0,8 | MVF 0,31

Figure 3-8: Example MR Topology: Device L PF / VF Mapping


Device M is assigned NumVHs of 3. VF Mapping is shown in Figure 3-9. VF Migration Capable is Set.

PF 0:0: BaseLVF = 0, InitialVFs = 6, TotalVFs = 8
PF 1:0: BaseLVF = 8, InitialVFs = 14, TotalVFs = 16
PF 2:0: BaseLVF = 24, InitialVFs = 6, TotalVFs = 8
MVF Contexts shown: MVF 0,0 through MVF 0,29

LVF # | VF State | VF | VF Map
LVF 0,0 | A.A | VF 0:0,1 | MVF 0,0
LVF 0,1 | A.A | VF 0:0,2 | MVF 0,1
LVF 0,2 | A.A | VF 0:0,3 | MVF 0,2
LVF 0,3 | A.A | VF 0:0,4 | MVF 0,3
LVF 0,4 | A.A | VF 0:0,5 | MVF 0,4
LVF 0,5 | A.A | VF 0:0,6 | MVF 0,5
LVF 0,6 | I.U | VF 0:0,7 | none
LVF 0,7 | I.U | VF 0:0,8 | none
LVF 0,8 | A.A | VF 1:0,1 | MVF 0,6
LVF 0,9 | A.A | VF 1:0,2 | MVF 0,7
LVF 0,10 | A.A | VF 1:0,3 | MVF 0,8
LVF 0,11 | A.A | VF 1:0,4 | MVF 0,9
LVF 0,12 | A.A | VF 1:0,5 | MVF 0,10
LVF 0,13 | A.A | VF 1:0,6 | MVF 0,11
LVF 0,14 | A.A | VF 1:0,7 | MVF 0,12
LVF 0,15 | A.A | VF 1:0,8 | MVF 0,13
LVF 0,16 | A.A | VF 1:0,9 | MVF 0,14
LVF 0,17 | A.A | VF 1:0,10 | MVF 0,15
LVF 0,18 | A.A | VF 1:0,11 | MVF 0,16
LVF 0,19 | A.A | VF 1:0,12 | MVF 0,17
LVF 0,20 | A.A | VF 1:0,13 | MVF 0,18
LVF 0,21 | A.A | VF 1:0,14 | MVF 0,19
LVF 0,22 | I.U | VF 1:0,15 | none
LVF 0,23 | I.U | VF 1:0,16 | none
LVF 0,24 | A.A | VF 2:0,1 | MVF 0,20
LVF 0,25 | A.A | VF 2:0,2 | MVF 0,21
LVF 0,26 | A.A | VF 2:0,3 | MVF 0,22
LVF 0,27 | A.A | VF 2:0,4 | MVF 0,25
LVF 0,28 | A.A | VF 2:0,5 | MVF 0,26
LVF 0,29 | A.A | VF 2:0,6 | MVF 0,27
LVF 0,30 | I.U | VF 2:0,7 | none
LVF 0,31 | I.U | VF 2:0,8 | none

Figure 3-9: Example MR Topology: Device M VF Mapping

3.1.7. MR-PCIM Failover


There is a single active MR-PCIM in a topology. This MR-PCIM receives all “route to PCIM” errors
and events. The VS associated with this MR-PCIM is programmed into each Switch’s Management
VS register.


There may be multiple Authorized VSs. Software running in these VSs is allowed to view and
change the Switch MR-IOV configuration. To avoid confusion, coordination is needed between
such software but is not specified by this specification.
A “backup” MR-PCIM operating in any Authorized VS may become the active MR-PCIM simply by
changing the Management VS register in the affected Switch(es). This feature can support failover
from one MR-PCIM to a backup MR-PCIM. The Suppress Reset Propagation feature of a VS can
be used to prevent a reset caused by the failure of one MR-PCIM from clearing state needed to
continue operation using a backup MR-PCIM.
Mechanisms used to detect MR-PCIM failure, to select the new MR-PCIM location, to initiate the
failover, etc. are undefined by this specification.
TLPs targeting MR-PCIM are regular PCIe TLPs within the appropriate VH. Such TLPs are
forwarded based on the switch configuration when they are received. Old TLPs are not re-routed
due to switch reconfiguration or change in VS Authorization.
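As an illustration only, the takeover rule above can be sketched as a small software model. The class and function names (`SwitchModel`, `take_over`) are hypothetical and are not defined by this specification; failure detection and selection of the backup remain out of scope, as stated above.

```python
from dataclasses import dataclass, field


@dataclass
class SwitchModel:
    """Toy model of one MR Switch (illustrative names, not spec-defined)."""
    management_vs: int                       # models the Management VS register
    authorized_vs: set = field(default_factory=set)


def take_over(switch: SwitchModel, backup_vs: int) -> None:
    """Make backup_vs the Management VS, as a backup MR-PCIM would.

    Only software running in an Authorized VS may change the Switch
    MR-IOV configuration, so other VS numbers are rejected.
    """
    if backup_vs not in switch.authorized_vs:
        raise PermissionError(f"VS {backup_vs} is not an Authorized VS")
    switch.management_vs = backup_vs
```

In a topology with several Switches, the backup MR-PCIM would repeat this write for each affected Switch.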

3.2. MR Device Initialization


After Conventional Reset, MR Devices negotiate to use the MR Link protocol. If this negotiation is
successful, the Device appears as VH 0 and contains one or more Base Functions (BFs) and zero or
more PFs or Non-IOV Functions. Each BF contains an MR-IOV Capability in the Type 0
Configuration Header.
Additional VHs beyond VH0 are enabled using the MR-IOV Capability.
Every BF is associated with a single PF or Non-IOV Function in each non-zero VH.
Every BF is optionally associated with a single PF or Non-IOV Function in VH0.
Every PF (in any VH) and every Non-IOV Function (in any non-zero VH) is associated with a
single BF.
Attached to each PF in each VH is an optional collection of Virtual Functions also in that VH.
These are described in the SR-IOV Specification.
PFs, BFs and VFs are designated as follows:
BF indicates the Base Function. It does not have a number since each Device only has one
BF.
PF f indicates a PF at Function number f (f must be between 0 and 255). This nomenclature is
used for SR systems or for the SR view of a single MR VH.
PF h:f indicates a PF within an MR system at Function number f in VH h (h must be
between 0 and the maximum VH number in use on the link).
VF f,s indicates VF number s attached to PF number f (s must be between 1 and the number
of VFs in use for PF f). This nomenclature is used for SR systems or for the SR view inside a
single VH.
VF h:f,s indicates VF number s attached to PF number f in VH h.


In addition, the optional VF Mapping and VF Migration features use the terms Logical Virtual
Function (LVF) and Mission Virtual Function (MVF). These are similarly designated as follows:
LVF f,s indicates LVF table slot number s attached to PF number f.
MVF f,s indicates MVF number s attached to PF number f. MVFs do not have a VH (a VH
is associated with the LVF that an MVF is mapped to, see below).
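The designations above are regular enough to parse mechanically. The following is a minimal sketch, assuming the textual forms exactly as written in this section ("PF f", "PF h:f", "VF f,s", "VF h:f,s", "LVF f,s", "MVF f,s"); the names `FunctionName` and `parse_name` are hypothetical helpers, not part of the specification.

```python
import re
from typing import NamedTuple, Optional


class FunctionName(NamedTuple):
    kind: str                # "PF", "VF", "LVF" or "MVF"
    vh: Optional[int]        # None for the SR view and for LVF/MVF
    function: int            # PF Function number f
    index: Optional[int]     # VF number / table slot s; None for a PF


_NAME = re.compile(r"^(PF|VF|LVF|MVF) (?:(\d+):)?(\d+)(?:,(\d+))?$")


def parse_name(text: str) -> FunctionName:
    """Parse a designation such as 'VF 1:0,5' into its components."""
    match = _NAME.match(text)
    if match is None:
        raise ValueError(f"not a PF/VF/LVF/MVF designation: {text!r}")
    kind, vh, function, index = match.groups()
    name = FunctionName(kind,
                        None if vh is None else int(vh),
                        int(function),
                        None if index is None else int(index))
    if not 0 <= name.function <= 255:
        raise ValueError("Function number must be between 0 and 255")
    if (kind == "PF") != (name.index is None):
        raise ValueError("only PF designations omit the ',s' part")
    if kind == "VF" and name.index < 1:
        raise ValueError("VF numbers start at 1")
    if kind in ("LVF", "MVF") and name.vh is not None:
        raise ValueError("LVF and MVF designations carry no VH")
    return name
```

The upper bounds on h and s depend on Device configuration (the maximum VH number in use and the number of VFs in use), so the sketch checks only the fixed limits.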
During topology enumeration, MR-PCIM detects MR Devices by noticing the presence of an MR-
IOV Capability in PF 0’s Configuration Header.
Initializing and managing a Device in MR mode involves managing four aspects of the Device.
• Configuring and enabling the VHs.

• Enabling and managing the optional MR flow control features: This involves configuring the
number of Virtual Links used, configuring the number of VCs offered in each VH,
configuring the {VH, VC} to VL mapping hardware, configuring any VH to VL arbitration
hardware and configuring any VL to Link arbitration hardware.
• Enabling and managing the optional VF Mapping features: This involves configuring the
number of LVFs offered by each PF in each VH and configuring each PF’s LVF to MVF
map.
• Enabling and managing the optional VF Migration features: This involves leaving “holes” in
the LVF to MVF map to support migration, enabling VF Migration, responding to requests
to initiate VF migration and interacting with SR-PCIM software to accomplish VF migration.
These aspects will be described separately in the following sections.

3.2.1. Enabling MR Operation


Initially MR Devices always negotiate to use the MR Link Protocol. If this negotiation fails, they
operate in PCI Express mode. Initially, MR Devices operating in MR mode use only VH 0. VH 0
contains one or more Functions (PFs, BFs or non-IOV Functions) operating as either a PCI
Express single Function or Multi-Function Device.
In MR Devices, each BF contains an MR-IOV Capability block. There are a few key registers in this
Capability used in enabling MR Operation.
• The MR-IOV Capabilities register indicates which optional features are implemented by the
Device.
• One of the BFs is designated the Main BF. This BF contains certain fields that apply to the
entire Device. These fields are reserved in other BFs (if any). The Main BF is identified by the
“Is Main BF” bit in the MR-IOV Capability. The function number associated with the Main
BF is Device Specific.
• The MaxVH register in the Main BF indicates the number of VHs supported by Device
hardware. The value is Vendor Specific and must not change except after Fundamental Reset
of the Device.
• The NumVH register in the Main BF indicates to the Device how many VHs are going to be
used by MR-PCIM. Software should set this value based on the value of MaxVH, on the
number of VHs implemented at the upstream end of the link and on the number of VHs
needed by the system. Once software has enabled additional VHs, the NumVH value may not
be changed.
• The MR-IOV Capability of every BF contains a pointer to the Function Table. This table
contains one entry for every Function associated with the BF. This table is indexed by VH
number since every BF contains a single function in each VH (exception: VH 0 need not have
a function, so the first Function Table entry might not be used; see the Function Offset field in
Section 4.2.4.1).
• The Function Table entry for the Main BF contains a VC ID to VL Map. This map includes a
Map Enable bit. A VH is considered Enabled when, for some VC, the VC ID to VL Map
entry is Enabled, points to a VL that is Enabled, and software operating in the VH enables
that VC. See section 4.2.4.4 for additional details.
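The NumVH guidance and the "VH is Enabled" condition above can be written down as a short sketch. It assumes MaxVH, the upstream VH count and the system requirement are all plain counts (the register encoding is defined in Chapter 4 and may differ), and it takes the smallest of the three as one reasonable policy. The function and parameter names are hypothetical.

```python
def choose_num_vh(max_vh: int, upstream_vhs: int, needed_vhs: int) -> int:
    """Pick a NumVH value no larger than any of the three limits."""
    return min(max_vh, upstream_vhs, needed_vhs)


def vh_is_enabled(vc_to_vl: dict, enabled_vls: set, vcs_enabled_in_vh: set) -> bool:
    """Return True if some VC satisfies all three conditions for the VH.

    vc_to_vl maps a VC ID to a (map_enable, vl) pair, modelling the
    VC ID to VL Map; enabled_vls is the set of Enabled VLs; and
    vcs_enabled_in_vh is the set of VC IDs that software operating in
    the VH has enabled.
    """
    for vc_id, (map_enable, vl) in vc_to_vl.items():
        if map_enable and vl in enabled_vls and vc_id in vcs_enabled_in_vh:
            return True
    return False
```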

3.2.2. Managing Flow Control


There are a number of fields used to manage flow control and VC to VL mapping.
MR Flow Control uses the same concepts as PCIe flow control. TLPs can only be sent if the
transmitter has sufficient available flow control credits of the appropriate flavor. The differences
from PCIe include:
In MR, VCs are replaced by Virtual Links (VLs). All Flow Control information is related to VLs, not
VCs. VCs continue to have meaning within a VH: they present software operating in each VH with
a collection of flow control channels and let that software designate which TLPs should use each
flow control channel. As in PCIe today, software operating in a VH controls the TC to VC map.
MR-PCIM controls the subsequent map from {VH, VC} to VL. Performance characteristics of a VL
are assigned by MR-PCIM.
If PF h:f supports more than one VC, then either the optional VC Capability or the optional MFVC
Capability exists in the PF h:f Type 0 Configuration Header. The values presented are managed by
MR-PCIM as follows:
• The PF h:f Extended VC Count value is configured by MR-PCIM. This value indicates to
software operating in the VH the number of VC Resources that it can use.
• The PF h:f Low Priority Extended VC Count value is configured by MR-PCIM. This value
indicates to software operating in the VH which VC Resources should be viewed as higher
priority.
• The PF h:f VC to VL map is configured by MR-PCIM. This map converts VC IDs chosen by
software running in the VH into the corresponding VL numbers. If more than one VC is
supported, this is a full eight entry map since all VC ID values are legal. This map also includes
an Enable bit.
• The PF h:f VC Enabled fields allow MR-PCIM to determine which VC Resources are
enabled by software operating in the VH. When VC Enabled changes value, the VC Config
Changed interrupt is signaled to MR-PCIM.
• The PF h:f VC ID fields allow MR-PCIM to determine the VC ID assignments made by
software operating in the VH.
• The PF h:f TC to VC map field allows MR-PCIM to determine the mapping between TCs
and VCs made by software operating in the VH.
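The two-stage mapping described in this section (TC to VC chosen inside the VH, then VC to VL chosen by MR-PCIM) can be sketched as a pair of table lookups. This is a simplified model: the table shapes and the name `resolve_vl` are assumptions made for illustration, and arbitration and credit accounting are not modelled.

```python
from typing import List, Optional, Tuple


def resolve_vl(tc: int,
               tc_to_vc: List[int],
               vc_to_vl: List[Tuple[bool, int]]) -> Optional[int]:
    """Return the VL used for traffic class tc, or None if unmapped.

    tc_to_vc is the eight-entry TC to VC map owned by software in the VH.
    vc_to_vl is the eight-entry VC ID to VL map owned by MR-PCIM; each
    entry is an (enable, vl) pair.
    """
    vc_id = tc_to_vc[tc]
    enable, vl = vc_to_vl[vc_id]
    return vl if enable else None
```

For example, with TCs 0 to 3 on VC 0 and TCs 4 to 7 on VC 1, and MR-PCIM mapping VC 0 to VL 2 and VC 1 to VL 5, a TLP with TC 5 is accounted against VL 5.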

3.2.3. Managing VF Mapping


Software operating in the VH (e.g. SR-PCIM) can optionally see a collection of Virtual Functions
attached to each PF. In MR, these VFs are known as Logical Virtual Functions (LVFs). MR Devices
implement some number of underlying Mission Virtual Functions (MVFs). These MVFs are
mapped into LVFs. This mapping can be optionally controlled by MR-PCIM using VF Mapping
hardware. This VF Mapping support is optional. If VF Mapping is not supported for a BF, LVFs are
still mapped to MVFs but the mapping is Device Specific and is not visible or controllable by MR-
PCIM.
VF Mapping uses a set of mapping tables controlled by management software. These tables allow
software to (1) control the number of LVFs assigned to each PF, (2) specify which MVFs (if any) are
assigned to each LVF and (3) detect how many VFs software operating in the VH has indicated it
will use.
Figure 3-10 shows an example setup. There are 5 VHs, each with a single PF at Function 0. VH 1
through VH 4 have been assigned some VFs.
• VFs in VH 1 have been assigned 8 LVFs and 4 MVFs. LVF 0,0 through LVF 0,7 are associated
with the 8 VFs. There are 4 VF holes (VF 1:0,5 through VF 1:0,8, a.k.a. LVF 0,4 through LVF 0,7)
meaning that, if SR-PCIM enables VF Migration, up to 4 MVFs can be migrated in to VH 1.
• VFs in VH 2 have been assigned 4 LVFs and 4 MVFs. All LVFs are populated, meaning that
no migration in is possible unless an MVF first migrates out.
• VFs in VH 3 have been assigned 7 LVFs and 5 MVFs. Like VH 1, holes were left for possible
VF Migration in to the VH.
• VFs in VH 4 have been assigned 3 LVFs and 3 MVFs. Like VH 2, no migration in is possible
unless an MVF first migrates out.
• No VFs have been assigned to VH 0.


Figure 3-10: Example Mapping of VFs


The VF State table is used to coordinate migration of MVFs in / out of LVFs. If hardware supports
VF Mapping but not VF Migration, the VF State table is Read Only Zero. If hardware supports VF
Migration, management software must configure the VF State in the LVF Table even if the VF
Migration Capable bit is Clear. The VF State values shown in Figure 3-10 are the required initial
values. See Section 3.2.4 Managing VF Migration for details.
MR-PCIM manages the LVF slot allocation to PFs using the Base LVF and Total VFs registers.
Base LVF indicates the first LVF slot associated with the PF. Base LVF + Total VFs indicates the
last LVF slot associated with the PF. The LVF slot designated by Base LVF contains the MVF
associated with the PF. The LVF slot designated by Base LVF+1 contains the MVF associated with
VF x,1 of PF x, etc. Every PF has at least one LVF slot.


The number of populated LVFs offered to SR-PCIM is contained in InitialVFs. If SR-PCIM does
not enable VF Migration, only these slots are used and any additional unpopulated slots remain
unused.
Initially, MR-PCIM must populate LVFs for a given PF using the lower numbered VFs first. Holes
left for migration (if any) follow the last populated LVF.
Before it enables SR-IOV operation, SR-PCIM in each VH must set the NumVFs value to indicate
the number of VFs it wishes to use. The value set must be less than or equal to TotalVFs. If VF
Migration is not enabled by SR-PCIM, the value set must also be less than or equal to InitialVFs.
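The NumVFs rule above reduces to a two-part bound check. A minimal sketch follows; the function name is hypothetical and the arguments are treated as plain integers rather than register fields.

```python
def num_vfs_is_valid(num_vfs: int, initial_vfs: int, total_vfs: int,
                     migration_enabled: bool) -> bool:
    """Check a proposed NumVFs value against InitialVFs and TotalVFs."""
    if num_vfs < 0 or num_vfs > total_vfs:
        return False
    # Without VF Migration only the initially populated slots are usable.
    if not migration_enabled and num_vfs > initial_vfs:
        return False
    return True
```

With the Device M values for PF 0:0 in Figure 3-9 (InitialVFs = 6, TotalVFs = 8), a NumVFs of 8 is acceptable only when SR-PCIM has enabled VF Migration.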
In some Devices, the Device Programming Model assumes that operations on one Function can
affect another Function of the Device. In MR, this dependency manifests itself as a relationship
between MVFs of some of the Device’s BFs. The Function Dependency Link field in a BF indicates
the presence of a dependency. See section 4.2.1.2 for details. A similar dependency exists in the SR-
IOV specification but there it deals with dependencies associated with VF assignment to SIs.

3.2.4. Managing VF Migration


In Multi-Root systems, VFs can be migrated between VHs. VF Migration does not occur in Single-
Root only systems.
SR-IOV and MR-IOV support for VF Migration is Optional. VF Migration for VFs associated with a
BF is only possible if all three of the following are true:
• The VF Migration Supported bit in the BF MR-IOV Capability is Set.

• In at least one VH (h), MR-PCIM software has set the VF Migration Capable bit in the
Function Table entry controlling PF h:f.
• In that VH (h), SR-PCIM software controlling PF h:f has also Set the VF Migration Enabled
bit.
VF Migration centers around the LVF Table:
• The VF State field manages the SR-IOV and MR-IOV combined view of the migration state
of a VF. This table is used to gracefully add and remove VFs to or from VHs.
• The VF Map field maps LVFs onto MVFs. When the VF State is Inactive.Unavailable,
software can write this field to implement a change.
VF migration follows the state diagram shown in Figure 3-11. The state values shown are contained
in the VF State field associated with the VF. State transitions indicated by solid lines are initiated by
MR software by writing the new state value to the VF State field. State transitions indicated by
dashed lines are typically initiated by SR-PCIM and are visible to MR-PCIM via the VF State field.
The mechanisms used for this are described in the SR-IOV Specification.


[State diagram not reproduced. State groups: VF Inactive (not usable by SR), VF Dormant (responds to
SR), VF Active (fully usable by SR). Transition labels: MR: Request Migrate In, MR: Retract Migrate In,
MR: Request Migrate Out, MR: Retract Migrate Out, SR: Activate, SR: Deactivate, SR: Complete
Migrate Out, SR: Set VF Enable, SR: Clear VF Enable. Notations on transitions: SR: Set VF Migration
Status, MR: Set VF Migration Status.]

Figure 3-11: VF Migration State Diagram


The following state transitions may be initiated by MR software by writing the VF State field. Any
other writes are ignored and no state transition occurs.
Table 3-5: Valid MR State Transitions for VF Migration

Current State | Written State | Meaning
00b Inactive.Unavailable | 01b Inactive.MigrateIn | Request Migrate In
01b Inactive.MigrateIn | 00b Inactive.Unavailable | Retract Migrate In
11b Active.Available | 10b Active.MigrateOut | Request Migrate Out
10b Active.MigrateOut | 11b Active.Available | Retract Migrate Out
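Table 3-5 can be modelled as a lookup of permitted (current, written) pairs, with every other write ignored. The sketch below uses the two-bit encodings from the table; the names `VFState` and `mr_write_vf_state` are hypothetical, and transitions initiated by SR-PCIM are deliberately not modelled.

```python
from enum import IntEnum


class VFState(IntEnum):
    INACTIVE_UNAVAILABLE = 0b00
    INACTIVE_MIGRATE_IN = 0b01
    ACTIVE_MIGRATE_OUT = 0b10
    ACTIVE_AVAILABLE = 0b11


# (current, written) pairs that Table 3-5 lists as valid MR writes.
MR_TRANSITIONS = {
    (VFState.INACTIVE_UNAVAILABLE, VFState.INACTIVE_MIGRATE_IN): "Request Migrate In",
    (VFState.INACTIVE_MIGRATE_IN, VFState.INACTIVE_UNAVAILABLE): "Retract Migrate In",
    (VFState.ACTIVE_AVAILABLE, VFState.ACTIVE_MIGRATE_OUT): "Request Migrate Out",
    (VFState.ACTIVE_MIGRATE_OUT, VFState.ACTIVE_AVAILABLE): "Retract Migrate Out",
}


def mr_write_vf_state(current: VFState, written: VFState) -> VFState:
    """Return the VF State after an MR software write.

    Writes that are not listed in Table 3-5 are ignored, so the
    current state is returned unchanged.
    """
    if (current, written) in MR_TRANSITIONS:
        return written
    return current
```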

VFs that are in the Inactive.Unavailable state are not usable by software in the VH. Configuration,
IO and Memory Requests within the VH targeting the associated VF return UR. Within 100 ms of
transitioning to this state, a VF must stop issuing Requests.


VFs that are in the Inactive.MigrateIn state (1) will respond to Configuration Requests issued by
software running in the VH, (2) if MSE is Set, will respond to Memory requests issued by software
running in the VH and (3) will not issue Requests.
The state transition from Active.MigrateOut to Inactive.Unavailable Sets the MR VF Migration
Status bit. If the MR VF Migration Interrupt Enable is Set, this in turn causes an MSI to be queued
to MR-PCIM. MR-PCIM software can then scan the LVF Table to determine the cause of the
interrupt. Specifically, MR-PCIM is looking for the VFs that it previously placed in the
Active.MigrateOut state and are now in the Inactive.Unavailable state.
State transitions with the notation “SR: Set VF Migration Status” cause similar behavior within the
VH. See the SR-IOV Specification for details.
The following steps are used by MR-PCIM to migrate a VF from one VH to another.
1. A request arrives from higher level software asking that VF h:f,s be migrated to VF a:f,c.
For the request to be valid, the VF Table Entry associated with VF h:f,s must be in the
Active.Available state and the VF Table Entry associated with VF a:f,c must be in the
Inactive.Unavailable state.
2. Initiate a Migrate Out operation in VH h by writing the VF State entry associated with
VF h:f,s to Active.MigrateOut.
3. Wait for SR-PCIM to stop using the VF and to indicate so by transitioning the VF State
to Inactive.Unavailable. This transition sets the MR VF Migration Status bit and can raise an
interrupt to MR-PCIM.
4. Save the value of the VF Map entry associated with VF h:f,s.
5. Set the VF Map entry associated with VF h:f,s to zero to indicate an empty slot.
6. Set the VF Map entry associated with VF a:f,c to the value saved in step 4.
7. Initiate a Migrate In operation in VH a by writing the VF State entry associated with VF a:f,c
to Inactive.MigrateIn.
8. At some point, SR-PCIM will transition the VF to the Active.Available state and start using it.
In addition to the graceful migration described above, MR-PCIM can retract a Migrate In or
Migrate Out request that it previously issued.
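The migration procedure can be walked through in a small software model. This is a sketch under stated assumptions: `LVFEntry` and `migrate_vf` are hypothetical names, states are represented as strings, a VF Map value of 0 stands for an empty slot as in step 5, and SR-PCIM's part (step 3) is supplied by the caller as a callback instead of an interrupt.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class LVFEntry:
    state: str   # VF State field, e.g. "Active.Available"
    mvf: int     # VF Map field; 0 models an empty slot (step 5)


def migrate_vf(src: LVFEntry, dst: LVFEntry,
               wait_for_sr_pcim: Callable[[LVFEntry], None]) -> None:
    """Move the MVF behind src to dst, following steps 1 to 7."""
    # Step 1: validate the request.
    if src.state != "Active.Available" or dst.state != "Inactive.Unavailable":
        raise ValueError("source must be Active.Available and "
                         "destination Inactive.Unavailable")
    src.state = "Active.MigrateOut"          # step 2
    wait_for_sr_pcim(src)                    # step 3
    if src.state != "Inactive.Unavailable":
        raise RuntimeError("SR-PCIM has not released the VF")
    saved = src.mvf                          # step 4
    src.mvf = 0                              # step 5
    dst.mvf = saved                          # step 6
    dst.state = "Inactive.MigrateIn"         # step 7
    # Step 8 (transition to Active.Available) is performed by SR-PCIM.
```

A caller would pass a function that blocks until the source entry reads Inactive.Unavailable, for example after the MR VF Migration interrupt.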

3.2.4.1. VF Migration Initial State

Software in a VH expects to see the initial VF configuration shown in Figure 3-12. MR-PCIM
must ensure this condition by appropriate programming of the VF Migration tables.


VF 1 .. VF n (n = InitialVFs): Active.Available VFs
VF n+1 .. VF m (m = TotalVFs): Inactive.Unavailable VFs

Figure 3-12: Initial VF State


Specifically, for each PF, MR-PCIM must configure the associated Function Table entry such that:
• InitialVFs ≥ 0

• TotalVFs ≥ InitialVFs

• 0 ≤ BaseLVF + TotalVFs < MaxLVF

• The LVF table region assigned to the PF, [BaseLVF .. BaseLVF + TotalVFs – 1], does not
overlap with the region assigned to any other PF of the BF.
• If InitialVFs > 0, all LVF entries in the range [BaseLVF .. BaseLVF + InitialVFs – 1] are in
state Active.Available and are mapped to valid MVFs.
• If InitialVFs != TotalVFs, all LVF entries in the range [BaseLVF + InitialVFs ..
BaseLVF + TotalVFs – 1] are in state Inactive.Unavailable.
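The per-PF rules above lend themselves to a simple checker. The sketch below applies the rules exactly as written, including the strict bound against MaxLVF; it does not cover the non-overlap rule, which needs the regions of all PFs of the BF. The function name and the table representation (a list of (state, mvf) pairs, with `None` for an unmapped slot) are assumptions for illustration.

```python
AA = "Active.Available"
IU = "Inactive.Unavailable"


def check_initial_state(base_lvf, initial_vfs, total_vfs, max_lvf, lvf_table):
    """Return a list of violated rules for one PF (empty if none).

    lvf_table holds one (state, mvf) pair per LVF slot; mvf is None
    when the slot is not mapped to an MVF.
    """
    problems = []
    if initial_vfs < 0:
        problems.append("InitialVFs < 0")
    if total_vfs < initial_vfs:
        problems.append("TotalVFs < InitialVFs")
    if not 0 <= base_lvf + total_vfs < max_lvf:
        problems.append("BaseLVF + TotalVFs out of range")
    if problems:
        return problems  # ranges below are meaningless if the bounds fail
    for slot in range(base_lvf, base_lvf + initial_vfs):
        state, mvf = lvf_table[slot]
        if state != AA or mvf is None:
            problems.append(f"LVF slot {slot} must be {AA} and mapped")
    for slot in range(base_lvf + initial_vfs, base_lvf + total_vfs):
        if lvf_table[slot][0] != IU:
            problems.append(f"LVF slot {slot} must be {IU}")
    return problems
```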

3.2.4.2. VF Migration Reinitialization

After a PF is Reset or when VF Enable is Cleared and then Set, a valid initial VF configuration must
be re-established. The InitialVFs value may be different from an earlier initial configuration so long
as the configuration meets the rules described in Section 3.2.4.1.
This process can be accomplished by hardware or software and must be completed within 1 sec (to
avoid an SR software timeout resulting in the hardware being declared broken).
This process starts by adjusting InitialVFs to reflect the number of active VFs associated with the
PF and then rearranging those active VFs into lower numbered VFs, keeping the same relative VF
ordering.
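The compaction step described above can be sketched as follows, assuming the PF's LVF region is modelled as a list in VF order where `None` marks an unpopulated slot. The function name is hypothetical, and the sketch covers only the rearrangement, not the associated VF State updates.

```python
def reinitialize_vfs(slots):
    """Compact active VFs into the lowest-numbered slots.

    slots lists the MVF behind each VF of the PF in VF order, with
    None for an empty slot. Returns (initial_vfs, new_slots), keeping
    the relative order of the active VFs.
    """
    active = [mvf for mvf in slots if mvf is not None]
    holes = [None] * (len(slots) - len(active))
    return len(active), active + holes
```

For example, a region holding MVFs 5, 7 and 9 in slots 1, 3 and 5 of five becomes InitialVFs = 3 with those MVFs in slots 1 to 3 and two trailing holes.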
Implementation Note: As described in the SR-IOV Specification, if VF Migration Enable was Set,
SR software must wait 1 second after clearing VF Enable for InitialVFs and TotalVFs to become
valid.


3.3. MR Root Port Initialization


MR Root Complexes consist of one or more traditional Root Complexes, each with one or more Root
Ports, and a Vendor Specific mapping of those Root Ports into one or more MR Links.
For each MR Link, one Root Port is associated with VH 0 on that Link and controls the physical
link. When the Link operates in MR Mode, additional Root Ports are associated with additional
VHs of the Link.
For Links that are MR Enabled, the RP sends MRInit DLLPs as shown in Table 2-1 with the Device
/ Port Type indicating Root Port (0100b).

Any MR Root Complex must address the following areas. The mechanisms used are vendor specific
and outside the scope of this specification.
• Method for deciding whether to enable MR operation on a given Link

• Method for associating a RP with a MR Link and a VH on that Link

• Method for arbitrating between RPs that share a MR Link

• Method for mapping {RP, VC} to VLs

• Method for enabling and disabling VLs (VL0 is always enabled, others are controlled by
software)
• Method for enabling and disabling VHs (VH0 is always enabled, others are controlled by
software)
• Method for controlling the number of VCs offered to each RP (could be fixed)

• Method for associating an ATPT with a RP

• Method for reporting and logging MR Errors

• Method for limiting Peer-to-Peer transactions from one RP to another RP

In addition, software running above the RC must determine how to configure and use the MR
topology. This involves determining, for example, how the various VHs on each MR Link should be
used, how VCs and VLs should be managed, etc.
The mechanism used to determine this is outside the scope of this specification. A variety of
mechanisms can be used (in any combination) including:
• Using arbitrary out-of-band communication paths

• Using registers in Switches and/or Devices defined by this specification

• Using vendor specific registers in Switches and/or Devices

• Using the MR Switch “Link Partner” registers in MR Switches to see the values sent in MRInit
DLLPs.


Note that the MR RP is always upstream on all VHs. This means that MR-PCIM cannot, in general,
access MR RP Configuration Space (exception: MR-PCIM can access the RP it is running above and
may be able to access other RPs in the same RC).

3.3.1. Example MR Root Port Topology


An example topology that includes an MR Root Complex is shown in Figure 3-13.

[Figure not available in the source document.]

Figure 3-13: Example Topology with MR Root Complex
In this example, five Root Ports and two Endpoint Functions are present in the Root Complex. The
Root Ports are associated with MR Links X and Y in the following fashion.
Table 3-6: Example MR Root Topology: RP Associations

Root Port Association


RP 0 Link X VH 0
RP 1 Link X VH 1
RP 2 Link X VH 2
RP 3 Link Y VH 0
RP 4 Link Y VH 1

If Switch X authorizes the port connected to Link X, MR-PCIM could manage Switch X via RP 0.
Similarly, if Switch Y authorizes the port connected to Link Y, MR-PCIM could manage Switch Y
via RP 3.
If Switches X and Y are managed from the MR Root Complex and are distinct MR topologies with no
connection between them, the two MR-PCIMs above RP 0 and RP 3 are independent as well. If
there is a single interconnected MR topology, there must be a single MR-PCIM and it can use either
RP 0 or RP 3 but not both (i.e. the Management VH must always be a tree).


4. Configuration
The following sections list the configuration requirements for Base Functions (BFs), Physical
Functions (PFs), MRA Switches and MR-PCIMs.


4.1. Configuration Field Summary


The following fields are used to manage MR IOV features of a Device or Switch.
Table 4-1: MR-IOV Fields

Field Name Width Type Opt Device Usage Switch Usage


MSI Vector # 11 RO Req BF / Capability MR-IOV Cap /
Capability
MR Switch Number 16 RW Req n/a MR-IOV Cap / Control
Function Dependency 8 RO Req BF / Capability n/a
Link
VS Interrupt Enable 1 RW Req n/a MR-IOV Cap / Control
Port Interrupt Enable 1 RW Req n/a MR-IOV Cap / Control
VS Interrupt Status 1 RO Req n/a MR-IOV Cap / Status
Port Interrupt Status 1 RO Req n/a MR-IOV Cap / Status
VS # 8 RO Req n/a MR-IOV Cap / Status
VS Bridge # 8 RO Req n/a MR-IOV Cap / Status
VS is Authorized 1 RO Req n/a MR-IOV Cap / Status
# VS 8 RO Req n/a MR-IOV Cap /
Capability
# VS Bridge 8 RO Req n/a MR-IOV Cap /
Capability
# Port 8 RO Req n/a MR-IOV Cap /
Capability
VS Table Entry Size 8 RO Req n/a MR-IOV Cap /
Capability
VS Table Offset / BIR 32 RO Req n/a MR-IOV Cap
VS Bridge Table Entry 8 RO Req n/a MR-IOV Cap /
Size Capability
VS Bridge Table Offset 32 RO Req n/a MR-IOV Cap
/ BIR
Port Table Entry Size 8 RO Req n/a MR-IOV Cap /
Capability
Port Table Offset / BIR 32 RO Req n/a MR-IOV Cap
Function Offset 16 RO Req BF / Capability n/a
Vendor Specific Fields
Vendor Specific 4 RW Opt BF / Control MR-IOV Cap / Control
Interrupt Enable Bits
Vendor Specific 4 RO / Opt BF / Status MR-IOV Cap / Status
Interrupt Status Bits RW1C

100 PCISIG Confidential


Multi-Root I/O Virtualization and Sharing, Rev. 0.7
mr-iov-07-2007-06-08

Field Name Width Type Opt Device Usage Switch Usage


Watchdog Timer Support
Watchdog Timer Interrupt Enable | 1 | RW | Req | n/a | MR-IOV Cap / Control
Watchdog Timer Interrupt Status | 1 | RW1C | Req | n/a | MR-IOV Cap / Status
Timer Interval 1 | 8 | RW | Req | n/a | MR-IOV Cap / Watchdog
Timer Interval 2 | 8 | RW | Req | n/a | MR-IOV Cap / Watchdog
Watchdog 1 Expired | 1 | RW1C | Req | n/a | MR-IOV Cap / Watchdog
Rearm Watchdog 1 & 2 | 1 | RW1C | Req | n/a | MR-IOV Cap / Watchdog
Performance Monitoring
Statistics Interrupt Enable | 1 | RW | Opt | BF / Control | MR-IOV Cap / Control
Statistics Interrupt Status | 1 | RW1C | Opt | BF / Status | MR-IOV Cap / Status
# Statistics Blocks | 8 | RO | Req | BF / Statistics | MR-IOV Cap / Statistics
# Statistics Descriptors | 8 | RO | Req | BF / Statistics | MR-IOV Cap / Statistics
Statistics Block Start / Busy | up to 32 | RW | Opt | BF / Statistics | MR-IOV Cap / Statistics
Statistics Block Offset / BIR | 32 | RO | Opt | BF / Statistics | MR-IOV Cap / Statistics
Statistics Descriptor Offset / BIR | 32 | RO | Opt | BF / Statistics | MR-IOV Cap / Statistics
Link Control
MaxVH | 8 | RO | Req | Main BF / VH Counts | Port Table / Capability
NumVH | 8 | RW | Req | Main BF / VH Counts | Port Table / Control
Port Present | 1 | RO | Req | n/a | Port Table / Capability
Port Enable | 1 | RW | Req | n/a | Port Table / Control
Port Interrupt Enable | 8 | RW | Req | n/a | Port Table / Control
Port Interrupt Pending | 8 | RW1C | Req | n/a | Port Table / Status
MR-IOV Link Enable | 1 | RW | Req | n/a | Port Table / Control
Force Reset | 1 | RW | Req | n/a | VS Bridge Table / Hot Plug Signals
VS Interrupt Enable | 1 | RW | Req | n/a | VS Table / Control

VS Suppress Reset Propagation | 1 | RW | Req | n/a | VS Table / Control
PM_PME Triggers Beacon / WAKE# | 1 | RW | Opt | n/a | Port Table / Control
Send PME_Enter_L23 DLLP | 1 | RW | Opt | n/a | Port Table / Control
Non-PCIe Management Port | 1 | RO | Req | n/a | Port Table / Capability
MR Error Status | 32 | RW1C | Opt | Main BF / MR Log | Port Table / MR Log
MR Error Log | 160 (5*32) | RO | Opt | Main BF / MR Log | Port Table / MR Log
Max Payload Size Supported | 3 | RO | Req | n/a | VS Bridge Table / Capability
Max Payload Size Offered | 3 | RW | Opt | n/a | VS Bridge Table / Control
Link Direction Status | 2 | RO | Req | n/a | Port Table / Status
Link Partner Detected | 1 | RO | Req | n/a | Port Table / Status
Link Direction Control + Backup Link Direction Control | 2+2 | RW | Req | n/a | Port Table / Control
Vendor Specific MR Init Information | 3 | RW | Req | n/a | Port Table / Control
Link Partner Vendor Specific MR Init Information | 3 | RO | Req | n/a | Port Table / Link Partner
Link Partner MaxVH | 8 | RO | Req | n/a | Port Table / Link Partner
Link Partner MaxVL | 3 | RO | Req | n/a | Port Table / Link Partner
Link Partner Trained in MR Mode | 1 | RO | Req | n/a | Port Table / Link Partner
Link Partner Protocol Version | 3 | RO | Req | n/a | Port Table / Link Partner
Link Partner was Authorized | 1 | RO | Req | n/a | Port Table / Link Partner
Link Partner Type | 4 | RO | Req | n/a | Port Table / Link Partner
VF Migration / VF Mapping
VF Migration Supported | 1 | RO | Req | BF / Capability | n/a
VF Mapping Supported | 1 | RO | Req | BF / Capability | n/a
VF Enable | 1 | RO | Req | Function Table / Status | n/a

VF Enable Changed | 1 | RW1C | Req | Function Table / Control | n/a
VF Enable Enabled | 1 | RW | Req | Function Table / Control | n/a
VF Migration Capable | 1 | RW | Opt | Function Table / Control | n/a
VF Migration Enabled | 1 | RO | Opt | Function Table / Status | n/a
Max LVF # | 16 | RO | Opt | BF / VF Migration | n/a
Max MVF # | 16 | RO | Opt | BF / VF Migration | n/a
LVF Table Offset / BIR | 32 | RO | Opt | BF / VF Migration | n/a
Base LVF | 16 | RW | Opt | Function Table | n/a
InitialVFs | 16 | RW | Opt | Function Table / Control | n/a
TotalVFs | 16 | RW | Opt | Function Table / Control | n/a
NumVFs | 16 | RO | Opt | Function Table / Status | n/a
First VF Offset | 16 | RO | Opt | Function Table | n/a
VF Stride | 16 | RO | Opt | Function Table | n/a
VF Migration Status Interrupt Enable | 1 | RW | Opt | Function Table | n/a
VF Migration Status | 1 | RW1C | Opt | Function Table | n/a
PF Reset Indicated Interrupt Enable | 1 | RW1C | Opt | Function Table | n/a
PF Reset Indicated | 1 | RW1C | Opt | Function Table | n/a
Congestion Management
VL Enable | 8 | RW | Req | Main BF / Control | Port Table / Control
VL Negotiation Pending | 8 | RO | Req | Main BF / Status | Port Table / Status
MaxVL | 3 | RW | Opt | BF / VL Arb | Port Table / VL Arb
BF VL | 3 | RW | Req | Main BF / Control | n/a
Default VL | 3 | RW | Opt | Main BF / Control | n/a
VL Arbitration Table Offset | 30 | RO | Opt | BF / VL Arb | Port Table / VL Arb
VL Arbitration Reference Clock | 2 | RO | Opt | BF / VL Arb | Port Table / VL Arb
VL Arbitration Capability | 8 | RO | Opt | BF / VL Arb | Port Table / VL Arb
Load VL Arbitration Table | 1 | RW | Opt | BF / VL Arb | Port Table / VL Arb

VL Arbitration Select | 4 | RW | Opt | BF / VL Arb | Port Table / VL Arb
VL Arbitration Status | 1 | RO | Opt | BF / VL Arb | Port Table / VL Arb
Max Time Slots | 8 | RO | Opt | BF / VL Arb | Port Table / VL Arb
VL Strict Priority Arbitration | 8 | RW | Opt | BF / VL Arb | Port Table / VL Arb
VC Capability Supported | 1 | RO | Req | Function Table / Capability | n/a
Num VC Resources Hardware Present | 3 | RO | Req | Function Table / Capability | n/a
Num MFVC Resources Hardware Present | 3 | RO | Req | Function Table / Capability | n/a
MFVC Capability Supported | 1 | RO | Req | Function Table / Capability | n/a
Extended VC Count | 3 | RW | Opt | Function Table / Control | VS Bridge Table / Control
Low Priority Extended VC Count | 3 | RW | Opt | Function Table / Control | VS Bridge Table / Control
VC Resource Enabled | 8 (1*8) | RO | Opt | Main BF Function Table / VC State | VS Bridge Table / VC State
VC Resource VC Negotiation Pending | 8 (1*8) | RO | Opt | Main BF Function Table / VC State | VS Bridge Table / VC State
VC ID | 24 (3*8) | RO | Opt | Function Table / VC State | VS Bridge Table / VC State
TC to VC Map | 64 (8*8) | RO | Opt | Function Table / VC State | VS Bridge Table / VC State
VC to VL Map | 24 (3*8) | RW | Opt | Main BF Function Table / VC ID to VL Map | VS Bridge Table / VC ID to VL Map
VC Mapped | 8 (1*8) | RW | Opt | Main BF Function Table / VC ID to VL Map | VS Bridge Table / VC ID to VL Map
VC Config Changed | 1 | RW1C | Opt | Main BF Function Table / Status | VS Bridge Table / Status
VC Config Interrupt Enable | 1 | RW | Opt | Main BF Function Table / Control | VS Bridge Table / Control
MFVC Resource Enabled | 8 (1*8) | RO | Opt | Main BF Function Table / MFVC State | n/a
MFVC Resource VC Negotiation Pending | 8 (1*8) | RO | Opt | Main BF Function Table / MFVC State | n/a
MFVC: VC ID | 24 (3*8) | RO | Opt | Function Table / MFVC State | n/a
MFVC: TC to VC Map | 64 (8*8) | RO | Opt | Function Table / MFVC State | n/a
MFVC Config Changed | 1 | RW1C | Opt | Main BF Function Table / Status | n/a
MFVC Config Interrupt Enable | 1 | RW | Opt | Main BF Function Table / Control | n/a
Global Key Management
Global Key Value | 12 | RW | Req | Main BF / Function Table | VS Table / Control
Device Global Key Check Enable | 1 | RW | Req | Main BF / Control | n/a
VS Global Key Check Enable | 3 | RW | Req | n/a | VS Table / Control
Management Authorization
Management VS | 8 | RW | Req | n/a | MR-IOV Cap / Control
Authorized VS Bitmap Offset | 12 | RO | Req | n/a | MR-IOV Cap / Control
Authorized VS Bitmap + Backup Authorized Bitmap | 2 * NumVS | RW | Req | n/a | MR-IOV Cap / Control
Switch Mapping
Bridge Hardware Present | 1 | RO | Req | n/a | VS Bridge Table / Capability
Bridge Enable | 1 | RW | Req | n/a | VS Bridge Table / Control
Bridge Port | 8 | RW | Req | n/a | VS Bridge Table / Control
Bridge Port VH | 12 | RW | Req | n/a | VS Bridge Table / Control
Port Mapped to Bridge | 1 | RW | Req | n/a | VS Bridge Table / Control
VS Present | 1 | RO | Req | n/a | VS Table / Capability
VS Enable | 1 | RW | Req | n/a | VS Table / Control
VS Interrupt Vector Num | 11 | RO | Req | n/a | VS Table / Capability
Hot Plug Signals Interface
Hot Plug Hardware Present | 1 | RO | Req | n/a | VS Bridge Table / Capability
Bridge Controls Physical Link | 1 | RW | Req | n/a | VS Bridge Table / Hot Plug Signals
Slot Implemented | 1 | RW | Opt | n/a | VS Bridge Table / Hot Plug Signals

Virtual Hot Plug Interrupt Enable | 1 | RW | Opt | n/a | VS Bridge Table / Hot Plug Signals
PME Turn Off State | 1 | RO | Req | n/a | VS Bridge Table / Status
Physical Slot Number | 13 | RW | Opt | n/a | VS Bridge Table / Hot Plug Signals
Slot Power Limit Scale | 2 | RO | Opt | n/a | VS Bridge Table / Hot Plug Signals
Slot Power Limit Value | 8 | RO | Opt | n/a | VS Bridge Table / Hot Plug Signals
Hot Plug Capable | 1 | RW | Opt | n/a | VS Bridge Table / Hot Plug Signals
Hot Plug Surprise | 1 | RW | Opt | n/a | VS Bridge Table / Hot Plug Signals
Attention Indicator State | 2 | RO | Opt | n/a | VS Bridge Table / Hot Plug Signals
Attention Indicator Changed | 1 | RW1C | Opt | n/a | VS Bridge Table / Status
Push Attention Button | 1 | RW | Opt | n/a | VS Bridge Table / Hot Plug Signals
Power Indicator State | 2 | RO | Opt | n/a | VS Bridge Table / Hot Plug Signals
Power Indicator Changed | 1 | RW1C | Opt | n/a | VS Bridge Table / Status
Power Controller State | 1 | RO | Opt | n/a | VS Bridge Table / Hot Plug Signals
Power Controller Changed | 1 | RW1C | Opt | n/a | VS Bridge Table / Status
Power Controller Present | 1 | RW | Opt | n/a | VS Bridge Table / Hot Plug Signals
Signal Power Fault | 1 | RW | Opt | n/a | VS Bridge Table / Hot Plug Signals
Presence Detect State | 1 | RW | Opt | n/a | VS Bridge Table / Hot Plug Signals

4.2. Device Configuration Space


For managing MR-IOV Devices, MR-PCIM uses MR-IOV Capabilities located in the Base
Functions. Base Functions are Type 0 Headers located in VH0 of the Device that contain the MR-
IOV Capability.

Configuration controls are associated with:


• Entire component
• Each Base Function
• Each VH supported by the component
• Each Function or PF supported in some non-management VH by the component

There are up to five tables provided by each Base Function. These tables are located using the MR-
IOV Capability block located in the Type 0 Configuration Space of the associated BF. An overview
of the tables is shown in Figure 4-1.
• The MR-IOV Capability contains information concerning the BF.
• The VH Table contains information concerning each VH supported by the Device. There is
exactly one VH Table per Device and it is associated with an arbitrary BF.
• The optional VL Arbitration Table contains information describing how VL Arbitration is
performed by the Device. If present, this table is associated with the BF that contains the VH
Table.
• The optional LVF Table is used to control VF Mapping and VF Migration. This table is absent
if neither VF Mapping nor VF Migration is supported by the BF.
• The Function Table contains one entry for each Function or PF in each non-management VH.

[Figure 4-1 diagram. Config Space side: MR-IOV Capability Hdr; MR-IOV Capability, Control and Status Bits; NumVH and MaxVH; Function Table Offset; VL and Function Mapping Control / Status with LVF Table Offset; VL Arbitration Control / Status with VL Arb Table Offset; # Stats Blks, # Stats Desc, Statistics Start / Busy, Stats Descriptor Offset and Stats Block Offset. BAR Memory Space side, reached through those offsets: Function Interrupt Bitmap and Function Table Entry 0 through Entry MaxVH-1; LVF Table; VL Arb Table; Statistics Descriptor 0 through Descriptor Max; Statistics Block 0 through Block Max, each block holding Statistics 0 through Statistics Max.]

Figure 4-1: MR Device Configuration Space

4.2.1. Device MR-IOV Extended Capability


MR-IOV Base Functions contain a new PCIe Extended Capability. This capability is used by MR-
PCIM to determine if a Device is MR-IOV capable and to manage the MR-IOV features of the
Function.
The MR-IOV capability is located in the PCIe Extended Configuration Space. Figure 4-2 shows the
Device MR-IOV Extended Capability structure:

Offset | Register | Fields (high-order to low-order)
00h | MR-IOV Header | Next Cap Offset, Vers, Capability ID (0011h)
04h | MR-IOV Capability | MSI Vector #, Fcn Dep Link, Is Main BF, VL Arb Table Present, VF Migration Supported, VF Mapping Supported
08h | MR-IOV Control | VL Enable, BF VL, Def. VL, Vendor Specific Interrupt Enables, Statistics Interrupt Enable, Function Table Interrupt Enable
0Ch | MR-IOV Status | VL Neg Pend, MSI Scheduled, Vendor Specific Interrupt Status, Statistics Interrupt Status, Function Table Interrupt Status
10h | MR-IOV VH Counts | NumVH, Function Table Entry Size, MaxVH
14h | Function Table | Function Table Offset, BIR
18h | VF Mapping | Max MVF #, Max LVF #
1Ch | VF Mapping | LVF Table Offset, BIR
20h | VL Arbitration | Max Time Slots, Max VL, WRR Ref Clock, VL Arb Status, VL Arb Cap.
24h | VL Arbitration | VL Strict Priority Arbitration, VL Arb Select, Load VL Arb Table
28h | VL Arbitration | VL Arbitration Table Offset, BIR
2Ch | MR Error Logging | MR Error Control and Status
30h | MR Error Logging | MR Error Log 0 (TLP Prefix)
34h | MR Error Logging | MR Error Log 1 (TLP Header)
38h | MR Error Logging | MR Error Log 2 (TLP Header)
3Ch | MR Error Logging | MR Error Log 3 (TLP Header)
40h | MR Error Logging | MR Error Log 4 (TLP Header)
44h | Statistics | # Stat Blocks, # Stat Desc
48h | Statistics | Statistics Start / Busy: Statistics Block 31 Start / Busy down to Statistics Block 0 Start / Busy
4Ch | Statistics | Statistics Descriptor Table Offset, BIR
50h | Statistics | Statistics Block Table Offset, BIR

Figure 4-2: Device MR-IOV Capability

4.2.1.1. MR-IOV Extended Capability Header (00h)

Table 4-2 defines the Device MR-IOV Extended Capability header. The Capability ID for the
Device MR-IOV Extended Capability is 0011h.
Table 4-2: Device MR-IOV Extended Capability Header

Bit Location Register Description Attributes


15:0 PCI Express Extended Capability ID – This field is a PCI-SIG RO
defined ID number that indicates the nature and format of the
Extended Capability.
The Extended Capability ID for the MR-IOV Extended Capability is
0011h.
19:16 Capability Version – This field is a PCI-SIG defined version number RO
that indicates the version of the Capability structure present.
Must be 1h for this version of the specification.
31:20 Next Capability Offset – This field contains the offset to the next PCI RO
Express Capability structure or 000h if no other items exist in the
linked list of Capabilities.
This offset is relative to the beginning of PCI compatible
Configuration Space and thus must always be either 000h (for
terminating list of Capabilities) or greater than 0FFh.
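
As an informative illustration of Table 4-2, the sketch below splits a header DWORD into its three fields and walks the Extended Capability list looking for Capability ID 0011h. The function names and the `read_dword` callback are hypothetical and are not defined by this specification; the starting offset of 100h is the standard start of PCIe Extended Configuration Space.

```python
MR_IOV_CAP_ID = 0x0011  # Capability ID stated in the text above


def decode_ext_cap_header(dword):
    """Split a 32-bit Extended Capability header into its three fields."""
    return {
        "cap_id": dword & 0xFFFF,              # bits 15:0
        "version": (dword >> 16) & 0xF,        # bits 19:16
        "next_offset": (dword >> 20) & 0xFFF,  # bits 31:20
    }


def find_mr_iov_capability(read_dword):
    """Return the offset of the MR-IOV Extended Capability, or None.

    read_dword(offset) is a hypothetical callback that returns the 32-bit
    Configuration Space value at the given byte offset.
    """
    offset = 0x100  # first Extended Capability
    seen = set()    # guards against a malformed, looping list
    # A Next Capability Offset of 000h (or anything at or below 0FFh) ends the walk.
    while offset >= 0x100 and offset not in seen:
        seen.add(offset)
        header = decode_ext_cap_header(read_dword(offset))
        if header["cap_id"] == MR_IOV_CAP_ID:
            return offset
        offset = header["next_offset"]
    return None
```

A caller would typically pass a function that performs a real Configuration Read; a dictionary lookup is enough for experimentation.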

4.2.1.2. MR-IOV Capabilities (04h)

Table 4-3: MR-IOV Capabilities

Bit Location Register Description Attributes


0 VF Mapping Supported – If Set, this BF supports mapping of RO
Mission Virtual Functions (MVFs) into VFs. If Clear, any mapping is
Vendor Specific and is not controlled by MR-PCIM.
1 VF Migration Supported – If Set, this BF supports dynamic mapping RO
of Mission Virtual Functions into and out of VFs. If Clear, VF
Migration support is not available.
If VF Migration Supported is Set, VF Mapping Supported must also be Set.
VF Migration cannot occur unless enabled by software. Migration is
enabled on a per-PF basis. See Section xxx for details.
2 VL Arbitration Table Present – If Set, the VL Arbitration Table is RO
present. If Clear, the VL Arbitration Table is not present and the
corresponding fields in this BF are Read Only Zero.
This field is only meaningful in the Main BF of the Device. In all other
BFs, this field is Read Only Zero.
3 Is Main BF – If Set, this BF is the “Main” BF of the Device. Certain RO
fields are only meaningful in the Main BF.
7:4 Reserved RO
15:8 Function Dependency Link – The programming model for a Device RO
may have Vendor Specific dependencies between sets of Functions.
The Function Dependency Link field is used to inform MR-PCIM
about these dependencies.
This field describes dependencies between BFs. PF and VF
dependencies are the same as the dependencies of their associated
BFs.
If a BF is independent from other BFs of a Device, this field shall
contain the Function Number of the BF.
If a BF is dependent on other BFs of a Device, this field shall contain
the Function Number of the next BF in the same Function
Dependency List. The last BF in a Function Dependency List shall
contain the Function Number of the first BF in the Function
Dependency List.
For BFs in a Function Dependency List, MR-PCIM must allocate
MVFs consistently. For these BFs, every VH should contain the exact
same mapping of MVFs to VFs.
20:16 Reserved RO
31:21 MSI Vector Number – This field indicates the MSI Vector number RO
used to signal MR-IOV events within this BF. This value may change
based on whether MSI or MSI-X is enabled.
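
The bit positions in Table 4-3 and the circular Function Dependency List can be sketched as follows. This is an informative example only: the function names and the `link_of` mapping (BF Function Number to that BF's Function Dependency Link value) are assumptions made for illustration.

```python
def decode_mr_iov_capabilities(dword):
    """Decode the MR-IOV Capabilities register (offset 04h)."""
    return {
        "vf_mapping_supported": bool(dword & (1 << 0)),
        "vf_migration_supported": bool(dword & (1 << 1)),
        "vl_arb_table_present": bool(dword & (1 << 2)),
        "is_main_bf": bool(dword & (1 << 3)),
        "function_dependency_link": (dword >> 8) & 0xFF,  # bits 15:8
        "msi_vector_number": (dword >> 21) & 0x7FF,       # bits 31:21
    }


def function_dependency_list(start_fn, link_of):
    """Follow Function Dependency Link values until the list returns to start_fn.

    An independent BF links to itself, so its list has a single member.
    """
    members = [start_fn]
    nxt = link_of[start_fn]
    while nxt != start_fn:
        if nxt in members:
            raise ValueError("dependency links do not form a closed list")
        members.append(nxt)
        nxt = link_of[nxt]
    return members
```

MR-PCIM could use the returned list to find every BF whose MVF to VF mapping must be kept consistent across VHs.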

4.2.1.3. MR-IOV Control (08h)

Table 4-4: Device MR-IOV Control

Bit Location Register Description Attributes


0 Function Table Interrupt Enable – If Set, when any bit in the RW
Function Interrupt Bitmap is Set, an interrupt is requested. If Clear,
transitions of the Function Interrupt Bitmap do not request an
interrupt. Default is 0b.
3:1 Reserved RO
7:4 Vendor Specific Interrupt Enables – If Set, various Vendor Specific RW
events can cause an interrupt. If Clear, these Vendor Specific Events
cannot cause an interrupt.
Vendor Specific Interrupt support is optional. These bits are Read
Only Zero for any unimplemented Vendor Specific Interrupts. Default
is 0b.
10:8 Default VL – VL Number used for TLPs originated in VH0 and not RW
associated with a BF, PF or VF (i.e. a “Plain Old Function”). Such
TLPs must use TC 0 / VC 0.
This field is only meaningful in the Main BF of the Device. In all other
BFs, it is Read Only Zero.
If all functions in VH0 are BFs, PFs or VFs, this field may be Read
Only Zero.
11 Reserved RO
14:12 BF VL – VL Number used for TLPs originating in this BF. Such TLPs RW
must use TC 0 / VC 0.
This field is only meaningful in the Main BF of the Device. In all other
BFs, it is Read Only Zero.
15 Reserved RO
23:16 VL Enable – This bit, when Set, enables a Virtual Link (see note 1 for RW
exceptions). The Virtual Link is disabled when this bit is cleared.
Software must use the VL Negotiation Pending bit to check whether
the VL negotiation is complete.
Default value of this bit is 1b for VL0 and is 0b for other VLs.
The number of bits implemented in this field is determined by the
MaxVL value. VL Enable bits for VLs greater than MaxVL are Read
Only Zero.
This field is only meaningful in the Main BF of the Device. In all
other BFs, it is Read Only Zero.
Notes:
1. This bit is hardwired to 1b for VL0, i.e., writing to this bit has no
effect for VL0.
2. To enable a Virtual Link, the VL Enable bits for that Virtual Link
must be Set in both components on a Link.
3. To disable a Virtual Link, the VL Enable bits for that Virtual Link
must be cleared in both components on a Link.
4. Software must ensure that no traffic is using a Virtual Link at the
time it is disabled.
5. Software must fully disable a Virtual Link in both components on a
Link before re-enabling the Virtual Link.
31:24 Reserved RO
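
The VL Enable rules above (VL0 hardwired to 1b, bits above MaxVL not implemented, and both Link components required to agree) can be sketched as below. This is an informative sketch; the helper names are hypothetical, and the code only computes register values rather than performing Configuration Writes.

```python
VL_ENABLE_SHIFT = 16  # VL Enable occupies bits 23:16 of MR-IOV Control


def set_vl_enable(control, vl, enable, max_vl):
    """Return a new MR-IOV Control value with one VL Enable bit changed."""
    if not 0 <= vl <= max_vl <= 7:
        raise ValueError("VL number outside the implemented range")
    if vl == 0:
        # VL0 is hardwired to 1b, so a request to change it has no effect.
        return control | (1 << VL_ENABLE_SHIFT)
    bit = 1 << (VL_ENABLE_SHIFT + vl)
    return (control | bit) if enable else (control & ~bit)


def vl_usable(local_control, partner_control, vl):
    """A VL counts as enabled only when both components on the Link have its bit Set."""
    bit = 1 << (VL_ENABLE_SHIFT + vl)
    return bool(local_control & bit) and bool(partner_control & bit)
```

After enabling a VL in both components, software would still poll VL Negotiation Pending before using the VL.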

4.2.1.4. MR-IOV Status (0Ch)

Table 4-5: Device MR-IOV Status

Bit Location Register Description Attributes


0 Function Table Interrupt Status – Set if any bit in the Function RO
Interrupt Bitmap is Set.
3:1 Reserved RO
7:4 Vendor Specific Interrupt Status – Set if the corresponding Vendor RO / RW1C
Specific Interrupt condition has occurred.
Vendor Specific Interrupt support is optional. These bits are Read
Only Zero for any unimplemented Vendor Specific Interrupts.
Vendor Specific Interrupt support allows software to read a single
register in response to an interrupt and quickly determine the reason
for the interrupt.
It is Vendor Specific how a Vendor Specific Interrupt is cleared.
Clearing occurs either due to writing 1b values to this field or by
clearing the interrupt using other, Vendor Specific fields.
14:8 Reserved RO
15 MSI Scheduled – Set when an MSI has been requested. If Set, RW1C
subsequent MSIs are suppressed. If Clear, any enabled interrupt will
cause an MSI to be scheduled (and this bit to be Set). Default is 0b.
23:16 VL Negotiation Pending – These bits indicate whether Flow Control RO
negotiation for a VL is in the pending state.
The value of this bit is defined only when the Link is in the DL_Active
state and the Virtual Link is enabled (its VL Enable bit is Set).
When this bit is Set by hardware, it indicates that the VL resource has
not completed the process of negotiation. This bit is cleared by
hardware after the VL negotiation is complete (on exit from the MR
FC_INIT2 state on the VL).
This field is only meaningful in the Main BF of the Device. In all other
BFs, this field is Read Only Zero.
31:24 Reserved RO
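
One possible way for software to use the MR-IOV Status register is sketched below: read the register once, then write 1b to MSI Scheduled so that later events can schedule a new MSI. The callbacks `read_status` and `write_status` are hypothetical accessors for offset 0Ch, and the order in which software services interrupt sources and clears MSI Scheduled is an assumption, not a requirement stated here.

```python
MSI_SCHEDULED = 1 << 15  # bit 15, RW1C


def decode_mr_iov_status(dword):
    """Decode the fields of the Device MR-IOV Status register."""
    return {
        "function_table_interrupt": bool(dword & 1),     # bit 0
        "vendor_specific": (dword >> 4) & 0xF,           # bits 7:4
        "msi_scheduled": bool(dword & MSI_SCHEDULED),    # bit 15
        "vl_negotiation_pending": (dword >> 16) & 0xFF,  # bits 23:16
    }


def service_interrupt(read_status, write_status):
    """Read status once and clear MSI Scheduled if it was Set."""
    status = decode_mr_iov_status(read_status())
    if status["msi_scheduled"]:
        # Writing 0b to the other bits leaves any RW1C status bits untouched.
        write_status(MSI_SCHEDULED)
    return status
```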

4.2.1.5. MR-IOV VH Counts (10h)

Table 4-6: Device MR-IOV VH Counts

Bit Location Register Description Attributes


7:0 MaxVH – Maximum number of VHs supported on this Device. VHs RO
are numbered [0..MaxVH-1].
This field is only meaningful in the Main BF of the Device. In all other
BFs, this field is Read Only Zero.
15:8 Function Table Entry Size – Returns the size, in DWORDs, of each RO
Function Table Entry. For this version of the specification, this value
is at least 16.
23:16 NumVH – Indicates the number of VHs enabled by the upstream RW
component. This value must be less than or equal to MaxVH. The
default value of this field is 0 indicating one VH is enabled.
This field is only meaningful in the Main BF of the Device. In all other
BFs, this field is Read Only Zero.
31:24 Reserved RO

4.2.1.6. Function Table Offset (14h)

Table 4-7: Device Function Table Offset

Bit Location Register Description Attributes


2:0 Function Table BIR – Indicates which one of a function’s Base RO
Address registers, located beginning at 10h in Configuration Space,
is used to map the Function’s Function Table into Memory Space.
BIR Value Base Address register
0 BAR0 10h
1 BAR1 14h
2 BAR2 18h
3 BAR3 1Ch
4 BAR4 20h
5 BAR5 24h
6..7 Reserved
For a 64-bit Base Address register, the BIR indicates the lower
DWORD.
31:3 Function Table Offset – Used as an offset from the address RO
contained by one of the Function’s Base Address registers to point to
the base of the Function Table. The lower 3 BIR bits are masked off
(set to zero) by software to form a 32-bit QWORD-aligned offset.
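
The Offset / BIR encoding used by this register (and by the other table offset registers in this capability) can be decoded as in the following informative sketch. The function name is hypothetical.

```python
def decode_table_offset(dword):
    """Split an Offset / BIR register into (BIR, byte offset, BAR register offset)."""
    bir = dword & 0x7  # bits 2:0
    if bir > 5:
        raise ValueError("BIR values 6 and 7 are Reserved")
    offset = dword & 0xFFFFFFF8    # bits 31:3 with the BIR bits masked to zero
    bar_register = 0x10 + 4 * bir  # BAR0 at 10h through BAR5 at 24h
    return bir, offset, bar_register
```

The table base address is then the address programmed in the selected Base Address register plus the returned byte offset.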

The total size of the table (in bytes) is:


((MaxVH + 1) * Function_Table_Entry_Size * 4) + 32

This includes both the Function Table and the Function Interrupt Bitmap.
The Function Interrupt Bitmap immediately precedes the Function Table. The size of the Function
Table Interrupt Bitmap is 32 bytes (supporting a maximum of 256 VHs). Bit 0 of the first DWORD
corresponds to VH 0; bit 1 corresponds to VH 1, … Any unused bits are Read Only Zero.
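
The size formula and the bitmap layout can be expressed as the informative sketch below. The function names are hypothetical, and the sketch assumes that VH numbering continues in ascending order into the following DWORDs (bit 0 of the second DWORD corresponding to VH 32), which the text above implies but does not state explicitly.

```python
BITMAP_BYTES = 32  # Function Interrupt Bitmap: 256 bits, one per VH


def function_table_total_bytes(max_vh, entry_size_dwords):
    """Size of the Function Table plus the Function Interrupt Bitmap, per the formula above."""
    return (max_vh + 1) * entry_size_dwords * 4 + BITMAP_BYTES


def bitmap_position(vh):
    """Return (DWORD index, bit index) of a VH's bit in the Function Interrupt Bitmap."""
    if not 0 <= vh < 256:
        raise ValueError("the bitmap covers VH 0 through VH 255")
    return vh // 32, vh % 32


def pending_vhs(bitmap_dwords):
    """List the VH numbers whose bits are Set in the eight bitmap DWORDs."""
    return [i * 32 + b
            for i, word in enumerate(bitmap_dwords)
            for b in range(32)
            if word & (1 << b)]
```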

4.2.1.7. MVF and LVF Sizes (18h)

If VF Mapping is supported, this register indicates the MVF Index values that may be assigned to a
VF. Values in the range [1..Max MVF] may be written to the LVF Table mapping field.
Table 4-8: VF MVF Region

Bit Location Register Description Attributes


15:0 Max LVF – Indicates size of the LVF Table. The LVF Table contains RO
Max LVF + 1 entries numbered 0 to Max LVF.
If VF Mapping is not supported, this field is Zero. All BFs from the
same Function Dependency Group must have the same Max LVF
value.
31:16 Max MVF – Indicates the highest MVF Index that can be assigned to RO
a VF. Zero if VF Mapping Supported is Clear. MVF numbers are in
the range [1..Max MVF].
If VF Mapping is not supported, this field is Zero. All BFs from the
same Function Dependency Group must have the same Max MVF
value.

4.2.1.8. LVF Table Offset (1Ch)

Table 4-9: LVF Table Offset

Bit Location Register Description Attributes


2:0 LVF Table BIR – Indicates which one of a function’s Base Address RO
registers, located beginning at 10h in Configuration Space, is used to
map the Function’s LVF Table into Memory Space.
BIR Value Base Address register
0 BAR0 10h
1 BAR1 14h
2 BAR2 18h
3 BAR3 1Ch
4 BAR4 20h
5 BAR5 24h
6..7 Reserved
For a 64-bit Base Address register, the BIR indicates the lower
DWORD.
31:3 LVF Table Offset – Used as an offset from the address contained by RO
one of the Function’s Base Address registers to point to the base of
the LVF Table. The lower 3 BIR bits are masked off (set to zero) by
software to form a 32-bit QWORD-aligned offset.

The total size of the LVF Table (in bytes) is: (Max LVF + 1) * 4

4.2.1.9. VL Arbitration Capability and Status (20h)

Table 4-10: Device VL Arbitration Capability and Status

Bit Location Register Description Attributes


11:0 VL Arbitration Capability – Indicates the types of VL Arbitration RO
supported by the Port. This field is valid for all Functions that report a
Low Priority Extended VC Count field greater than 0. For all other
Functions, this field must be hardwired to 00h.
Each bit location within this field corresponds to a VC Arbitration
Capability defined below. When more than 1 bit in this field is Set, it
indicates that the Port can be configured to provide different VC
arbitration services. Defined bit positions are:
Bit 0 Hardware fixed arbitration scheme, e.g., Round Robin
Bit 1 Weighted Round Robin (WRR) arbitration with 32 phases
Bit 2 WRR arbitration with 64 phases
Bit 3 WRR arbitration with 128 phases
Bit 4 Time-based WRR with 128 phases
Bit 5 WRR Arbitration with 256 phases
Bits 6-7 Reserved
Bits 10-8 Vendor Defined VL Arbitration Scheme
This field is Read Only Zero if VL Arbitration Present is Clear.
12 VL Arb Status – This bit indicates the coherency status of the VL RO
Arbitration Table. This bit is valid only when the VL Arbitration Table
is used.
This bit is Set by hardware when any entry of the VL Arbitration Table
is written to by software. This bit is cleared by hardware when
hardware finishes loading values stored in the VL Arbitration Table
after software sets the Load VL Arbitration Table bit.
Default value of this bit is 0b.
This field is Read Only Zero if VL Arbitration Present is Clear.
13 Reserved RO
15:14 Reference Clock – Indicates the reference clock for Virtual Links that RO
support time-based WRR VL Arbitration. This field is valid only if time-
based WRR is supported.
Defined encodings are:
00b 100 ns reference clock
01b – 11b Reserved
This field is Read Only Zero if VL Arbitration Present is Clear.
18:16 MaxVL – Indicates the number of VLs supported. The Device RO
supports VL0 through the VL numbered MaxVL, inclusive. This field is only meaningful
in the Main BF of the Device. In all other BFs, it is Read Only Zero.

23:19 Reserved RO
30:24 Maximum Time Slots – Indicates the maximum number of time slots RO
(minus one) that are supported when configured for time-based WRR
VL Arbitration. For example, a value 000 0000b in this field indicates
the supported maximum number of time slots is 1 and a value of 111
1111b indicates the supported maximum number of time slots is 128.
This field is valid only when the VL Arbitration Capability field
indicates that time-based WRR VL Arbitration is supported.
This field is Read Only Zero if VL Arbitration Present is Clear.
31 Reserved RO
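
As an informative example, the register at 20h could be decoded as follows. The dictionary and function names are hypothetical. The sketch reports Maximum Time Slots only when time-based WRR (capability bit 4) is indicated, since the field is valid only in that case, and it adds one because the field holds the number of time slots minus one.

```python
VL_ARB_SCHEMES = {
    0: "Hardware fixed arbitration",
    1: "WRR with 32 phases",
    2: "WRR with 64 phases",
    3: "WRR with 128 phases",
    4: "Time-based WRR with 128 phases",
    5: "WRR with 256 phases",
}


def decode_vl_arb_capability(dword):
    """Decode the Device VL Arbitration Capability and Status register."""
    cap = dword & 0xFFF  # bits 11:0
    time_based = bool(cap & (1 << 4))
    return {
        "schemes": [name for bit, name in VL_ARB_SCHEMES.items() if cap & (1 << bit)],
        "vl_arb_status": bool(dword & (1 << 12)),  # bit 12
        "reference_clock": (dword >> 14) & 0x3,    # bits 15:14
        "max_vl": (dword >> 16) & 0x7,             # bits 18:16
        "max_time_slots": ((dword >> 24) & 0x7F) + 1 if time_based else None,
    }
```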

4.2.1.10. VL Arbitration Control (24h)

Table 4-11: Device VL Arbitration Control

Bit Location Register Description Attributes


0 Load VL Arbitration Table – When Set, this bit updates the VL RW
Arbitration logic from the VL Arbitration Table. This bit is valid only
when the VL Arbitration Table is used by the selected VL Arbitration
scheme (that is indicated by a Set bit in the VL Arbitration Capability
field selected by VL Arbitration Select).
Software sets this bit to signal hardware to update VL Arbitration logic
with new values stored in VL Arbitration Table; clearing this bit has no
effect. Software uses the VL Arbitration Table Status bit to confirm
whether the new values of VL Arbitration Table are completely
latched by the arbitration logic.
This bit always returns 0b when read.
Default value of this bit is 0b.
This field is Read Only Zero if VL Arbitration Present is Clear.
7:1 Reserved RO
11:8 VL Arbitration Select – This field configures the Port to provide a RW
particular VL Arbitration service.
The permissible value of this field is a number corresponding to one
of the asserted bits in the VL Arbitration Capability field.
This field is Read Only Zero if VL Arbitration Present is Clear.
15:12 Reserved RO
23:16 VL Strict Priority Arbitration – This field contains one bit per VL. RW
Bit 0 corresponds to VL0. Bit 7 corresponds to VL7.
When a bit is Set, the corresponding VL is configured to arbitrate as
Strict Priority based on VL number. When a bit is Clear, the
corresponding VL is configured to arbitrate as normal priority (using
the scheme selected by VL Arbitration Select).
Among the VLs configured for strict priority, priority is based on
increasing VL number. VL0 is the lowest strict priority, VL7 is the
highest.
Strict Priority VLs have priority over normal priority VLs.
Behavior is Undefined if a VL configured for Strict Priority is also
included in the VL Arbitration Table.
If a VL is Disabled, the value of the corresponding bit in this field is
ignored.
Default value of this field is 0000 0000b.
This field is Read Only Zero if VL Arbitration Present is Clear.
31:24 Reserved RO
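
The Strict Priority ordering and the VL Arbitration Select constraint described in Table 4-11 can be sketched as below. This is an informative model only: `ready_mask` (one bit per VL that currently has traffic to send) is a hypothetical input, and the function names are not defined by this specification.

```python
def highest_strict_priority_vl(strict_mask, enable_mask, ready_mask):
    """Return the Strict Priority VL to serve next, or None.

    None means no enabled Strict Priority VL has traffic, so the scheme
    chosen by VL Arbitration Select applies to the normal priority VLs.
    """
    # Bits for Disabled VLs are ignored, as the field description requires.
    candidates = strict_mask & enable_mask & ready_mask & 0xFF
    if candidates == 0:
        return None
    return candidates.bit_length() - 1  # highest Set bit is the highest VL number


def select_is_valid(vl_arb_select, vl_arb_capability):
    """VL Arbitration Select must name a bit that is Set in VL Arbitration Capability."""
    return bool(vl_arb_capability & (1 << vl_arb_select))
```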

4.2.1.11. VL Arbitration Table Offset (28h)

Table 4-12: Device VL Arbitration Table Offset

Bit Location Register Description Attributes


2:0 VL Arbitration Table BIR – Indicates which one of a function’s Base RO
Address registers, located beginning at 10h in Configuration Space,
is used to map the Function’s VL Arbitration Table into Memory
Space.
BIR Value Base Address register
0 BAR0 10h
1 BAR1 14h
2 BAR2 18h
3 BAR3 1Ch
4 BAR4 20h
5 BAR5 24h
6..7 Reserved
For a 64-bit Base Address register, the BIR indicates the lower
DWORD.
This field is Read Only Zero if VL Arbitration Present is Clear.
31:3 VL Arbitration Table Offset – Used as an offset from the address RO
contained by one of the Function’s Base Address registers to point to
the base of the VL Arbitration Table. The lower 3 BIR bits are masked
off (set to zero) by software to form a 32-bit QWORD-aligned offset.
This field is Read Only Zero if VL Arbitration Present is Clear.

4.2.1.12. MR Error Status (2Ch)

Table 4-13: Device MR Error Status

Bit Location Register Description Attributes


3:0 MR First Error Pointer RO
4 MR Uncorrectable TLP Error Status RW1C

5 MR Global Key Error Status RW1C


14:6 Reserved RsvdZ
15 MR DLLP Error Status – DLLP Errors are not logged. RW1C

4.2.1.13. MR Error Control (2Eh)

Table 4-14: Device MR Error Control

Bit Location Register Description Attributes


3:0 Reserved RO
4 MR Uncorrectable TLP Error Mask RW
5 MR Global Key Error Mask RW
14:6 Reserved RO
15 MR DLLP Error Mask RW

4.2.1.14. MR Error Log (30h to 40h)

These fields contain the TLP Prefix and TLP Header corresponding to the error described by the
First Error Pointer in the MR Error Status register.
The value of these fields is undefined if the First Error Pointer is zero or points to a bit number that
is not Set.
Headers are not logged and the First Error Pointer is not updated for DLLP Errors.
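
The relationship between the MR First Error Pointer and the validity of the MR Error Log can be sketched as follows. This is an informative example with a hypothetical function name; it treats the log as valid only when the pointer names one of the two logged error bits (4 or 5) and that bit is Set.

```python
def decode_mr_error_status(status):
    """Decode the 16-bit Device MR Error Status register (offset 2Ch)."""
    fep = status & 0xF  # bits 3:0, MR First Error Pointer
    return {
        "first_error_pointer": fep,
        "uncorrectable_tlp_error": bool(status & (1 << 4)),
        "global_key_error": bool(status & (1 << 5)),
        "dllp_error": bool(status & (1 << 15)),  # DLLP Errors are never logged
        # Log contents are undefined unless the pointer names a Set, logged error bit.
        "error_log_valid": fep in (4, 5) and bool(status & (1 << fep)),
    }
```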

4.2.1.15. Statistics Capability and Control (44h to 50h)

Device and Switch Statistics related fields are described in Section 4.3.7.

4.2.2. Device VL Arbitration Table


Switch and Device VL Arbitration tables are identical. See Section 4.3.7 for details.


4.2.3. LVF Table

Figure 4-3: LVF Table
(Diagram: the LVF Table Offset in Config Space locates the LVF Table in BAR Memory Space. The table
holds one entry per LVF, from LVF0 to LVFMax, each containing an MVF # and a VF State.)


The LVF Table is present if VF Mapping Supported is Set.
LVF Table Entries are one DWORD.

4.2.3.1. LVF Table Entry

Table 4-15: LVF Table Entry

Bit Location Register Description Attributes


15:0 MVF # for this LVF – MVF mapped to this LVF. Only enough bits are RW
implemented to express values in the range [0..VF MVF Region
High]. Unimplemented high order bits are read only zero.
The value 0 indicates this LVF is not mapped to any MVF.
Default value is Vendor Specific.
17:16 VF State – VF Migration State for this LVF. Values are: RW
00 Inactive.Unavailable
01 Inactive.MigrateIn
11 Active.Available
10 Active.MigrateOut
MR-PCIM changes VF Migration state by writing this value.
If VF Migration is not supported, this field is Read Only Zero.
Even when VF Migration is Disabled, MR-PCIM must ensure that this
field is meaningful (i.e. VFs below InitialVFs should be
Active.Available).
31:18 Reserved RO
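
As an illustration of Table 4-15, the following sketch decodes one LVF Table Entry DWORD. The function and dictionary names are hypothetical; the field positions and state encodings are taken from the table.

```python
VF_STATES = {
    0b00: "Inactive.Unavailable",
    0b01: "Inactive.MigrateIn",
    0b11: "Active.Available",
    0b10: "Active.MigrateOut",
}

def decode_lvf_entry(entry):
    """Decode a 32-bit LVF Table Entry into its MVF number and VF State."""
    mvf = entry & 0xFFFF                   # bits 15:0
    state = VF_STATES[(entry >> 16) & 0x3] # bits 17:16
    # An MVF # of 0 means the LVF is not mapped to any MVF.
    return {"mvf": mvf, "mapped": mvf != 0, "vf_state": state}
```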


4.2.4. Function Table


The Function table contains one entry for each VH. The first entry represents VH0; the second
entry represents VH1, etc.
Every non-zero VH contains exactly one PF associated with this BF. VH0 can optionally have a
single PF.
Certain Function Table Entry fields are only implemented in the Main BF of the Device (for
example, the Global Key fields and MFVC related fields).
The first Function Table Entry is always present. If VH0 does not contain a PF, most of the fields
are Read Only Zero but a few fields remain meaningful (e.g. Global Key).

Figure 4-4: Device Function Table
(Diagram: the capability in Config Space holds Entry Size, Max VH and the Function Table Offset, which
locates the Function Table in Memory Space. The Function Interrupt Bitmap precedes Function Table
Entry 0. Each Function Table Entry contains the Function Capability, Function Control, Function Status,
VC ID to VL Map, VC Resource State (resources 0 to 7) and MFVC Resource State (resources 0 to 7)
fields described in the following sections.)

4.2.4.1. Function Capability (00h and 04h)

Table 4-16: Function Capability 1 (00h)

Bit Location Register Description Attributes


0 VC Capability Supported –If Set, the Function contains a VC RO
Capability. If Clear, the Function does not contain a VC Capability.
3:1 Num VC Resources Hardware Present –Number of VCs supported RO
in hardware in each VH’s VC. Maximum value of this field is MaxVL.
If VC Capability Supported is Clear, this field is hardwired 000b.
4 MFVC Capability Supported – If Set, in every VH, the PF RO
associated with this BF contains an MFVC Capability. If Clear, the PF
associated with this BF does not contain an MFVC Capability.
This bit may only be Set in the BF that manages Function 0 within the
VH (i.e. Function Offset is 0000h).
[Comment sdg6: 0100h for VH0? What do we support for MFVC in the case
where PFs are on two bus #s?]
7:5 Num MFVC Resources Hardware Present – Number of VCs RO
supported in hardware in each VH’s MFVC. Maximum value of this
field is MaxVL.
If MFVC Capability Supported is Clear, this field is hardwired 000b.
15:8 Reserved RO
31:16 Function Offset – Offset that describes the PF’s RID within the RO
associated VH. If the Captured Bus Number is NNh, the PF RID is
Function Offset plus NN00h.
For the first Function Table Entry (i.e. VH0), Function Offset must be
in the range [0000h..01FFh] or FFFFh. The values 0000h to 01FFh
allow the VH0 PF to be located anywhere on either the Captured Bus
Number or the Captured Bus Number + 1. The special FFFFh value
indicates that this BF does not have a PF in VH0.
For the remaining Function Table Entries, Function Offset must be in
the range [0000h..00FFh]. These values allow the associated PF to
be located anywhere on the Captured Bus Number.
With the exception of the first Function Table Entry (i.e. VH0), all
Function Offset values for a given BF must have the same value.
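
The RID computation described for Function Offset can be sketched as below. The function name, the `is_vh0` parameter and the use of `None` for "no PF" are illustrative assumptions; the ranges and the FFFFh special value come from Table 4-16.

```python
NO_PF_IN_VH0 = 0xFFFF  # special Function Offset value, valid only in the VH0 entry

def pf_rid(captured_bus, function_offset, is_vh0):
    """Return the PF RID (Function Offset plus NN00h), or None if VH0 has no PF."""
    if is_vh0 and function_offset == NO_PF_IN_VH0:
        return None
    # VH0 may span the Captured Bus Number and the next bus; other VHs only one bus.
    limit = 0x01FF if is_vh0 else 0x00FF
    if not 0 <= function_offset <= limit:
        raise ValueError(f"Function Offset {function_offset:#06x} out of range")
    return (captured_bus << 8) + function_offset
```

With a Captured Bus Number of 12h, a VH0 Function Offset of 0100h places the PF at RID 1300h, i.e. on the next bus.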

Table 4-17: Function Capability 2 (04h)

Bit Location Register Description Attributes


15:0 First VF – Contains the First VF of this PF’s SR-IOV Capability. Zero RO
if this Function is not a PF (i.e. it has no SR-IOV Capability).
31:16 VF Stride – Contains the VF Stride of this PF’s SR-IOV Capability. RO
Zero if this Function is not a PF (i.e. it has no SR-IOV Capability).


4.2.4.2. Function Control (08h to 10h)

Table 4-18: Function Control 1 (08h)

Bit Location Register Description Attributes


0 Reserved RO
3:1 VC Extended VC Count – The value presented to software in the VH RW
in the Extended VC Count field of the VC Capability. Valid values
are [0..Num VC Resources Hardware Present].
MR-PCIM may set this value to restrict the number of VCs offered to
a VH.
Default value is 0h. If VC Capability Supported is Clear, this field is
hardwired 0h.
4 Reserved RO
7:5 VC Low Priority Extended VC Count – The value presented to RW
software in the VH in the Low Priority Extended VC Count field of the
VC Capability. Valid values are [0..VC Extended VC Count].
The value of this field does not affect arbitration in any manner. This
field allows MR-PCIM to indicate to software in the VH which VCs it
should treat as using strict priority arbitration.
Default value is 0h. If VC Capability Supported is Clear, this field is
hardwired 0h.
8 Reserved RO
11:9 MFVC Extended VC Count – The value presented to software in the RW
VH in the Extended VC Count field of the MFVC Capability. Valid
values are [0..Num MFVC Resources Hardware Present].
MR-PCIM may set this value to restrict the number of VCs offered to
a VH.
Default value is 000b. If MFVC Capability Supported is Clear, this
field is hardwired 000b.
12 Reserved RO
15:13 MFVC Low Priority Extended VC Count – The value presented to RW
software in the VH in the Low Priority Extended VC Count field of the
MFVC Capability. Valid values are [0..MFVC Extended VC Count].
The value of this field does not affect arbitration in any manner. This
field allows MR-PCIM to indicate to software in the VH which VCs it
should treat as using strict priority arbitration.
Default value is 000b. If MFVC Capability Supported is Clear, this
field is hardwired 000b.
27:16 Global Key – TLPs received are checked against this value. TLPs RW
sent contain this value. If this field contains 000h, the “wild card”
value, checking is disabled. If a TLP contains 000h, the TLP is a wild
card TLP and checking always passes.
Default value of this field is 000h.
This field is only implemented in one BF of the Device. For that BF,
all Function Table entries are implemented (even if VH0 the BF that
manages Function 0 within the non-zero VHs (i.e. Function Offset is
0000h).
30:28 Reserved RO
31 Global Key Check Enable – If Set, Global Key checking is RW
performed. If Clear, Global Key mismatches are ignored.
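
One possible reading of the Global Key rules in Table 4-18 is sketched below. The function name is hypothetical, and combining Global Key Check Enable with the wild card rules in a single predicate is an assumption about how the two interact.

```python
WILDCARD_KEY = 0x000

def global_key_check_passes(function_control_1, tlp_key):
    """Return True if a received TLP passes the Global Key check."""
    check_enable = bool((function_control_1 >> 31) & 1)   # bit 31
    global_key = (function_control_1 >> 16) & 0xFFF       # bits 27:16
    if not check_enable:
        return True        # mismatches are ignored
    if global_key == WILDCARD_KEY or tlp_key == WILDCARD_KEY:
        return True        # wild card on either side always passes
    return tlp_key == global_key
```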


Table 4-19: Function Control 2 (0Ch)

Bit Location Register Description Attributes


0 VC Config Interrupt Enable – If Set, setting the VC Config Changed RW
bit for this Function Table Entry sets the corresponding Function
Interrupt Bitmap. If Clear, the Function Interrupt Bitmap associated
with this Function Table Entry is not affected by VC Config Changed.
Default is 0b. This field is Read Only Zero if VC Capability Supported
is Clear.
1 MFVC Config Changed Enable – If Set, setting the MFVC Config RW
Changed bit for this Function Table Entry sets the corresponding
Function Interrupt Bitmap. If Clear, the Function Interrupt Bitmap
associated with this Function Table Entry is not affected by MFVC
Config Changed.
Default is 0b. This field is Read Only Zero if MFVC Capability
Supported is Clear.
2 PF Reset Initiated Enable – If Set, setting the PF Reset Initiated bit RW
for this Function Table Entry sets the corresponding Function
Interrupt Bitmap. If Clear, the Function Interrupt Bitmap associated
with this Function Table Entry is not affected by PF Reset Initiated.
Default is 0b. This field is Read Only Zero if VF Migration Supported
is Clear.
3 VF Migration Status Enable – If Set, setting the VF Migration Status RW
bit for this Function Table Entry sets the corresponding Function
Interrupt Bitmap. If Clear, the Function Interrupt Bitmap associated
with this Function Table Entry is not affected by VF Migration Status.
Default is 0b. This field is Read Only Zero if VF Migration Supported
is Clear.
4 VF Enable Enable – If Set, setting the VF Enable Changed bit for this RW
Function Table Entry sets the corresponding Function Interrupt
Bitmap. If Clear, the Function Interrupt Bitmap associated with this
Function Table Entry is not affected by VF Enable Changed.
Default is 0b.
8:5 Reserved RO
9 VF Migration Capable – Set by MR-PCIM to indicate to SR-PCIM RW
that VF Migration support is available. VF Migration is possible only if
this bit and VF Migration Enabled are both set.
Default is 0b. This field is Read Only Zero if VF Migration Supported
is Clear.
15:10 Reserved RO
31:16 InitialVFs – Number of VFs provided to SR-PCIM and populated by RW
MR-PCIM with MVFs.
This field is meaningful only for PFs. If this Function does not contain
an SR-IOV Capability, this field is Read Only Zero.
If VF Mapping is not supported on this PF, this field is Read Only and
indicates the number of VFs that were provisioned using Vendor
Specific mechanisms.
If VF Mapping is supported, the default value of this field is 0000h.
The value 0 indicates that no populated VFs are offered to SR-PCIM.

Table 4-20: Function Control 3 (10h)

Bit Location Register Description Attributes


15:0 Base LVF – Set by MR-PCIM to contain the index of the first LVF Table RW
entry assigned to this PF.
If VF Mapping is not supported, this field is Read Only Zero.
The default value of this field is Vendor Specific.
31:16 TotalVFs – Total Number of VFs provided to SR-PCIM including RW
populated and non-populated VFs.
This field is meaningful only if VF Migration is supported. If VF
Migration is not supported, this field is Read Only Zero and the
InitialVFs value is returned as TotalVFs to SR-PCIM.
The default value of this field is 0000h.
MR-PCIM must configure this field to be greater than or equal to
InitialVFs.
The value 0 indicates that no VFs are offered to SR-PCIM.
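
The relationship between InitialVFs and TotalVFs can be sketched as a configuration check. The function name and the boolean parameter are hypothetical; the rules are those stated in Tables 4-19 and 4-20.

```python
def visible_total_vfs(initial_vfs, total_vfs, migration_supported):
    """Return the TotalVFs value SR-PCIM would see, or raise if misconfigured."""
    if not migration_supported:
        # The TotalVFs field is Read Only Zero; InitialVFs is returned instead.
        return initial_vfs
    if total_vfs < initial_vfs:
        raise ValueError("TotalVFs must be greater than or equal to InitialVFs")
    return total_vfs
```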


4.2.4.3. Function Status (14h)

Setting any of bits 7:0 of this register Sets the corresponding Function Interrupt Bitmap entry if
the Function Interrupt Enable bit is Set. When software clears all of these bits, or clears the
Function Interrupt Enable bit, the corresponding Function Interrupt Bitmap entry is also cleared.

Table 4-21: Function Status

Bit Location Register Description Attributes


0 VC Config Changed – Set when software in a VH changes any of RW1C
the VC Resource State bits. Default is 0b. If VC Capability Supported
is Clear, this field is hardwired 0b.
1 MFVC Config Changed – Set when software in a VH changes any of RW1C
the MFVC Resource State bits. Default is 0b. If MFVC Capability
Supported is Clear, this field is hardwired 0b.
2 PF Reset Initiated – Set when the PF is Reset. This can be due to a RW1C
Reset DLLP within the VH or due to software in the VH issuing a
Function Level Reset (FLR) to the PF.
This field is Read Only Zero if VF Migration Supported is Clear.
3 VF Migration Status – Set when a VF Migration event is triggered for RW1C
some VF associated with this PF. See Section 3.2.4 for details.
This field is Read Only Zero if VF Migration Supported is Clear.
4 VF Enable Changed – Set when SR-PCIM changes VF Enable. RW1C
5 VF Initialization Pending – Set when VF Migration Capable is Set RW1C
and either the PF is Reset or VF Enable is Cleared. When Set,
accesses within the VH to VF Configuration Space return CRS.
This field is Read Only Zero if VF Migration Supported is Clear.
MR-PCIM software should re-establish a valid initial VF Configuration
and Clear this bit within 1 second. See Section 3.2.4.2 for details.
7:6 Reserved RO
8 VF Enabled – MR-IOV visible copy of the SR-IOV VF Enable bit. RO
9 VF Migration Enabled – MR-IOV visible copy of the SR-IOV VF RO
Migration Enable bit.
15:10 Reserved RO
31:16 NumVFs – MR-IOV visible copy of the SR-IOV NumVFs value. RO
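
A minimal sketch of decoding the Function Status register follows. The function and dictionary names are hypothetical; the bit assignments are those of Table 4-21.

```python
# RW1C event bits of the Function Status register, by bit position.
STATUS_EVENTS = {
    0: "VC Config Changed",
    1: "MFVC Config Changed",
    2: "PF Reset Initiated",
    3: "VF Migration Status",
    4: "VF Enable Changed",
    5: "VF Initialization Pending",
}

def decode_function_status(value):
    """Decode the 32-bit Function Status register into named fields."""
    return {
        "events": [name for bit, name in STATUS_EVENTS.items() if (value >> bit) & 1],
        "vf_enabled": bool((value >> 8) & 1),            # bit 8
        "vf_migration_enabled": bool((value >> 9) & 1),  # bit 9
        "num_vfs": (value >> 16) & 0xFFFF,               # bits 31:16
    }
```

Because the event bits are RW1C, software would clear a reported event by writing a 1 to the same bit position.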


4.2.4.4. Function VC to VL Map (18h and 1Ch)

This table maps VCs in a VH to VLs. To preserve PCI Express ordering and flow control
independence assumptions, each VC must be assigned to a distinct VL.
This map is only meaningful on the Main BF of the Device. In all other BFs, all fields of this map
are Read Only Zero.

Table 4-22: Function Table VC to VL Map 1 (VC Capability)

Bit Location Register Description Attributes


2:0 VC0 VL Map – Indicates the VL number used for traffic labeled VC0. RW
If VC Capability Supported is Clear or if MaxVL is 0h, this field is
Read Only Zero.
The default value of this field is 0h.
3 Reserved RO
4 VC0 VL Map Enable – Indicates that the VC0 VL Map field contains RW
a valid VL number. The default value of this field is 0b.
Hardware behavior on the 1b to 0b transition of this bit is undefined.
7:5 Reserved RO
10:8 VC1 VL Map – Indicates the VL number used by traffic labeled VC1. RW
This field is Read Only Zero if Num VC Resources Hardware Present
is 0. The default value of this field is 1h.
11 Reserved RO
12 VC1 VL Map Enable – Indicates that the VC1 VL Map field contains RW
a valid VL number. The default value of this field is 0b.
Hardware behavior on the 1b to 0b transition of this bit is undefined.
15:13 Reserved RO
18:16 VC2 VL Map – Indicates the VL number used by traffic labeled VC2. RW
This field is Read Only Zero if Num VC Resources Hardware Present
is 0 or 1. The default value of this field is 2h.
19 Reserved RO
20 VC2 VL Map Enable – Indicates that the VC2 VL Map field contains RW
a valid VL number. The default value of this field is 0b.
Hardware behavior on the 1b to 0b transition of this bit is undefined.
23:21 Reserved RO
26:24 VC3 VL Map – Indicates the VL number used by traffic labeled VC3. RW
This field is Read Only Zero if Num VC Resources Hardware Present
is 0 to 2. The default value of this field is 3h.
27 Reserved RO
28 VC3 VL Map Enable – Indicates that the VC3 VL Map field contains RW
a valid VL number. The default value of this field is 0b.
Hardware behavior on the 1b to 0b transition of this bit is undefined.
31:29 Reserved RO

Table 4-23: Function Table VC to VL Map 2 (VC Capability)

Bit Location Register Description Attributes


2:0 VC4 VL Map – Indicates the VL number used by traffic labeled VC4. RW
This field is Read Only Zero if Num VC Resources Hardware Present
is 0 to 3. The default value of this field is 4h.
3 Reserved RO
4 VC4 VL Map Enable – Indicates that the VC4 VL Map field contains RW
a valid VL number. The default value of this field is 0b.
Hardware behavior on the 1b to 0b transition of this bit is undefined.
7:5 Reserved RO
10:8 VC5 VL Map – Indicates the VL number used by traffic labeled VC5. RW
This field is Read Only Zero if Num VC Resources Hardware Present
is 0 to 4. The default value of this field is 5h.
11 Reserved RO
12 VC5 VL Map Enable – Indicates that the VC5 VL Map field contains RW
a valid VL number. The default value of this field is 0b.
Hardware behavior on the 1b to 0b transition of this bit is undefined.
15:13 Reserved RO
18:16 VC6 VL Map – Indicates the VL number used by traffic labeled VC6. RW
This field is Read Only Zero if Num VC Resources Hardware Present
is 0 to 5. The default value of this field is 6h.
19 Reserved RO
20 VC6 VL Map Enable – Indicates that the VC6 VL Map field contains RW
a valid VL number. The default value of this field is 0b.
Hardware behavior on the 1b to 0b transition of this bit is undefined.
23:21 Reserved RO
26:24 VC7 VL Map – Indicates the VL number used by traffic labeled VC7. RW
This field is Read Only Zero if Num VC Resources Hardware Present
is 0 to 6. The default value of this field is 7h.
27 Reserved RO
28 VC7 VL Map Enable – Indicates that the VC7 VL Map field contains RW
a valid VL number. The default value of this field is 0b.
Hardware behavior on the 1b to 0b transition of this bit is undefined.
31:29 Reserved RO
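
The two map registers can be decoded together, and the "distinct VL per VC" requirement checked, as in the sketch below. It assumes the byte-per-VC layout of Table 4-22 (VL number in bits 2:0 of each byte, Map Enable in bit 4) applies to all eight VCs; the function names are hypothetical.

```python
def decode_vc_to_vl_maps(map1, map2):
    """Return {vc: vl} for every VC whose VL Map Enable bit is Set."""
    combined = (map2 << 32) | map1         # VC0..VC3 in map1, VC4..VC7 in map2
    mapping = {}
    for vc in range(8):
        field = (combined >> (8 * vc)) & 0xFF
        if field & 0x10:                   # VL Map Enable
            mapping[vc] = field & 0x7      # VL number
    return mapping

def vls_are_distinct(mapping):
    """Check that no two enabled VCs share a VL."""
    return len(set(mapping.values())) == len(mapping)
```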


4.2.4.5. Function VC Resource (20h to 2Ch)

These fields return data from the VC Capability of the associated Type 1 Configuration Header.
They allow MR-PCIM software to track the enabling and mapping of VCs within each VH.
VC Resource fields for resource numbers above Num VC Resources Hardware Present are not
implemented and return 0 when read. VC Resource fields for resource numbers above Extended VC
Count are Undefined.
VC Resource State 0 is located at offset 20h; VC Resource State 1 is located at offset 22h; etc. Fields
within VC Resource State are described in Table 4-24.

Table 4-24: Function Table VC Resource State

Bit Location Register Description Attributes


0 VC Enabled – This field tracks the VC Enabled bit set by software RO
operating in the VH. Per the PCI Express specification, VC Resource
0 is always enabled and thus VC Enabled for VC Resource 0 is
always set.
1 VC Negotiation Pending – This field tracks the VC Negotiation RO
Pending bit view in the VH. This bit is Set when VC Enabled is Set
and either no VL has been mapped to this VC (associated VL Map
Enable bit is Clear) or Flow Control negotiation has not completed on
the mapped VH and VL.
3:2 Reserved RO
6:4 VC ID – This field tracks the VC ID field set by software operating in RO
the VH. Per the PCI Express specification, VC ID for VC Resource 0
is always 0.
7 Reserved RO
15:8 TC to VC Map – This field tracks the TC to VC Map field set by RO
software operating in the VH. Per the PCI Express specification, bit 0
of this field is fixed and the remaining bits may be set by software.
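
A 16-bit VC Resource State value could be decoded as in this sketch. The function name is hypothetical; the field positions follow Table 4-24.

```python
def decode_vc_resource_state(state):
    """Decode one 16-bit VC Resource State field."""
    return {
        "vc_enabled": bool(state & 1),                  # bit 0
        "negotiation_pending": bool((state >> 1) & 1),  # bit 1
        "vc_id": (state >> 4) & 0x7,                    # bits 6:4
        "tc_to_vc_map": (state >> 8) & 0xFF,            # bits 15:8, one bit per TC
    }
```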

4.2.4.6. Function Table MFVC Resource Status (30h to 3Ch)

These fields return data from the MFVC Capability of the associated Type 1 Configuration Header.
They allow MR-PCIM software to track the enabling and mapping of VCs within each VH.
VC Resource fields for resource numbers above Num MFVC Resources Hardware Present are not
implemented and return 0 when read. VC Resource fields for resource numbers above MFVC Extended
VC Count are Undefined.
VC Resource State 0 is located at offset 30h; VC Resource State 1 is located at offset 32h; etc. Fields
within VC Resource State are described in Table 4-25.


Table 4-25: VH Table MFVC Resource State

Bit Location Register Description Attributes


0 VC Enabled – This field tracks the VC Enabled bit set by software RO
operating in the VH. Per the PCI Express specification, VC Resource
0 is always enabled and thus VC Enabled for VC Resource 0 is
always set.
1 VC Negotiation Pending – This field tracks the VC Negotiation RO
Pending bit view in the VH. This bit is Set when VC Enabled is Set
and either no VL has been mapped to this VC (associated VL Map
Enable bit is Clear) or Flow Control negotiation has not completed on
the mapped VH and VL.
3:2 Reserved RO
6:4 VC ID – This field tracks the VC ID field set by software operating in RO
the VH. Per the PCI Express specification, VC ID for VC Resource 0
is always 0.
7 Reserved RO
15:8 TC to VC Map – This field tracks the TC to VC Map field set by RO
software operating in the VH. Per the PCI Express specification, bit 0
of this field is fixed and the remaining bits may be set by software.

4.2.4.7. Function Interrupt Bitmap (minus 20h)

The Function Interrupt Status bitmap precedes the Function Table. It is always 32 bytes (supporting
switches with a maximum of 256 ports).
Bits in this table are Read Only. A bit is Set to indicate the Function has an interrupt pending and
Clear otherwise. These Interrupt Status bits are cleared either by clearing the appropriate Function
Interrupt Pending bit or by masking the interrupt using the Function Interrupt Enable.
An MSI Interrupt is requested on any zero to one transition of any of these bits.
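
Scanning the 32-byte bitmap for pending entries might look like the sketch below. The bit ordering (entry i stored in bit i mod 8 of byte i div 8) is an assumption, since the text above does not state it; the function name is hypothetical.

```python
def pending_functions(bitmap):
    """Return the indices of all Set bits in the 32-byte Function Interrupt Bitmap."""
    if len(bitmap) != 32:
        raise ValueError("the bitmap is always 32 bytes")
    # Assumed ordering: entry i lives in bit (i % 8) of byte (i // 8).
    return [i for i in range(256) if (bitmap[i // 8] >> (i % 8)) & 1]
```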

4.2.5. Misc. Device Configuration Space Requirements

4.2.5.1. BIST (Device)

BIST remains optional in MR-IOV. The results of invoking BIST in any non-BF Function must not
affect any other VH.
The results are undefined if software invokes BIST in a BF when any VC to VL Map Enable bit in
any Function Table entry of any BF of the Device is Set.


4.3. Switch Configuration Space


MR-PCIM must use the Switch Type 1 Headers to manage MR-IOV Switches. Configuration controls
are associated with:
• Entire component
• Physical Port of the component
• Virtual Switch within the component
• PCI-to-PCI Bridges within each Virtual Switch

There are nine tables provided by the Switch. These tables are located using the MR-IOV Capability
block located in the Type 1 Configuration Space of upstream P2P Bridge(s). An overview of the
tables is shown in Figure 4-5.

Figure 4-5: Switch Mapping Tables
(Diagram: the MR-IOV Capability in Config Space holds the entry sizes, entry counts and offsets that
locate the tables in BAR Memory Space: the Port Table (entries Port0 to PortN, each with an optional
VL Arb Table, preceded by the Port Interrupt Bitmap), the VS Table (entries VS0 to VSN, preceded by
the VS Interrupt Bitmap), the VS Bridge Table (entries Bridge0 to BridgeM for each VS), the Statistics
Descriptors, the Statistics Blocks with their Statistics counters, and the VS Authorization Bitmap with
the Backup VS Authorization.)



• The MR-IOV Capability contains the controls that apply to the entire Switch. It also contains
BAR relative offsets to tables located in BAR Memory Space and associated table size
information.
• The VS Authorization Bitmap contains one bit for each Virtual Switch. If the corresponding
bit is Set, the associated VS is authorized and can be used to manage the MR switch.


• The Port Table contains an entry for every Port on the Switch. The table may be sparse (i.e.
contain unused table entries) for hardware implementation flexibility. The Port Table contains
the fields that control the physical link. The Port Table also points to the optional VL
Arbitration Table. Preceding the Port Table is a 256-bit Port Interrupt Summary.
• The optional VL Arbitration Table supports controlling the arbitration between Virtual Links
for access to a Port. This table is modeled after the VC Arbitration Table in PCI Express.
• The VS Table contains an entry for every Virtual Switch in the MR Switch. The table may be
sparse for hardware implementation flexibility. Preceding the VS Table is a 256-bit VS
Interrupt Summary.
• The VS Bridge Table contains an entry for every PCI-to-PCI bridge in every Virtual Switch.
This table is a two dimensional array indexed by VS number and by P2P Bridge number.
Within a VS, the first entry corresponds to the upstream P2P Bridge and the remaining entries
are the downstream bridge(s). This table may also be sparse but the upstream entry of each VS
must be present if the associated VS is present. No entries are present in this table unless the
associated VS table is also present.
• The optional Statistics Descriptor Table contains descriptions of the varieties of performance
counters and statistics information supported. This table is read only and contains one entry
for each counter style supported by the component.
• The optional Statistics Block Table contains an entry for every block of related statistics
counters. Each entry contains controls for the block and an offset to the array of counters.
• The optional Statistics Counter Table contains the actual counters and sampled values. There
is one table for each Statistics Block. The offset and size of this table is contained in the
associated Statistics Block Table.
These tables must be visible in the Upstream P2P Bridge of all authorized VHs. A subset of the
MR-IOV Capability (but none of the other tables) may optionally be present in the Upstream P2P
Bridges of non-authorized VHs. This subset MR-IOV Capability must be present in VH 0
Upstream P2P Bridges that are attached to MR aware Root Ports.
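
The two dimensional indexing of the VS Bridge Table can be illustrated with an offset calculation. This is only a sketch: it assumes a row-major layout with a fixed number of bridge entries per VS, which this section does not state, and all parameter names are hypothetical.

```python
def vs_bridge_entry_offset(table_offset, entry_size, bridges_per_vs,
                           vs_number, bridge_number):
    """Return the BAR-relative byte offset of one VS Bridge Table entry.

    Assumes row-major order: all entries of VS 0 first, then VS 1, and so on.
    Bridge number 0 within a VS is the upstream P2P Bridge.
    """
    if not 0 <= bridge_number < bridges_per_vs:
        raise ValueError("bridge number out of range")
    index = vs_number * bridges_per_vs + bridge_number
    return table_offset + index * entry_size
```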

4.3.1. Switch MR-IOV Extended Capability


Figure 4-6 shows the Switch MR-IOV Extended Capability structure in more detail.

Figure 4-6: Switch MR-IOV Capability Diagram

Offset  Contents
00h     MR-IOV Capability Header: Next Cap Offset, Vers, Capability ID (0011h)
04h     MR-IOV Capability Bits: MSI Vector #
08h     MR-IOV Control Bits: MR Switch Number, Vendor Specific Interrupt Enable, Statistics
        Interrupt Enable, Watchdog Timer Interrupt Enable, VS Interrupt Enable, Port Interrupt Enable
0Ch     MR-IOV Status Bits: MSI Scheduled, Vendor Specific Interrupt Status, Statistics Interrupt
        Status, Watchdog Timer Interrupt Status, VS Interrupt Status, Port Interrupt Status
10h     MR-IOV This Bridge Map: VS #, VS Bridge #, VS Is Authorized
14h     Watchdog Timer Control: Timer Interval 1, Timer Interval 2, Watchdog Timer 1 Expired,
        Rearm Watchdog 1 and 2
18h     Authorization: Mgmt VS #, VS Authorization Bitmap Offset
1Ch     Port Table: Port Table Entry Size, # Port Entries
20h     Port Table: Port Table Offset, BIR
24h     VS Table: VS Table Entry Size, # VS Table Entries
28h     VS Table: VS Table Offset, BIR
2Ch     VS Bridge Table: VS Bridge Table Entry Size, # Bridge Table Entries
30h     VS Bridge Table: VS Bridge Table Offset, BIR
34h     Statistics: # Stat Blocks, # Stat Desc
38h     Statistics: Statistics Block 0 to Statistics Block 31 Start / Busy
3Ch     Statistics: Statistics Descriptor Table Offset, BIR
40h     Statistics: Statistics Block Table Offset, BIR

Annotation in the figure: These fields and the data structures they point to are visible only to
Authorized Virtual Switches.

4.3.1.1. Switch MR-IOV Extended Capability Header (00h)

Table 4-26 defines the Switch MR-IOV Extended Capability Header. The Capability ID for the
Switch MR-IOV Extended Capability is 0011h.

Table 4-26: Switch MR-IOV Extended Capability Header

Bit Location Register Description Attributes


15:0 PCI Express Extended Capability ID – This field is a PCI-SIG RO
defined ID number that indicates the nature and format of the
Extended Capability.
The Extended Capability ID for the MR-IOV Extended Capability is
0011h.
19:16 Capability Version – This field is a PCI-SIG defined version number RO
that indicates the version of the Capability structure present.
Must be 1h for this version of the specification.
31:20 Next Capability Offset – This field contains the offset to the next PCI RO
Express Capability structure or 000h if no other items exist in the
linked list of Capabilities.
This offset is relative to the beginning of PCI compatible
Configuration Space and thus must always be either 000h (for
terminating list of Capabilities) or greater than 0FFh.
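
The header layout in Table 4-26 can be decoded as sketched below. The function name is hypothetical; the field positions, the 0011h ID and the constraint on Next Capability Offset are taken from the table.

```python
MR_IOV_EXT_CAP_ID = 0x0011

def decode_ext_cap_header(header):
    """Split a 32-bit Extended Capability Header into (ID, version, next offset)."""
    cap_id = header & 0xFFFF               # bits 15:0
    version = (header >> 16) & 0xF         # bits 19:16
    next_offset = (header >> 20) & 0xFFF   # bits 31:20
    # The offset is either 000h (end of list) or beyond PCI compatible space.
    if next_offset != 0 and next_offset <= 0xFF:
        raise ValueError("Next Capability Offset must be 000h or greater than 0FFh")
    return cap_id, version, next_offset
```

A header value of 20010011h would decode to the MR-IOV Capability ID, version 1, with the next capability at offset 200h.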

4.3.1.2. Switch MR-IOV Capability (04h)

Table 4-27: Switch MR-IOV Capability Bits

Bit Location Register Description Attributes


20:0 Reserved RO
31:21 MSI Vector Number – This field indicates the MSI Vector number RO
used to signal MR-IOV events within this Switch. This value may
change based on whether MSI or MSI-X is enabled.


4.3.1.3. Switch MR-IOV Control (08h)

Table 4-28: Switch MR-IOV Control Bits

Bit Location Register Description Attributes


0 Port Interrupt Enable – Enables delivery of Port Interrupts. See the RW
Port Table for details. This bit is implemented per MR-IOV capability.
This bit is Read Only Zero unless this VS is Authorized. The default
value of this field is 0b.
1 VS Interrupt Enable – Enables delivery of VS Interrupts. See the VS RW
Table for details. This bit is implemented per MR-IOV capability. This
bit is Read Only Zero unless this VS is Authorized. The default value
of this field is 0b.
2 Watchdog Timer Interrupt Enable – Enables delivery of Watchdog RW
Timer Interrupts. This bit is implemented per MR-IOV capability. This
bit is Read Only Zero unless this VS is Authorized or is
Authorized in the Shadow Authorization bitmap. The default value of
this field is 0b.
10:3 ReservedP RO
14:11 Vendor Specific Interrupt Enable– These bits enable delivery of a RW
variety of Vendor Specific Interrupts.
All Vendor Specific Interrupts are optional. Unimplemented Interrupt
Enable bits are Read Only Zero. Different Ports may implement
different Vendor Specific Interrupts.
Implementation and usage of these bits may vary based on whether
the VS is Authorized. The default value of this field is 0000b.
15 ReservedP RO
31:16 MR Switch Number – Scratchpad register used by MR-PCIM in RW
detecting loops during Topology Enumeration. This field is Read Only
unless this VS is Authorized.


4.3.1.4. Switch MR-IOV Status (0Ch)

Table 4-29: Switch MR-IOV Status Bits

Bit Location Register Description Attributes


0 Port Interrupt Status – Indicates that a Port Interrupt is pending. RO
This bit is Read Only and indicates that some bit in the Port Interrupt
Bitmask is Set. This bit is Read Only Zero if this VS is not Authorized.
1 VS Interrupt Status – Indicates that a VS Interrupt is pending. This RO
bit is Read Only and indicates that some bit in the VS Interrupt
Bitmask is Set. This bit is Read Only Zero if this VS is not Authorized.
2 Watchdog Timer Interrupt Status – Indicates that the watchdog RW1C
timer expired. This bit is implemented per MR-IOV capability.
10:3 ReservedZ RO
14:11 Vendor Specific Interrupt Status – These bits indicate that one of a RW1C / RO
variety of Vendor Specific Interrupts is pending.
All Vendor Specific Interrupts are optional. Unimplemented Interrupt
Status bits are Read Only Zero.
Bits in this field are either RW1C or RO. If RO, clearing the interrupt
status involves Vendor Specific mechanisms.
Implementation and usage of bits may vary based on whether the VS
is Authorized.
15 MSI Scheduled – Set when an MSI has been requested. If Set, RW1C
subsequent MSIs are suppressed. If Clear, any enabled interrupt will
cause an MSI to be scheduled (and this bit to be Set). Default is 0b.
31:16 ReservedZ RO
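
The MSI Scheduled behavior (bit 15) can be illustrated with a small software model. This is a sketch under stated assumptions: the class and method names are hypothetical, and the model only covers new interrupt events arriving while the bit is Set or Clear. Whether an interrupt that is still pending when the bit is cleared triggers a new MSI is not modeled here.

```python
class MsiScheduler:
    """Toy model of the MSI Scheduled bit (bit 15 of Switch MR-IOV Status)."""

    MSI_SCHEDULED = 1 << 15

    def __init__(self):
        self.msi_scheduled = False  # default value is 0b
        self.msis_sent = 0

    def raise_enabled_interrupt(self):
        # While the bit is Set, subsequent MSIs are suppressed.
        if not self.msi_scheduled:
            self.msi_scheduled = True
            self.msis_sent += 1

    def write_status(self, value: int):
        # RW1C: writing 1 to bit 15 clears it; writing 0 has no effect.
        if value & self.MSI_SCHEDULED:
            self.msi_scheduled = False
```

In this model, an interrupt handler would write 1 to bit 15 after servicing the pending sources so that later events can schedule a new MSI.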


4.3.1.5. MR-IOV This Bridge Map (10h)

This Read Only register returns the VS number and P2P Bridge number within the VS of this type 1
config header. In the upstream P2P Bridge of a VS, it also indicates the Authorization status of the
VS.
Table 4-30: Switch MR-IOV This Bridge Map

Bit Location Register Description Attributes


0 VS is Authorized – Indicates that the VS is authorized to manage RO
the switch. See Authorization Control for details.
15:1 ReservedZ RO
23:16 VS Bridge Number – Indicates the P2P Bridge number within the RO
indicated VS of this Type 1 config space. This value can be used to
locate the associated VS Bridge Table Entry
31:24 VS Number – Indicates the VS number of this Type 1 config space. RO
This value can be used to locate the associated VS Table and VS
Bridge Table Entries.
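
As an illustration of Table 4-30, the register value can be decoded as shown below. The function name and the returned dictionary keys are hypothetical; only the bit positions come from the table.

```python
def decode_this_bridge_map(reg: int) -> dict:
    """Decode a 32-bit MR-IOV This Bridge Map value (Table 4-30)."""
    return {
        "vs_is_authorized": bool(reg & 0x1),     # bit 0
        "vs_bridge_number": (reg >> 16) & 0xFF,  # bits 23:16
        "vs_number": (reg >> 24) & 0xFF,         # bits 31:24
    }
```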

4.3.1.6. Watchdog Timer Control (14h)

The Watchdog Timer is used to ensure that a backup MR-PCIM can recover and take over from the
primary MR-PCIM. One key area is the situation where discovery has failed and the Initial MR-
PCIM has altered the switch configuration(s) such that the backup MR-PCIM can’t manage a switch
(e.g. the Backup MR-PCIM VS could have been de-authorized, some inter-switch link directions
could be configured inappropriately, etc.).
Two Watchdog Timers are provided:
• Timer 1 supports sending an interrupt, reauthorizing selected Virtual Switches and resetting
the Link Direction on selected Ports.
• Timer 2 supports a complete reset of the switch. This can be used as a “fall back” mechanism
in the case that Timer 1 was not able to reconfigure.


Table 4-31: Switch Watchdog Timer Control

Bit Location Register Description Attributes


0 Rearm Watchdog 1 and 2 – Writing 1 causes both Watchdog Timers RW
to start counting. Writing 0 has no effect. Reads as zero.
1 Timer 1 Expired – Returns 1 if Watchdog Timer 1 has expired. RO
Returns 0 if the timer is disabled or is still counting.
15:2 ReservedZ RO
23:16 Watchdog Timer Interval 2 – Determines the duration of the RW
Watchdog Timer 2. The timer expires if this period of time elapses
before software restarts the watchdog timer.
If the timer expires, the MR Switch is returned to its Initial Power-On
Condition (e.g. parameters are reloaded based on straps, EEPROM,
etc.).
Encoding for this field is TBD. One encoding is used to indicate the
timer is disabled and will never expire.
31:24 Watchdog Timer Interval 1 – Determines the duration of the RW
Watchdog Timer 1. The timer expires if this period of time elapses
before software restarts the watchdog timer.
If the timer expires, an interrupt is signaled, the backup Authorization
Bitmap is copied to the Authorization Bitmap, and for all Ports, the
Backup Link Direction Control is copied to the Link Direction Control.
Encoding for this field is TBD. One encoding is used to indicate the
timer is disabled and will never expire.

4.3.1.7. Authorization (18h)

This register determines which Virtual Switches are authorized to manage the MR Switch. It also
indicates which VS receives “route to MR-PCIM” messages initiated by this switch.
Table 4-32: Switch MR-IOV Authorization Control

Bit Location Register Description Attributes


7:0 Management VS – Indicates which VS is considered the primary RW
management VS. The active MR-PCIM is running above the Root
Port at the top of the hierarchy containing this VS.
The indicated VS is automatically authorized independent of the state
of the corresponding VS Authorization Bitmap entry.
This field is Read Only Zero if this VS is not Authorized.
19:8 ReservedP RO
31:20 VS Authorized Bitmap Offset – This field contains the offset to the RO
Authorization Bitmap. This offset is relative to the beginning of PCI
compatible Configuration Space. See section 4.3.2 for details.


4.3.1.8. Port Table Entry Size / Num Port Entries (1Ch)

Table 4-33: Switch Port Table Sizes

Bit Location Register Description Attributes


7:0 Num_Port_Table_Entries – Returns the number of entries in the RO
Port Table.
15:8 Port_Table_Entry_Size – Returns the size of a Port Table Entry in RO
DWORDs. For the current version of this specification this value must
be 15h or larger. Implementations may use larger values to simplify
address arithmetic.
31:16 ReservedZ RO

The total size of the Port Table (in bytes) is:


Num_Port_Table_Entries * Port_Table_Entry_Size * 4
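
The size computation above can be sketched as follows, taking the raw 32-bit register at 1Ch as input. The function names are hypothetical.

```python
def decode_port_table_sizes(reg: int) -> tuple:
    """Return (Num_Port_Table_Entries, Port_Table_Entry_Size in DWORDs)."""
    num_entries = reg & 0xFF             # bits 7:0
    entry_size_dw = (reg >> 8) & 0xFF    # bits 15:8
    return num_entries, entry_size_dw


def port_table_bytes(reg: int) -> int:
    """Total Port Table size in bytes: entries * entry size (DWORDs) * 4."""
    num_entries, entry_size_dw = decode_port_table_sizes(reg)
    return num_entries * entry_size_dw * 4
```

For example, 4 entries of 15h DWORDs each occupy 4 * 21 * 4 = 336 bytes.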

4.3.1.9. Port Table Offset (20h)

Table 4-34: Switch Port Table Offset

Bit Location Register Description Attributes


2:0 Port Table BIR – Indicates which one of a function’s Base Address RO
registers, located beginning at 10h in Configuration Space, is used to
map the Function’s Port Table into Memory Space.
BIR Value Base Address register
0 BAR0 10h
1 BAR1 14h
2..7 Reserved
For a 64-bit Base Address register, the BIR indicates the lower
DWORD.
31:3 Port Table Offset – Used as an offset from the address contained by RO
one of the Function’s Base Address registers to point to the base of
the Port Table. The lower 3 BIR bits are masked off (set to zero) by
software to form a 32-bit QWORD-aligned offset.

The Port Table starts at the Port Table Offset. The Port Interrupt Bitmap immediately precedes the
Port Table (i.e. it starts 32 bytes before the Port_Table_Offset).
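
A minimal sketch of how software might locate the Port Table and the Port Interrupt Bitmap from this register. It assumes the caller already knows the memory addresses programmed into BAR0 and BAR1 (passed as the hypothetical `bar_bases` list); function names are illustrative.

```python
def decode_table_offset(reg: int) -> tuple:
    """Split a table offset register into (BIR, byte offset)."""
    bir = reg & 0x7                       # bits 2:0
    offset = reg & ~0x7 & 0xFFFFFFFF      # mask off the BIR bits
    return bir, offset


def port_table_addresses(bar_bases: list, reg: int) -> tuple:
    """Return (Port Interrupt Bitmap address, Port Table address)."""
    bir, offset = decode_table_offset(reg)
    if bir > 1:
        raise ValueError("BIR values 2..7 are reserved")
    table = bar_bases[bir] + offset
    # The Port Interrupt Bitmap starts 32 bytes before the Port Table.
    return table - 32, table
```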


4.3.1.10. VS Table Entry Size / Num VS Table Entries (24h)

Table 4-35: Switch VS Table Sizes

Bit Location Register Description Attributes


7:0 Num_VS_Table_Entries – Returns the number of entries in the VS RO
Table.
15:8 VS_Table_Entry_Size – Returns the size of a VS Table Entry in RO
DWORDs. For the current version of this specification this value must
be 3h or larger (more if some VS supports more than 32 Bridges).
Implementations may use larger values to simplify address arithmetic.
31:16 Reserved RO

The total size of the VS Table (in bytes) is:


Num_VS_Table_Entries * VS_Table_Entry_Size * 4

4.3.1.11. VS Table Offset (28h)

Table 4-36: Switch VS Table Offset

Bit Location Register Description Attributes


2:0 VS Table BIR – Indicates which one of a function’s Base Address RO
registers, located beginning at 10h in Configuration Space, is used to
map the Function’s VS Table into Memory Space.
BIR Value Base Address register
0 BAR0 10h
1 BAR1 14h
2..7 Reserved
For a 64-bit Base Address register, the BIR indicates the lower
DWORD.
31:3 VS Table Offset – Used as an offset from the address contained by RO
one of the Function’s Base Address registers to point to the base of
the VS Table. The lower 3 BIR bits are masked off (set to zero) by
software to form a 32-bit QWORD-aligned offset.

The VS Table starts at the VS Table Offset. The VS Interrupt Bitmap immediately precedes the VS
Table (i.e. it starts 32 bytes before the VS_Table_Offset).


4.3.1.12. VS Bridge Table Entry Size / Num VS Bridge Table Entries per VS (2Ch)

Table 4-37: Switch VS Bridge Table Sizes

Bit Location Register Description Attributes


7:0 Num_Bridge_Table_Entries – Returns the number of Bridge entries RO
in the VS Bridge Table associated with each VS.
15:8 Bridge_Table_Entry_Size – Returns the size of a VS Bridge Table RO
Entry in DWORDs. For the current version of this specification, this
value must be at least 8. Implementations may use larger values to
simplify address arithmetic.
31:16 Reserved RO

The total size (in bytes) of the VS Bridge Table is:


Num_Bridge_Table_Entries * Num_VS_Table_Entries * Bridge_Table_Entry_Size * 4
Note: Num_Bridge_Table_Entries reflects the size of the table and thus is the maximum size across
all Virtual Switches. Not all VS Bridge Table Entries need be present (see Section 4.3.6.1).

4.3.1.13. VS Bridge Table Offset (30h)

Table 4-38: Switch VS Bridge Table Offset

Bit Location Register Description Attributes


2:0 VS Bridge Table BIR – Indicates which one of a function’s Base RO
Address registers, located beginning at 10h in Configuration Space,
is used to map the Function’s VS Bridge Table into Memory Space.
BIR Value Base Address register
0 BAR0 10h
1 BAR1 14h
2..7 Reserved
For a 64-bit Base Address register, the BIR indicates the lower
DWORD.
31:3 VS Bridge Table Offset – Used as an offset from the address RO
contained by one of the Function’s Base Address registers to point to
the base of the VS Bridge Table. The lower 3 BIR bits are masked off
(set to zero) by software to form a 32-bit QWORD-aligned offset.

The VS Bridge Table Entry associated with Bridge 0 of VS N immediately follows the last Bridge
Table Entry associated with VS N-1.
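
Because the entries for each VS are laid out back to back, the byte offset of a given entry from the base of the VS Bridge Table can be computed as sketched below. The function names are hypothetical, and the sketch assumes every VS occupies exactly Num_Bridge_Table_Entries slots, as the size formula implies.

```python
def vs_bridge_entry_offset(vs: int, bridge: int,
                           num_bridge_entries: int, entry_size_dw: int) -> int:
    """Byte offset of the entry for (VS, Bridge) from the VS Bridge Table base."""
    if not 0 <= bridge < num_bridge_entries:
        raise ValueError("Bridge number exceeds Num_Bridge_Table_Entries")
    index = vs * num_bridge_entries + bridge
    return index * entry_size_dw * 4


def vs_bridge_table_bytes(num_bridge_entries: int, num_vs_entries: int,
                          entry_size_dw: int) -> int:
    """Total VS Bridge Table size in bytes."""
    return num_bridge_entries * num_vs_entries * entry_size_dw * 4
```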


4.3.1.14. Statistics Capability and Control (30h to 3Ch)

Device and Switch Statistics related fields are described in section TBD.

4.3.2. Switch VS Authorization Bitmap


The VS Authorization Bitmap and Backup VS Authorization Bitmap are located in Configuration
space at the byte offset indicated by the VS Authorization Bitmap Offset. These tables must be
located in extended Configuration Space (i.e. the offset must be greater than FFh).
Both tables contain Num_VS_Table_Entries bits. Bit 0 of the first byte corresponds to VS0, bit 1
corresponds to VS1, etc.
A VS is authorized if either the corresponding bit in the VS Authorization Bitmap is Set or if the VS
is the Management VS.
Only transactions from Authorized VSs are allowed to access memory space MR-IOV tables of this
Switch.
Only transactions from Authorized VSs are allowed to access certain fields in the Configuration Space
of the MR-IOV Extended Capability.
Both Authorization Bitmaps are Read Only Zero if this VS is not Authorized.
Note: The bitmap entry for the Management VS is read/write but its value does not affect the
Authorization state of that VS. If the “old” Management VS is to remain Authorized after a
transition to a “new” Management VS, software should Set the bitmap entry for the “old” VS before
changing Management VS to ensure that Authorization remains seamless.
The Backup VS Authorization Bitmap is copied to the VS Authorization Bitmap when Watchdog
Timer 1 expires. The Backup VS Authorization Bitmap values have no other effect.
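
The authorization rule and the Watchdog Timer 1 copy described in this section can be modeled as follows. This is an illustrative sketch only: the bitmaps are represented as Python `bytes` objects and all names are hypothetical.

```python
def is_authorized(vs: int, auth_bitmap: bytes, management_vs: int) -> bool:
    """A VS is authorized if its bitmap bit is Set or it is the Management VS."""
    byte_index, bit = divmod(vs, 8)  # bit 0 of the first byte is VS0
    return vs == management_vs or bool((auth_bitmap[byte_index] >> bit) & 1)


def on_watchdog1_expired(backup_bitmap: bytes) -> bytes:
    """Return the new VS Authorization Bitmap after Watchdog Timer 1 expires."""
    return bytes(backup_bitmap)  # the backup is copied over the active bitmap
```

The note on changing the Management VS follows from this rule: if VS0 is the Management VS and its bitmap bit is Clear, moving the Management VS to another VS de-authorizes VS0 unless its bit was Set first.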

4.3.3. Switch Port Table


The Port Table contains up to 256 Port Table Entries. This table is located in memory starting at a
location determined by the Port Table Offset field in the MR-IOV Capability.


The MR-IOV Capability in Config Space (Port Table Offset, Entry Size, # Ports) locates the Port
Table in MMIO Space. The Port Interrupt Bitmap is followed by the Port0 through PortN Table
Entries. Each Port Table Entry is laid out as follows:

Offset    Register group / fields

00h       Port Capability: PCIe Offset, MaxVH, Non-PCIe Port (Management Port), Port Present
04h       Port Control: NumVH, Port Interrupt Enable
08h       Port Control: VL Enable, Send PME_Enter_L23 DLLP, PM_PME Triggers Beacon / WAKE#,
          Link MR-IOV Enable, Backup Link Direction Control, Link Direction Control, Port Enable
0Ch       Port Status: VL Negotiation Pending, Port Interrupt Pending
10h       Link Partner Training Status: Link Partner VH FC, Link Partner Type, Link Partner
          Authorized, Link Partner Protocol Version, Link Partner MaxVL, Link Partner MaxVH,
          Link Trained in MR Mode, Link Direction Status, Link Partner Detected
14h       VL Arbitration: Max Time Slots, Max VL, WRR Ref Clock, VL Arb Status, VL Arb Cap.
18h       VL Arbitration: VL Strict Priority Arbitration, VL Arb Select, Load VL Arb Table
1Ch       VL Arbitration: VL Arb Table Offset
20h       MR Error Logging: MR Error Control and Status
24h       MR Error Logging: MR Error Log 0 (TLP Prefix)
28h       MR Error Logging: MR Error Log 1 (TLP Header)
2Ch       MR Error Logging: MR Error Log 2 (TLP Header)
30h       MR Error Logging: MR Error Log 3 (TLP Header)
34h       MR Error Logging: MR Error Log 4 (TLP Header)

PCI Bridge and PCI Express Capabilities Registers:

N+00h     PCIe Capabilities / PCI Bridge Control
N+04h     Device Capabilities
N+08h     Device Status / Device Control
N+0Ch     Link Capabilities
N+10h     Link Status / Link Control
N+14h     Slot Capabilities
N+18h     Slot Status / Slot Control
N+1Ch     Device Capabilities2
N+20h     Device Status2 / Device Control2
N+24h     Link Capabilities2
N+28h     Link Status2 / Link Control2
N+2Ch     Slot Capabilities2
N+30h     Slot Status2 / Slot Control2

Figure 4-7: Switch Port Table


4.3.3.1. Port Capability (00h)

Table 4-39: Switch Port Capability

Bit Location Register Description Attributes


0 Port Present – Indicates that this Port is present on the Switch. RO
1 Non-PCIe Switch Management Port – Indicates that this is a Vendor RO
Specific non-PCI Express Port used to manage a switch (see section
3.1.1.3).
Certain bits in the Port Entry are no longer meaningful. TODO:
Specify these bits.
15:2 Reserved RO
23:16 MaxVH – Maximum number of VHs supported on this Device RO
31:24 PCIe Offset – DWORD Offset to the PCIe Capabilities section of the RO
Port Table. For the current version of the specification, this must be at
least 0Eh (i.e. 38h divided by 4).


4.3.3.2. Port Control (04h and 08h)

Table 4-40: Switch Port Control1

Bit Location Register Description Attributes


0 Port DL_Up Interrupt Enable – When both this bit and the Port RW
DL_Up Interrupt Pending bit are set, the Port Interrupt Status bit
associated with this Port is Set and a Port Interrupt is requested.
1 Port DL_Down Interrupt Enable – When both this bit and the Port RW
DL_Down Interrupt Pending bit are set, the Port Interrupt Status bit
associated with this Port is Set and a Port Interrupt is requested.
2 Port PME Turn Off Interrupt Enable – When both this bit and the RW
Port PME Turn Off Interrupt Pending bit are set, the Port Interrupt
Status bit associated with this Port is Set and a Port Interrupt is
requested.
3 Link Retrain Interrupt Enable – When both this bit and the Link RW
Retrain Interrupt Pending bit are set, the Port Interrupt Status bit
associated with this Port is Set and a Port Interrupt is requested.
4 Beacon / WAKE# Interrupt Enable – When both this bit and the RW
Beacon / WAKE# Interrupt Pending bit are set, the Port Interrupt
Status bit associated with this Port is Set and a Port Interrupt is
requested.
Beacon / WAKE# support is optional. If not supported, this bit is Read
Only Zero.
5 MR Uncorrectable Error Interrupt Enable – When both this bit and RW
the MR Uncorrectable Error Interrupt Pending bit are set, the Port
Interrupt Status bit associated with this port is Set and a Port Interrupt
is requested.
6 MR Advisory Error Interrupt Enable – When both this bit and the RW
MR Advisory Error Interrupt Pending bit are set, the Port Interrupt
Status bit associated with this Port is Set and a Port Interrupt is
requested.
7 Physical Hot-Plug Interrupt Enable – When both this bit and the RW
Physical Hot-Plug Interrupt Pending bit are set, the Port Interrupt
Status bit associated with this Port is Set and a Port Interrupt is
requested.
15:8 Reserved RO
23:16 NumVH – Indicates the number of VHs enabled. This value must be RW
less than or equal to MaxVH. The default value of this field is Vendor
Specific.
This value may be used by the Switch to optimize resource usage.
31:24 Reserved RO
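
Each enable bit in bits 7:0 of Port Control1 pairs with the Pending bit at the same position in Port Status (0Ch). Under that reading, the condition for requesting a Port Interrupt can be sketched as below. The function names are hypothetical, and the second helper only illustrates the effect of a write-1-to-clear access on the pending bits.

```python
PORT_INTERRUPT_MASK = 0xFF  # bits 7:0 hold the enable / pending pairs


def port_interrupt_requested(control1: int, status: int) -> bool:
    """True if any interrupt source is both enabled and pending."""
    return bool(control1 & status & PORT_INTERRUPT_MASK)


def after_rw1c_write(status: int, written: int) -> int:
    """Port Status value after software writes `written` to the RW1C bits."""
    return status & ~(written & PORT_INTERRUPT_MASK)
```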


Table 4-41: Switch Port Control2

Bit Location Register Description Attributes


0 Port Enable – Set by MR-PCIM to indicate that it wishes to use the RW
Port.
1 Link MR-IOV Enable – If Set, the Link will attempt to negotiate to use RW
the MR-IOV enhanced protocol. If Clear, the link will not attempt to
use MR-IOV and will thus train the link in Base PCIe mode. Read
Only if the Link is up as indicated by Link Direction Status.
7:2 Reserved RO
9:8 Link Direction Control – Controls how the Link should train. Values RW
are:
0 Upstream Switch Port
1 Downstream Switch Port
2 Cross-Link (if supported)
3 Don’t Train, keep Link down
11:10 Backup Link Direction Control – The Backup Link Direction Control RW
value is copied to the Link Direction Control field when Watchdog
Timer 1 expires. The Backup Link Direction Control value has no
other effect.
12 PM_PME Triggers Beacon / WAKE# – If Set, the automatic RW
triggering of Beacon / WAKE# on reception of a Beacon, WAKE# or
PM_PME message at this port is suppressed. See Section 7.6 for
details.
13 Send PME_Enter_L23 DLLP – If software writes 1b to this bit and Write 1 to Send
this Port’s Link Direction Status indicates Upstream Switch Port, the
Port initiates the PME_Enter_L23 handshake to power down the Link.
Writing 0 to this bit has no effect. If Link Direction Status is not
Upstream Switch Port, writing any value to this bit has no effect.
15:14 Reserved RO
23:16 VL Enable – This bit, when Set, enables a Virtual Link (see note 1 for RW
exceptions). The
Virtual Link is disabled when this bit is cleared.
Software must use the VL Negotiation Pending bit to check whether
the VL negotiation is complete.
Default value of this bit is 1b for the first VL and is 0b for other VLs.
Notes:
1. This bit is hardwired to 1b for the VL0, i.e., writing to this bit has no
effect for VL0.
2. To enable a Virtual Link, the VL Enable bits for that Virtual Link
must be Set in both components on a Link.
3. To disable a Virtual Link, the VL Enable bits for that Virtual Link
must be cleared in both components on a Link.
4. Software must ensure that no traffic is using a Virtual Link at the
time it is disabled.

5. Software must fully disable a Virtual Link in both components on a
Link before re-enabling the Virtual Link.
31:24 Reserved RO
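
The relationship between Link Direction Control (bits 9:8) and Backup Link Direction Control (bits 11:10) when Watchdog Timer 1 expires can be sketched as follows. The names are hypothetical and the model touches only these two fields.

```python
LINK_DIRECTION_CONTROL = {
    0: "Upstream Switch Port",
    1: "Downstream Switch Port",
    2: "Cross-Link (if supported)",
    3: "Don't Train, keep Link down",
}


def link_direction_control(reg: int) -> str:
    """Decode bits 9:8 of Port Control2."""
    return LINK_DIRECTION_CONTROL[(reg >> 8) & 0x3]


def apply_watchdog1(reg: int) -> int:
    """Copy Backup Link Direction Control (11:10) into Link Direction Control (9:8)."""
    backup = (reg >> 10) & 0x3
    return (reg & ~(0x3 << 8)) | (backup << 8)
```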


4.3.3.3. Port Status (0Ch)

Table 4-42: Switch Port Status

Bit Location Register Description Attributes


0 Port DL_Up Interrupt Pending – Set when the Link enters DL_Up. This bit RW1C
is Cleared when software writes 1b.
1 Port DL_Down Interrupt Pending – Set when the Link enters DL_Down. RW1C
This bit is Cleared when software writes 1b.
2 Port PME Turn Off Interrupt Pending – Set when all VS Bridges RW1C
mapped to this Port and associated with a non-Authorized VS have
their PME Turn Off State Set indicating the completion of the PME
Turn Off handshake.
This bit is Cleared when software writes 1b.
3 Link Retrain Interrupt Pending – Set when the link retrains. Can be RW1C
used to monitor link health.
4 Beacon / WAKE# Interrupt Pending – Set when the link detects a RW1C
Beacon or WAKE# event. Beacon / WAKE# support is optional. If not
supported, this bit is Read Only Zero. When supported, it is form
factor specific whether Beacon or WAKE# is used.
5 MR Uncorrectable Error Interrupt Pending – Set when the link RW1C
detects an MR Uncorrectable Error and Sets one of the MR Error
Status bits.
6 MR Advisory Error Interrupt Pending – Set when the link detects RW1C
an MR Advisory Error and Sets one of the MR Error Status bits.
7 Physical Hot-Plug Interrupt Pending – Set when the Physical Hot- RW1C
Plug controller indicates that software should be notified.
15:8 Reserved RO
23:16 VL Negotiation Pending – These bits indicate whether Virtual Link RO
Negotiation for some VL is in pending state.
The value of this bit is defined only when the Link is in the DL_Active
state and the Virtual Link is enabled (its VL Enable bit is Set).
When this bit is Set by hardware, it indicates that the VL resource has
not completed the process of negotiation. This bit is cleared by
hardware after the VL negotiation is complete (on exit from the MR
FC_INIT2 state on the VL).
Before using a Virtual Link, software must check whether the VL
Negotiation Pending bits for that Virtual Link are Clear in both
components on the Link.
31:24 Reserved RO
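
The check that software performs before using a Virtual Link can be sketched as below. It assumes the Link is already in the DL_Active state and that software can read Port Control2 and Port Status from both components on the Link; the function and parameter names are hypothetical.

```python
def vl_ready(vl: int, local_control2: int, local_status: int,
             partner_control2: int, partner_status: int) -> bool:
    """True if VL `vl` is enabled and has finished negotiation on both ends."""
    if not 0 <= vl <= 7:
        raise ValueError("VL number must be 0..7")
    bit = 1 << (16 + vl)  # VL Enable and VL Negotiation Pending both use bits 23:16
    enabled = bool(local_control2 & bit) and bool(partner_control2 & bit)
    pending = bool(local_status & bit) or bool(partner_status & bit)
    return enabled and not pending
```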


4.3.3.4. Link Partner Training Status (10h)

Table 4-43: Switch Link Partner Training Status

Bit Location Register Description Attributes


0 Link Partner Detected – Set to indicate that a PCI Express RO
component was detected at the remote end of one or more lanes
of the link.
2:1 Link Direction Status – Indicates whether (and how) the link trained. RO
Values are:
0 Upstream Switch Port
1 Downstream Switch Port
2 Reserved
3 Link Down
3 Link Partner is MR – Link was successfully brought up in MR-IOV RO
mode.
7:4 Reserved RO
31:8 MR Init DLLP Bits – These bits were captured during link training RO
from bytes 1 to 3 of the MR Init DLLP that was sent by the Link
Partner. These bits are not meaningful unless Link Partner is MR is
Set. These fields allow MR-PCIM to know some information about the
Link Partner without needing to read Configuration space (which is
not possible if the Link Partner is an MR Root Port since Config
requests cannot flow Upstream). The bits are further described in
Table 4-44 below.


Table 4-44: Switch Link Partner Training Status – MRInit DLLP Bits

Bit Location Register Description Attributes


15:8 Link Partner MaxVH – Maximum number of VHs that the Link RO
Partner can support.

18:16 Link Partner MaxVL – Maximum number of VLs that the Link Partner RO
can support.
19 Reserved RO
22:20 Link Partner Protocol Version – MR-IOV Protocol version RO
supported by the Link Partner. For this version of the specification,
this is the value 1h.
23 Link Partner was Authorized – If the Link Partner is an MR Switch, RO
this bit indicates that at the time of Link Training the VS associated
with VH0 of this link was allowed to manage the switch (authorization
can be revoked so this may no longer be accurate). This corresponds
to the VS is Authorized bit in the Link Partner’s MR-IOV Capability.
27:24 Link Partner Type – Indicates what kind of PCI Express Device is RO
present as Function 0 of the Link Partner. Encoding is identical to the
Device/Port Type field in the PCI Express Capabilities (Offset 02h,
Bits 7:4).
29:28 Reserved RO
30 Link Partner VH FC – If Set, indicates the Link Partner supports RO
per-VH and per-VL Flow Control. If Clear, indicates that the Link Partner
supports only per-VL Flow Control.
31 Reserved RO
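
Tables 4-43 and 4-44 together describe one 32-bit register. A minimal decoder might look like the following; the function name and dictionary keys are hypothetical, and the MR Init DLLP fields are returned as `None` when Link Partner is MR is Clear because they are not meaningful in that case.

```python
LINK_DIRECTION_STATUS = {
    0: "Upstream Switch Port",
    1: "Downstream Switch Port",
    2: "Reserved",
    3: "Link Down",
}


def decode_link_partner_status(reg: int) -> dict:
    """Decode the Link Partner Training Status register (10h)."""
    result = {
        "partner_detected": bool(reg & 0x1),                     # bit 0
        "direction": LINK_DIRECTION_STATUS[(reg >> 1) & 0x3],    # bits 2:1
        "partner_is_mr": bool((reg >> 3) & 0x1),                 # bit 3
        "mr_init": None,
    }
    if result["partner_is_mr"]:
        result["mr_init"] = {
            "max_vh": (reg >> 8) & 0xFF,             # bits 15:8
            "max_vl": (reg >> 16) & 0x7,             # bits 18:16
            "protocol_version": (reg >> 20) & 0x7,   # bits 22:20
            "was_authorized": bool((reg >> 23) & 1), # bit 23
            "partner_type": (reg >> 24) & 0xF,       # bits 27:24
            "vh_fc": bool((reg >> 30) & 1),          # bit 30
        }
    return result
```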


4.3.3.5. VL Arbitration Capability and Status (14h)

Table 4-45: Switch VL Arbitration Capability and Status

Bit Location Register Description Attributes


11:0 VL Arbitration Capability – Indicates the types of VL Arbitration RO
supported by the Port. This field is valid for all Functions that report a
Low Priority Extended VC Count field greater than 0. For all other
Functions, this field must be hardwired to 00h.
Each bit location within this field corresponds to a VL Arbitration
Capability defined below. When more than 1 bit in this field is Set, it
indicates that the Port can be configured to provide different VL
arbitration services. Defined bit positions are:
Bit 0 Hardware fixed arbitration scheme, e.g., Round Robin
Bit 1 Weighted Round Robin (WRR) arbitration with 32 phases
Bit 2 WRR arbitration with 64 phases
Bit 3 WRR arbitration with 128 phases
Bit 4 Time-based WRR with 128 phases
Bit 5 WRR Arbitration with 256 phases
Bits 6-10 Reserved
Bit 10 Vendor Defined VL Arbitration Scheme
12 VL Arb Status – This bit indicates the coherency status of the VL RO
Arbitration Table. This bit is valid only when the VL Arbitration Table
is used.
This bit is Set by hardware when any entry of the VL Arbitration Table
is written to by software. This bit is cleared by hardware when
hardware finishes loading values stored in the VL Arbitration Table
after software sets the Load VL Arbitration Table bit.
Default value of this bit is 0b.
13 Reserved RO
15:14 Reference Clock – Indicates the reference clock for Virtual Links that RO
support time-based WRR VL Arbitration. This field is valid only if time-
based WRR is supported.
Defined encodings are:
00b 100 ns reference clock
01b – 11b Reserved
18:16 MaxVL – Indicates the number of VLs supported. The Port supports RO
VL0 through VLMaxVL inclusive.
18:16 MaxVL – Maximum number of VLs that the Port can support. RO
23:19 Reserved RO
30:24 Maximum Time Slots – Indicates the maximum number of time slots RO
(minus one) that are supported when configured for time-based WRR
VL Arbitration. For example, a value 000 0000b in this field indicates

the supported maximum number of time slots is 1 and a value of 111
1111b indicates the supported maximum number of time slots is 128.
This field is valid only when the VL Arbitration Capability field
indicates that time-based WRR VL Arbitration is supported.
31 Reserved RO
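
Two of the fields in this register lend themselves to a short illustration: the capability bitmask, and the "minus one" encoding of Maximum Time Slots. The function names below are hypothetical; only bits 0 through 5 of the capability field are treated as defined schemes.

```python
VL_ARB_SCHEMES = {
    0: "Hardware fixed arbitration scheme",
    1: "WRR arbitration with 32 phases",
    2: "WRR arbitration with 64 phases",
    3: "WRR arbitration with 128 phases",
    4: "Time-based WRR with 128 phases",
    5: "WRR arbitration with 256 phases",
}


def supported_vl_arb_bits(reg: int) -> list:
    """Bit numbers of the defined arbitration schemes that the Port reports."""
    return [bit for bit in VL_ARB_SCHEMES if (reg >> bit) & 1]


def max_time_slots(reg: int) -> int:
    """Bits 30:24 hold the maximum number of time slots minus one."""
    return ((reg >> 24) & 0x7F) + 1
```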


4.3.3.6. VL Arbitration Control (18h)

Table 4-46: Switch VL Arbitration Control

Bit Location Register Description Attributes


0 Load VL Arbitration Table – When Set, this bit updates the VL RW
Arbitration logic from the VL Arbitration Table. This bit is valid only
when the VL Arbitration Table is used by the selected VL Arbitration
scheme (that is indicated by a Set bit in the VL Arbitration Capability
field selected by VL Arbitration Select).
Software sets this bit to signal hardware to update VL Arbitration logic
with new values stored in VL Arbitration Table; clearing this bit has no
effect. Software uses the VL Arbitration Table Status bit to confirm
whether the new values of VL Arbitration Table are completely
latched by the arbitration logic.
This bit always returns 0b when read.
Default value of this bit is 0b.
7:4 Reserved RO
11:8 VL Arbitration Select – This field configures the Port to provide a RO
particular VL Arbitration service.
The permissible value of this field is a number corresponding to one
of the asserted bits in the VL Arbitration Capability field.
15:12 Reserved RO
23:16 VL Strict Priority Arbitration – This field contains one bit per VL. RW
Bit 0 corresponds to VL0. Bit 7 corresponds to VL7.
When a bit is Set, the corresponding VL is configured to arbitrate as
Strict Priority based on VL number. When a bit is Clear, the
corresponding VL is configured to arbitrate as normal priority (using
the scheme selected by VL Arbitration Select).
Among the VLs configured for strict priority, priority is based on
increasing VL number. VL0 is the lowest strict priority, VL7 is the
highest.
Strict Priority VLs have priority over normal priority VLs.
Behavior is Undefined if a VL configured for Strict Priority is also
included in the VL Arbitration Table.
If a VL is Disabled, the value of the corresponding bit in this field is
ignored.
Default value of this field is 0000 0000b.
31:24 Reserved RO
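
The ordering among Strict Priority VLs can be sketched as a selection function. This is an illustration only: `ready_mask` (one bit per VL that currently has traffic to send) is an assumption introduced for the example, and normal priority arbitration is left to the caller.

```python
def pick_strict_priority_vl(strict_mask: int, enabled_mask: int, ready_mask: int):
    """Return the highest-numbered ready, enabled Strict Priority VL, or None.

    None means no Strict Priority VL is eligible, so the scheme selected by
    VL Arbitration Select applies to the normal priority VLs.
    """
    # Bits for Disabled VLs are ignored, so mask them out first.
    candidates = strict_mask & enabled_mask & ready_mask & 0xFF
    if candidates == 0:
        return None
    return candidates.bit_length() - 1  # VL7 is the highest strict priority
```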


4.3.3.7. VL Arbitration Table Offset (1Ch)

Table 4-47: Switch VL Arbitration Table Offset

Bit Location Register Description Attributes


1:0 Reserved RO
31:2 VL Arbitration Table Offset – DWORD Offset to the VL Arbitration RO
Table

4.3.3.8. MR Error Status (20h)

Table 4-48: Switch MR Error Status

Bit Location Register Description Attributes


3:0 MR First Error Pointer RO
4 MR Uncorrectable TLP Error Status RW1C
5 MR Global Key Error Status RW1C
14:6 Reserved RsvdZ
15 MR DLLP Error Status – DLLP Errors are not logged RW1C

4.3.3.9. MR Error Control (22h)

Table 4-49: Switch MR Error Control

Bit Location Register Description Attributes


3:0 Reserved RO
4 MR Uncorrectable TLP Error Mask RW
5 MR Global Key Error Mask RW
14:6 Reserved RO
15 MR DLLP Error Mask RW

4.3.3.10. MR Error Log (24h to 34h)

These fields contain the TLP Prefix and TLP Header corresponding to the error described by the
First Error Pointer in the MR Error Status register.
The value of these fields is undefined if the First Error Pointer is zero or points to a bit number that
is not Set.
Headers are not logged and the First Error Pointer is not updated for DLLP Errors.
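
The validity rule for the log can be expressed as a small check. This sketch assumes the First Error Pointer (bits 3:0 of MR Error Status) holds the bit number of a status bit within the same 16-bit register; the function name is hypothetical.

```python
def mr_error_log_valid(status: int) -> bool:
    """True if the MR Error Log contents are defined for this MR Error Status."""
    pointer = status & 0xF  # MR First Error Pointer, bits 3:0
    if pointer == 0:
        return False
    # The log is defined only if the pointed-to status bit is Set.
    return bool((status >> pointer) & 1)
```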


4.3.3.11. PCI Bridge Control (N+00h)

These fields are the PCI Bridge controls that affect the Physical Port.
Table 4-50: PCI Bridge Control

Bit Location Register Description Attributes


0 Secondary Bus Reset – If the Link is a Downstream PCIe Link RW
setting this bit causes the Port to initiate a PCI Express Hot Reset
(TS1 with the Hot Reset bit Set). Clearing this bit causes the Port to
remove the Hot Reset and attempt to bring the link back up. Flow
Control is renegotiated after TS1 style Hot Reset is removed.
If Bridge Control Physical is Set, this bit is identical to the Secondary
Bus Reset bit in the PCI Bridge Control register.
If the Link Direction Status does not equal 1 (i.e. Link is Upstream or
Down) or if the Link Partner is MR bit is 1b, this bit must be 0b.
15:1 Reserved RsvdP

4.3.3.12. PCIe Capability Structure (N+02h)

These fields are the PCI Express controls that affect the Physical Port. Values in various
Configuration Spaces are either Virtual values or map to these values (see the Bridge Controls
Physical bit described in Section 4.3.6.2).
The layout of this structure is similar to the PCI Express Capability, dropping the Root Capability,
Control and Status words. All fields are implemented as defined in the PCI Express Specification except
as indicated in Table 4-51.


Table 4-51: Port PCIe Capability Structure

Register | Field(s) | Attributes

PCI Express Capabilities | Slot Implemented | HwInit. Reflects Physical Hot-Plug.
Device Capabilities | Phantom Functions Supported | Must be 00b.
Device Capabilities | Captured Slot Power Limit Value; Captured Slot Power Limit Scale | Must be 0h if Link Direction Status is non-zero (i.e. Link Down or Downstream Switch Port). If Base PCIe Link, reflects the value received. If MR Link, reflects the value received on VH0 if VH0 is Upstream for this port, otherwise 0h.
Device Capabilities | Function Level Reset Capability | Must be 0b.
Device Control | Correctable Error Reporting Enable; Non-Fatal Error Reporting Enable; Fatal Error Reporting Enable; Unsupported Request Reporting Enable | Not Implemented. Errors are raised and controlled within a VH.
Device Control | Enable Relaxed Ordering | Not Implemented. Controlled within a VH.
Device Control | Max_Payload_Size | Not Implemented. Controlled within a VH.
Device Control | Extended Tag Field Enable | Not Implemented. Controlled within a VH.
Device Control | Phantom Function Enable | Must be 0b.
Device Control | Enable No Snoop | Not Implemented. Controlled within a VH.
Device Control | Max_Read_Request_Size | Not Implemented. Controlled within a VH.
Device Status | Correctable Error Detected; Non-Fatal Error Detected; Fatal Error Detected; Unsupported Request Detected | Not Implemented. Errors are raised and controlled within a VH.
Device Status | Transactions Pending | Not Implemented. Meaningful only within a VH.
Link Capabilities | Surprise Down Error Reporting Capable | Must be 1b.
Link Capabilities | Port Number | Not Implemented. In MR-IOV, Port Number is the index into the Port table.
Slot Capabilities | Attention Button Present; Power Controller Present; MRL Sensor Present; Attention Indicator Present; Power Indicator Present; Hot-Plug Surprise; Hot-Plug Capable; Electromechanical Interlock Present | HwInit. Reflects Physical Hot-Plug capabilities.
Slot Capabilities | Slot Power Limit Value; Slot Power Limit Scale | Must be 0h if Link Direction Control is 0 (i.e. Upstream Switch Port). If MR Link, Set_Slot_Power_Limit messages resulting from writing this value will be sent on VH0 if VH0 is Downstream for this port.
Slot Control | all fields | Reflect Physical Slot / Hot-Plug controls. Software Notification causes the Physical Hot-Plug Interrupt Pending bit in the Port Status register to be Set.
Device Capabilities 2 | Completion Timeout Ranges Supported; Completion Timeout Disable Supported; ARI Forwarding Supported | Not Implemented. Implemented within a VH.
Device Control 2 | Completion Timeout Value; Completion Timeout Disable; ARI Forwarding Enable | Not Implemented. Implemented within a VH.


4.3.3.13. Port Interrupt Status Bitmap (minus 20h)

[Figure: the Port Table Offset field of the MR-IOV Capability in Config Space locates the Port Table
(Port0 to PortN Table Entries) in MMIO Space. The Port Interrupt Bitmap sits immediately before the
Port0 Table Entry and holds one Interrupt Status bit per Port, arranged in DWORDs covering
Ports 31..0, 63..32, and so on up to 255..224.]
Note: Port Interrupt Bitmap is always 32 bytes. This size supports the maximum
size MR Switch. Port 0 is always at byte Physical_Port_Table_Offset - 32.
Bits for ports that are not present read as zero. Write 1 to Clear.

Figure 4-8: Port Interrupt Status Bitmap

The Port Interrupt Status bitmap precedes the Port Table. It is always 32 bytes (supporting switches
with a maximum of 256 ports).
Bits in this table are Read Only. A bit is Set to indicate the Port has an interrupt pending and Clear
otherwise. These Interrupt Status bits are cleared either by clearing the appropriate Port Interrupt
Pending bit or by masking the interrupt using the Port Interrupt Enable.
An MSI Interrupt is requested on any zero to one transition of any of these bits.
Bits corresponding to Ports that are not Present are Read Only Zero.
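
As a rough illustration of the layout described above, the following Python sketch computes where a Port's status bit lives and lists the pending Ports in a 32-byte bitmap image. The function names are hypothetical, and the sketch assumes that Port n occupies bit (n mod 32) of DWORD (n div 32) with DWORDs stored little-endian; the text here fixes only the bitmap size and its position before the Port Table.

```python
BITMAP_BYTES = 32  # the bitmap size is fixed and covers up to 256 Ports


def port_status_location(port_table_offset, port):
    """Return (byte offset of the DWORD, bit index) for a Port's status bit."""
    if not 0 <= port < 256:
        raise ValueError("port must be in 0..255")
    # The bitmap immediately precedes the Port Table.
    bitmap_base = port_table_offset - BITMAP_BYTES
    return bitmap_base + 4 * (port // 32), port % 32


def pending_ports(bitmap):
    """List Port numbers whose Interrupt Status bit is Set in a 32-byte image."""
    if len(bitmap) != BITMAP_BYTES:
        raise ValueError("bitmap must be exactly 32 bytes")
    # Assumption: DWORDs are little-endian, so bit n of the integer is Port n.
    value = int.from_bytes(bitmap, "little")
    return [port for port in range(256) if (value >> port) & 1]
```

MR-PCIM software would typically read the whole bitmap on an MSI and then service each Port returned by a helper such as pending_ports.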

4.3.4. Switch VL Arbitration Table


Switch and Device VL Arbitration tables are identical. See Section 4.3.7 for details.


4.3.5. Switch VS Table


The VS Table contains up to 256 VS Table Entries. This table is located in memory starting at a
location determined by the VS Table Offset field in the MR-IOV Capability.

[Figure: the VS Table Offset field of the MR-IOV Capability in Config Space locates the VS Table
(VS0 to VSN Table Entries) in BAR Memory Space, preceded by the VS Interrupt Bitmap.
Each VS Table Entry contains:
00h VS Capability: VS Present
04h VS Control: VS Enable, VS Interrupt Enable, VS Suppress Reset Propagation, VS Global Key Check Enable Bits, VS Global Key Value
08h-??h VS Bridge Interrupt Status: Bridge0 to BridgeN Interrupt Status]

Figure 4-9: VS Table

4.3.5.1. VS Capability and Status (00h)

Table 4-52: Switch VS Capability and Status

Bit Location Register Description Attributes


0 VS Present – Read only bit indicating that the VS is implemented. RO
When VS Enable is zero, this bit may be controlled in a Vendor
Specific mechanism to allow for flexible silicon implementation.
31:1 Reserved RO


4.3.5.2. VS Control (04h)

Table 4-53: Switch VS Control

Bit Location Register Description Attributes


0 VS Enable – If set, indicates that MR-PCIM is using this VS. When RW
set, the capabilities of the VS may not change. When clear, Vendor
Specific mechanisms may change the capabilities offered by this VS.
If the Switch does not support changing VS capabilities, this bit may
be read only with the value one.
1 VS Interrupt Enable – If Set, the VS Interrupt Summary bit RW
corresponding to this VS will be affected when any of the Bridge N
Interrupt Status bits in this VS are Set. If Clear, the Bridge N Interrupt
Status bits have no effect on the corresponding VS Interrupt
Summary bit.
2 VS Suppress Reset Propagation – If Set, the automatic sending of RW
Hot Reset or Reset DLLPs downstream is suppressed. If Clear,
DL_Down, Hot Reset and/or Reset DLLPs received on the upstream
Bridge of this VS cause downstream Bridges to send Hot Reset.
Suppressing Reset Propagation can be used to ensure that a failure
in the management VH does not prematurely reset the entire MR
Topology.
Suppressing Reset Propagation does not affect TLP discarding. TLPs
destined to or from the upstream Bridge will be discarded as a result
of the upstream Bridge being down or in Hot Reset. Forwarding of
TLPs between downstream ports of the VS is not affected by the
Reset state of the upstream Bridge.
15:3 Reserved RO
27:16 VS Global Key Value – Expected Global Key Value for TLPs RW
associated with the VS.
This value is inserted in TLPs originated by the VS.
This value is inserted in TLPs entering the VS from a non-MR
Enabled Link.
If enabled, Global Keys for TLPs associated with the VS are checked
against this value.
28 Reserved RO
31:29 VS Global Key Check Enable – Enables checking of the Global Key RW
in the TLP Prefix against the Global Key Value for TLPs associated
with the VS.
This is a three bit field that enables checks at various points in TLP
processing.
Bit 29 enables the entering check. This validates forwarded TLPs as
they are received by the Switch. This error is reported in the Port
where the TLP entered the switch. Implementing this check is
optional. If not implemented, this bit is read only zero.
Bit 30 enables the exiting check. This validates forwarded TLPs as
they are transmitted by the Switch. This error is reported in the Port
where the TLP exited the Switch. Implementing this check is optional.
If not implemented, this bit is read only zero.
Bit 31 enables the terminating check. This validates TLPs addressing
the VS and TLPs being forwarded to a Base PCIe Link. This error is
reported in the Port where the TLP entered the Switch. Every VS
must implement this bit.
A Switch may implement the bit 29 check, the bit 30 check, both
checks or neither check. When implemented, a check must be
implemented in every VS.
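
The bit assignments in the VS Control table can be illustrated with a small decoder. This is a minimal Python sketch, not part of the specification; the class and function names are hypothetical, and only the bit positions are taken from the table.

```python
from dataclasses import dataclass


@dataclass
class VSControl:
    vs_enable: bool                   # bit 0
    vs_interrupt_enable: bool         # bit 1
    suppress_reset_propagation: bool  # bit 2
    global_key_value: int             # bits 27:16
    check_entering: bool              # bit 29, optional check
    check_exiting: bool               # bit 30, optional check
    check_terminating: bool           # bit 31, required in every VS


def decode_vs_control(reg):
    """Split a 32-bit VS Control value into its documented fields."""
    return VSControl(
        vs_enable=bool(reg & (1 << 0)),
        vs_interrupt_enable=bool(reg & (1 << 1)),
        suppress_reset_propagation=bool(reg & (1 << 2)),
        global_key_value=(reg >> 16) & 0xFFF,
        check_entering=bool(reg & (1 << 29)),
        check_exiting=bool(reg & (1 << 30)),
        check_terminating=bool(reg & (1 << 31)),
    )
```

Because the entering and exiting checks are optional and read as zero when not implemented, software could write the enable bits and decode the read-back value to discover which checks a Switch provides.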

4.3.5.3. VS Bridge Interrupt Status (08h to ??h)

This field contains one bit per P2P Bridge in the VS. Bit 0 of the first DWORD corresponds to VS
Bridge Table Entry 0 (i.e. the Upstream Bridge). Bit 1 corresponds to VS Bridge Table Entry 1.
This field contains INT((Num_Bridge_Table_Entries + 31) / 32) DWORDs.
Bits in this field are Set only when any of the following bits are Set in the VS Bridge Table:
• VS Status Changed
• Attention Indicator State Changed
• Power Indicator State Changed
• Power Controller State Changed

Bits in this field are Cleared by clearing the corresponding “Changed” bits.
If VS Interrupt Enable is Set and any of the bits in this field are Set, the corresponding VS Interrupt
Summary bitmap bit is Set.
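
The sizing formula above can be checked with a short Python sketch. The function names are hypothetical, and the location helper assumes that the pattern stated for bit 0 continues, i.e. that entry n uses bit (n mod 32) of DWORD (n div 32) starting at offset 08h.

```python
def bridge_status_dwords(num_bridge_table_entries):
    """INT((Num_Bridge_Table_Entries + 31) / 32) using integer division."""
    return (num_bridge_table_entries + 31) // 32


def bridge_status_location(entry):
    """Return (byte offset within the VS Table Entry, bit index) for an entry.

    Assumption: one bit per VS Bridge Table Entry, packed from bit 0 upward.
    """
    if entry < 0:
        raise ValueError("entry must be non-negative")
    return 0x08 + 4 * (entry // 32), entry % 32
```

For example, a VS with 33 Bridge Table Entries needs two DWORDs, and the bit for entry 33 would fall in the second DWORD.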


4.3.5.4. VS Interrupt Status Bitmap (minus 20h)

[Figure: the VS Table Offset field of the MR-IOV Capability in Config Space locates the VS Table
(VS0 to VSN Table Entries) in MMIO Space. The VS Interrupt Bitmap sits immediately before the
VS0 Table Entry and holds one Interrupt Status bit per VS, arranged in DWORDs covering
VS 31..0, 63..32, and so on up to 255..224.]
Note: VS Interrupt Bitmap is always 32 bytes. This size supports the
maximum size MR Switch. VS 0 is always at byte VS_Table_Offset - 32.
Bits for VSs that are not present are zero. The Bitmap is read only. An
entry is 1 if a Global Key Fault is set or if any BridgeN Interrupt Status
bit is set in the corresponding VS Table Entry.

Figure 4-10: VS Interrupt Status Bitmap

The VS Interrupt Status bitmap precedes the VS Table. It is always 32 bytes (supporting switches
with a maximum of 256 Virtual Switches).
Bits in this table are Read Only. A bit is Set to indicate the VS has an interrupt pending and Clear
otherwise. These Interrupt Status bits are cleared by clearing the VS BridgeN Interrupt Status
bit using the VS Bridge Table or by masking the interrupt using the VS Interrupt Enable.
An MSI Interrupt is requested on any zero to one transition of any of these bits.
Bits corresponding to Virtual Switches that are not Present are Read Only Zero.

4.3.6. Switch VS Bridge Table


The VS Bridge Table contains up to 256 Bridge Table Entries. This table is located in memory
starting at a location determined by the VS Bridge Table Offset field in the MR-IOV Capability.


[Figure: the VS Bridge Table Offset field in Config Space locates the VS Bridge Table in BAR Memory
Space. The table holds entries VS0 Bridge0 to VS0 BridgeM through VSN Bridge0 to VSN BridgeM.
Each VS Bridge Table Entry contains:
00h Bridge Capability and Status: Bridge Hardware Present, Hot-Plug Hardware Present, Max Payload Size Supported, Num VC Resources Present, PME Turn Off State, VC Config Changed, Attention Indicator Changed, Power Indicator Changed, Power Controller Changed
04h Bridge Control: Bridge Enable, Bridge Controls Physical Link, VC Config Interrupt Enable, Port Mapped to Bridge, Bridge VHN, Bridge Port
08h Bridge Control: Max Payload Size Offered, Extended VC Count, Low Priority Extended VC Count
0Ch VC ID to VL Map: map fields for VC0 to VC3
10h VC ID to VL Map: map fields for VC4 to VC7
14h to 20h VC Resource 0 to 7 State (Read Only): VC Enabled, VC Negotiation Pending, VC ID, TC VC Map
24h Hot-Plug “Virtual Signals Interface”: Power Controller Present, Hot Plug Surprise, Hot Plug Capable, Slot Power Limit Value, Slot Power Limit Scale, Physical Slot Number
28h Hot-Plug “Virtual Signals Interface”: Power Indicator State, Attention Indicator State, Power Controller State, Slot Implemented, Signals Interrupt Enable, Presence Detect State, Force Reset, Power Fault, Attention Button Push]

Figure 4-11: VS Bridge Table


VS Bridge Table Entry 0 corresponds to the upstream P2P Bridge of the VS. This bridge is always
present.


VS Bridge Table Entry 1 corresponds to the downstream P2P Bridge located at Device 0, Function
0 on the VS internal bus. VS Bridge Table Entry N (1≤N≤32) corresponds to the downstream P2P
Bridge located at Device N-1, Function 0 on the VS Internal bus. VS Bridge Table Entries above 32
are used for P2P Bridges located at non-zero Function numbers as shown in the following table.
VS Bridge Table Entry N Device Function
1..32 N-1 0
33..64 N-33 1
65..96 N-65 2
97..128 N-97 3
129..160 N-129 4
161..192 N-161 5
193..224 N-193 6
225..256 N-225 7
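
The table above follows a regular pattern that can be written as two integer operations. The Python sketch below is illustrative only; the function name is hypothetical, and it covers downstream entries 1 to 256 (entry 0 is the upstream Bridge and is not handled here).

```python
def bridge_entry_to_device_function(entry):
    """Map VS Bridge Table Entry N (1..256) to (Device, Function)."""
    if not 1 <= entry <= 256:
        raise ValueError("downstream entries are numbered 1..256")
    index = entry - 1
    # Each Function number covers a block of 32 Devices.
    return index % 32, index // 32
```

Entry 33, for instance, maps to Device 0, Function 1, matching the second row of the table.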


4.3.6.1. VS Bridge Capability and Status (00h)

Table 4-54: Switch VS Bridge Capability and Status

Bit Location Register Description Attributes


0 Bridge Present – Set if the hardware supports a bridge in this slot. RO
The VS Bridge Table has the same number of Table Entries
associated with each VS. This bit allows the table to be sparse and to
be populated in a Vendor Specific manner. This bit is not allowed to
change if VS Enable is set.
The Upstream Bridge is always present in any enabled VS.
When VS Enable is clear, Vendor Specific mechanisms may change
the value of this field.
1 Hot-Plug Hardware Present – This field indicates whether the P2P RO
Bridge supports Hot-Plug. If clear, Hot-Plug hardware is not present
and the remainder of the “signals interface” bits are read only zero.
This bit is always Clear for the upstream Bridge of a VS.
When VS Enable is clear, Vendor Specific mechanisms may change
the value of this field.
3:2 Reserved RO
6:4 Max Payload Size Supported – Returns the maximum Payload Size RO
supported by the hardware.
7 Reserved RO
10:8 Num VC Resources Hardware Present – Indicates the index of the RO
last VC Resource array structure implemented in the Type 1
Configuration Header associated with this Bridge in the VH. The
value 0 indicates one VC Resource is provided. The value 7 indicates
that all 8 VC Resources are provided.
This value indicates the number that the hardware implements. MR-
PCIM software may offer a lower number to the VH by setting
Extended VC Count.
When VS Enable is clear, Vendor Specific mechanisms may change
the value of this field.
14:11 Reserved RO
15 PME Turn Off State – Set when this bridge completes the PME Turn RO
Off Handshake. Cleared when this bridge sends or receives any TLP
other than PME_Turn_Off or PME_TO_Ack or when this bridge
enters Reset.
The PME Turn Off handshake completes when an Upstream bridge
sends or a Downstream Bridge receives a PME_TO_Ack message.
27:16 Reserved RO
28 Power Controller State Changed – This field indicates to MR-PCIM RW1C
that the Power Controller State has been changed by software in the
VH.
29 Power Indicator State Changed – This field indicates to MR-PCIM RW1C
that the Power Indicator State has been changed by software in the
VH.
30 Attention Indicator State Changed – This field indicates to MR- RW1C
PCIM that the Attention Indicator State has been changed by
software in the VH.
31 VC Config Changed – Indicates that software running in the VH RW1C
changed the VC Configuration of some VC Resource associated with
this Bridge.
This bit is Set if a VC Resource is enabled or disabled. This bit is also
Set if the VC ID or TC to VC Map is changed while the VC is
Enabled.
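
The four “Changed” bits in this register are RW1C. Assuming RW1C carries its usual PCI Express meaning (writing 1 clears the bit, writing 0 leaves it unchanged), a handler might decode and acknowledge events as in the Python sketch below. The dictionary keys and function names are hypothetical; only the bit positions come from the table.

```python
CHANGED_BITS = {
    "power_controller_state_changed": 28,
    "power_indicator_state_changed": 29,
    "attention_indicator_state_changed": 30,
    "vc_config_changed": 31,
}


def changed_events(reg):
    """Return the names of the Changed bits that are Set in the register."""
    return [name for name, bit in CHANGED_BITS.items() if reg & (1 << bit)]


def rw1c_clear_value(names):
    """Build the value to write so that only the named Changed bits clear."""
    value = 0
    for name in names:
        value |= 1 << CHANGED_BITS[name]
    return value
```

Writing back only the bits that were actually handled avoids acknowledging an event that arrived between the read and the write.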


4.3.6.2. VS Bridge Control (04h and 08h)

Table 4-55: Switch VS Bridge Control 1

Bit Location Register Description Attributes


0 Bridge Enable – If set, the associated P2P Bridge is visible in the RW
associated VS and the associated Type 1 Configuration Header is
accessible.
If clear, the associated P2P Bridge is disabled. Accesses by software
in the VS to Memory and Configuration Space of the Bridge as well
as Memory, Configuration and I/O Space of Functions below the
Bridge will return UR.
Clearing Bridge Enable for an Upstream VS Bridge Table entry
blocks access to bridges below the upstream bridge and thus has the
effect of hiding the entire VS from the view of software running in the
VH.
Enabling a VS is a two step process. First enable the VS. This
ensures that Vendor Specific mechanisms will not alter the Bridge
Present bits during the rest of this process. Then configure and
enable specific bridges in the VS starting with downstream bridges
and finishing with the upstream bridge.
If Bridge Present is clear this field is Read Only Zero.
The default value of this field is Vendor Specific.
1 Bridge Controls Physical Link – If Set and this Bridge is mapped to RW
some Port’s VH0, the fields in this Bridge’s Type 1 Header control the
Physical Link, and are not Virtual. Otherwise, the fields in this Bridge’s
Type 1 Header are Virtual. The fields involved are all fields from the
PCI Express Capability and selected additional fields from the Type 1
Header (see section 4.3.3.8 for the list of fields affected).
If this bit is Set, the registers in the Type 1 Header are the same as
the registers in the Switch Port Table Entry corresponding to the Port
mapped to this Bridge. Changes to Type 1 Header bits for the bridge
where this bit is Set affect the Type 1 Header and also affect the
equivalent register in the Port Table. Similarly, changes in the Port
Table registers are visible in the Type 1 Header corresponding to the
Bridge where this bit is Set.
By setting this bit, MR-PCIM is ceding control of the physical link to
software running in the associated VH. This is necessary when a
switch is being used as a Base PCIe switch. This may also be useful
for Base PCIe links attached to MR Switches.
This Bridge is mapped to VH0 of some Port if (1) the Port Mapped to
Bridge bit is Set, (2) the Bridge VHN field is 0 and (3) the Bridge Port
field contains a valid Port number.
2 VC Config Interrupt Enable – If Set, the VC Config Changed bit can
trigger an interrupt. If Clear, VC Config Changed will not trigger an
interrupt.
14:3 Reserved RO
15 Port Mapped to Bridge – If Clear, no Port is mapped to this Bridge. RW
The Bridge VHN and Bridge Port fields are not used. Software
operating in the VH sees the Data Link Layer Link Active bit Clear
and link is in the virtual DL_Inactive state.
If Set, a Port is mapped to this Bridge. The Bridge VHN and Bridge
Port fields are used. The virtual Data Link Layer Link Active bit and
the virtual link state track the physical link state of the mapped Port.
Changes to the Virtual Data Link Layer Link Active bit, either through
changes to this bit or changes to the physical link state cause a Data
Link State Change event in the virtual Hot-Plug controller.
The default value of this field is Vendor Specific. If Bridge Enable is
Clear, this field is Read Only Zero.
23:16 Bridge VHN – This field indicates the Port VHN associated with this RW
P2P Bridge. This value must be less than or equal to the value of NumVH
in the Port contained in Bridge Port. Hardware ignores the value of
this field if the Port Mapped to Bridge bit is Clear. The default value of
this field is Vendor Specific.
31:24 Bridge Port – This field indicates the Port associated with this P2P RW
Bridge. This value must correspond to an enabled Port. Hardware
ignores the value of this field if the Port Mapped to Bridge bit is Clear.
The default value of this field is Vendor Specific.

Behavior is undefined if a Bridge Port / Bridge VHN combination is simultaneously mapped into
more than one VS Bridge Table entry.
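
The layout of VS Bridge Control 1 and the three-part “mapped to VH0” condition can be sketched in Python as follows. The function names and the valid_ports argument are hypothetical; how software learns which Port numbers are valid is outside this sketch.

```python
def pack_bridge_control_1(bridge_enable, controls_physical_link,
                          vc_config_interrupt_enable, port_mapped,
                          bridge_vhn, bridge_port):
    """Assemble a VS Bridge Control 1 value from its documented fields."""
    if not 0 <= bridge_vhn <= 0xFF or not 0 <= bridge_port <= 0xFF:
        raise ValueError("Bridge VHN and Bridge Port are 8-bit fields")
    return (int(bridge_enable) << 0
            | int(controls_physical_link) << 1
            | int(vc_config_interrupt_enable) << 2
            | int(port_mapped) << 15
            | bridge_vhn << 16
            | bridge_port << 24)


def mapped_to_vh0(reg, valid_ports):
    """Apply the three conditions for a Bridge being mapped to some Port's VH0."""
    port_mapped = bool(reg & (1 << 15))  # (1) Port Mapped to Bridge is Set
    vhn = (reg >> 16) & 0xFF             # (2) Bridge VHN is 0
    port = (reg >> 24) & 0xFF            # (3) Bridge Port is a valid Port
    return port_mapped and vhn == 0 and port in valid_ports
```

Since mapping the same Bridge Port / Bridge VHN pair into more than one entry is undefined, management software would normally also keep a set of pairs already in use and reject duplicates before writing the register.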


Table 4-56: Switch VS Bridge Control 2

Bit Location Register Description Attributes


3:0 Reserved RO
6:4 Max Payload Size Offered – Indicates Maximum Payload Size RW
offered in the Type 1 Header. This value must be less than or equal
to the Max Payload Size Supported value.
Default value for this field is Vendor Specific.
7 Reserved RO
10:8 Extended VC Count – MR-PCIM must configure this value with the RW
number of VC Resources that are offered to software operating in the
VH. This value will be available as Extended VC Count in the VC
Capability of the associated Type 1 Configuration Header.
This value must be less than or equal to Num VC Resources
Hardware Present. Changing this value while Bridge Enable is set is
undefined. The default value of this field is Vendor Specific.
11 Reserved RO
14:12 Low Priority Extended VC Count – MR-PCIM must configure this RW
value with the value to be provided to software operating in the VH.
This value has no hardware effect. This field’s purpose is to inform
software running in the VH of the relative priority of certain VCs.
This value must be less than or equal to the value set for Extended
VC Count. Changing this value while Bridge Enable is set is
undefined. The default value of this field is Vendor Specific.
31:15 Reserved RO
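
The ordering constraints between VS Bridge Control 2 and VS Bridge Capability and Status lend themselves to a simple pre-write check. The following Python sketch is one possible form; the function name and message strings are hypothetical, while the bit positions and the three inequalities are taken from the tables.

```python
def check_bridge_control_2(control2, capability):
    """Return a list of constraint violations for a proposed Control 2 value."""
    offered = (control2 >> 4) & 0x7         # Max Payload Size Offered
    ext_vc = (control2 >> 8) & 0x7          # Extended VC Count
    low_pri = (control2 >> 12) & 0x7        # Low Priority Extended VC Count
    supported = (capability >> 4) & 0x7     # Max Payload Size Supported
    num_vc_hw = (capability >> 8) & 0x7     # Num VC Resources Hardware Present

    problems = []
    if offered > supported:
        problems.append("Max Payload Size Offered exceeds Supported")
    if ext_vc > num_vc_hw:
        problems.append("Extended VC Count exceeds hardware VC Resources")
    if low_pri > ext_vc:
        problems.append("Low Priority Extended VC Count exceeds Extended VC Count")
    return problems
```

Because changing these counts while Bridge Enable is set is undefined, such a check would be run before the Bridge is enabled.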


4.3.6.3. VC ID to VL Map (0Ch and 10h)

These fields contain the Virtual Link to be used for traffic out this P2P Bridge for the indicated VC.
VC to VL mapping is not needed and these fields are read only zero if the MaxVL value is zero for
all Ports of the switch.
Software may not map multiple VCs to the same VL. Specifically, within a single VS Bridge Table
entry, behavior is undefined if multiple enabled VL Map entries contain the same map value.

Table 4-57: Switch VS Bridge VC ID to VL Map 1

Bit Location Register Description Attributes


2:0 VC0 VL Map – Indicates the VL number used for VS traffic labeled RW
VC0 transmitted via this VS Bridge. The default value of this field is
Vendor Specific.
3 Reserved RO
4 VC0 VL Map Enable – Indicates that the VC0 VL Map field contains RW
a valid VL number. The default value of this field is Vendor Specific.
Hardware behavior on the 1b to 0b transition of this bit is undefined.
7:5 Reserved RO
10:8 VC1 VL Map – Indicates the VL number used by VS traffic labeled RW
VC1 transmitted via this VS Bridge. This field is Read Only Zero if
Num VC Resources Hardware Present is 0. The default value of this
field is Vendor Specific.
11 Reserved RO
12 VC1 VL Map Enable – Indicates that the VC1 VL Map field contains RW
a valid VL number. This field is Read Only Zero if Num VC Resources
Hardware Present is 0. The default value of this field is Vendor
Specific.
Hardware behavior on the 1b to 0b transition of this bit is undefined.
15:13 Reserved RO
18:16 VC2 VL Map – Indicates the VL number used by VS traffic labeled RW
VC2 transmitted via this VS Bridge. This field is Read Only Zero if
Num VC Resources Hardware Present is 0 or 1. The default value of
this field is Vendor Specific.
19 Reserved RO
20 VC2 VL Map Enable – Indicates that the VC2 VL Map field contains RW
a valid VL number. This field is Read Only Zero if Num VC Resources
Hardware Present is 0 or 1. The default value of this field is Vendor
Specific.
Hardware behavior on the 1b to 0b transition of this bit is undefined.
23:21 Reserved RO
26:24 VC3 VL Map – Indicates the VL number used by VS traffic labeled RW
VC3 transmitted via this VS Bridge. This field is Read Only Zero if
Num VC Resources Hardware Present is less than or equal to 2. The
default value of this field is Vendor Specific.
27 Reserved RO
28 VC3 VL Map Enable – Indicates that the VC3 VL Map field contains RW
a valid VL number. This field is Read Only Zero if Num VC Resources
Hardware Present is less than or equal to 2. The default value of this
field is Vendor Specific.
Hardware behavior on the 1b to 0b transition of this bit is undefined.
31:29 Reserved RO

Table 4-58: Switch VS Bridge VC ID to VL Map 2

Bit Location Register Description Attributes


2:0 VC4 VL Map – Indicates the VL number used by VS traffic labeled RW
VC4 transmitted via this VS Bridge. This field is Read Only Zero if
Num VC Resources Hardware Present is less than or equal to 3. The
default value of this field is Vendor Specific.
3 Reserved RO
4 VC4 VL Map Enable – Indicates that the VC4 VL Map field contains RW
a valid VL number. This field is Read Only Zero if Num VC Resources
Hardware Present is less than or equal to 3. The default value of this
field is Vendor Specific.
Hardware behavior on the 1b to 0b transition of this bit is undefined.
7:5 Reserved RO
10:8 VC5 VL Map – Indicates the VL number used by VS traffic labeled RW
VC5 transmitted via this VS Bridge. This field is Read Only Zero if
Num VC Resources Hardware Present is less than or equal to 4. The
default value of this field is Vendor Specific.
11 Reserved RO
12 VC5 VL Map Enable – Indicates that the VC5 VL Map field contains RW
a valid VL number. This field is Read Only Zero if Num VC Resources
Hardware Present is less than or equal to 4. The default value of this
field is Vendor Specific.
Hardware behavior on the 1b to 0b transition of this bit is undefined.
15:13 Reserved RO
18:16 VC6 VL Map – Indicates the VL number used by VS traffic labeled RW
VC6 transmitted via this VS Bridge. This field is Read Only Zero if
Num VC Resources Hardware Present is less than or equal to 5. The
default value of this field is Vendor Specific.
19 Reserved RO
20 VC6 VL Map Enable – Indicates that the VC6 VL Map field contains RW
a valid VL number. This field is Read Only Zero if Num VC Resources
Hardware Present is less than or equal to 5. The default value of this
field is Vendor Specific.
Hardware behavior on the 1b to 0b transition of this bit is undefined.
23:21 Reserved RO
26:24 VC7 VL Map – Indicates the VL number used by VS traffic labeled RW
VC7 transmitted via this VS Bridge. This field is Read Only Zero if
Num VC Resources Hardware Present is less than or equal to 6. The
default value of this field is Vendor Specific.
27 Reserved RO
28 VC7 VL Map Enable – Indicates that the VC7 VL Map field contains RW
a valid VL number. This field is Read Only Zero if Num VC Resources
Hardware Present is less than or equal to 6. The default value of this
field is Vendor Specific.
Hardware behavior on the 1b to 0b transition of this bit is undefined.
31:29 Reserved RO
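
Both map registers use the same per-VC layout: a 3-bit VL Map field at the low end of each byte and a VL Map Enable bit four positions above it. The Python sketch below decodes the two DWORDs and flags the undefined case where enabled entries share a VL. The function names are hypothetical.

```python
def decode_vl_map(map1, map2):
    """Return {vc: vl} for every VC whose VL Map Enable bit is Set.

    map1 is the DWORD at 0Ch (VC0..VC3); map2 is the DWORD at 10h (VC4..VC7).
    """
    mapping = {}
    for vc in range(8):
        reg = map1 if vc < 4 else map2
        shift = 8 * (vc % 4)                # each VC occupies one byte
        if (reg >> (shift + 4)) & 1:        # VL Map Enable
            mapping[vc] = (reg >> shift) & 0x7  # VL Map
    return mapping


def has_duplicate_vl(mapping):
    """True if two enabled VCs map to the same VL (behavior is undefined)."""
    return len(set(mapping.values())) != len(mapping)
```

A configuration tool could run has_duplicate_vl on the intended values before writing either register.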

4.3.6.4. VC Resource Fields (14h to 20h)

These fields return data from the VC Capability of the associated Type 1 Configuration Header.
They allow MR-PCIM software to track the enabling and mapping of VCs with each VH.
VC Resource fields for resource numbers above Num VC Resources Hardware Present are not
implemented and return 0 when read. VC Resource fields for resource numbers above Extended VC
Count are Undefined.
VC Resource State 0 is located at offset 14h; VC Resource State 1 is located at offset 16h; etc. Fields
within VC Resource State are described in Table 4-59.


Table 4-59: VC Resource State

Bit Location Register Description Attributes


0 VC Enabled – This field tracks the VC Enabled bit set by software RO
operating in the VH. Per the PCI Express specification, VC Resource
0 is always enabled and thus VC Enabled for VC Resource 0 is
always set.
1 VC Negotiation Pending – This field tracks the VC Negotiation RO
Pending bit as viewed in the VH. This bit is Set when VC Enabled is Set
and either no VL has been mapped to this VC (associated VL Map
Enable bit is Clear) or Flow Control negotiation has not completed on
the mapped VH and VL.
3:2 Reserved RO
6:4 VC ID – This field tracks the VC ID field set by software operating in RO
the VH. Per the PCI Express specification, VC ID for VC Resource 0
is always 0.
7 Reserved RO
15:8 TC to VC Map – This field tracks the TC to VC Map field set by RO
software operating in the VH. Per the PCI Express specification, bit 0
of this field is fixed and the remaining bits may be set by software.
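
A VC Resource State entry can be decoded as shown in the Python sketch below. It assumes that the eight entries are packed as consecutive 16-bit values starting at offset 14h, as the section heading (14h to 20h) suggests; the function names and dictionary keys are hypothetical.

```python
def vc_resource_state_offset(resource):
    """Byte offset of VC Resource State n, assuming 16-bit entries from 14h."""
    if not 0 <= resource <= 7:
        raise ValueError("VC Resource number must be in 0..7")
    return 0x14 + 2 * resource


def decode_vc_resource_state(state):
    """Split a 16-bit VC Resource State value into its documented fields."""
    return {
        "vc_enabled": bool(state & 0x1),              # bit 0
        "vc_negotiation_pending": bool(state & 0x2),  # bit 1
        "vc_id": (state >> 4) & 0x7,                  # bits 6:4
        "tc_to_vc_map": (state >> 8) & 0xFF,          # bits 15:8
    }
```

MR-PCIM would typically re-read these entries after VC Config Changed is reported for the Bridge.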

4.3.6.5. Hot-Plug Virtual Signals Interface (24h and 28h)

These bits form the “Virtual Signals Interface” of the Virtual Hot-Plug controller. These registers
allow MR-PCIM to indicate to software what Hot-Plug features are supported and to control those
features.
See Chapter 6 Hot Plug for additional details.
Virtual Hot-Plug controller hardware is optional. Presence of hardware is indicated by the Hot-Plug
Hardware Present bit. If no hardware is present, the Slot Implemented bit is Read Only zero and
some of the bits in this section are Undefined. Virtual Hot-Plug hardware is only present in
downstream Ports.
If the Bridge Controls Physical Link bit is Set, the Virtual Hot-Plug Signals Interface Registers are
not used and their content is Undefined.


Table 4-60: Virtual Hot-Plug Signals Interface 1

Bit Location Register Description Attributes


0 Reserved RO
1 Virtual Power Controller Present – This field indicates to software RW
in the VH that the virtual slot has a Power Controller. This is visible to
software in the VH through the Power Controller Present field of the
PCI Express Capabilities.
If Set, a Virtual Power Controller exists in the VH. When software
operating in the VH turns off power to the virtual slot using the Virtual
Hot-Plug Power Controller, a per-VH Reset is automatically triggered
to clean out state in downstream hardware.
If Hot-Plug Hardware Present is 0b, this field is undefined.
4:2 Reserved RO
5 Virtual Hot Plug Surprise – This field indicates to software in the VH RW
that the slot supports Surprise Hot-Plug events. Writing this field
changes the Slot Capabilities field of the PCI Express Capabilities.
The default value of this field is Vendor Specific.
If Hot-Plug Hardware Present is 0b, this field is undefined.
6 Virtual Hot Plug Capable – This field indicates to software in the VH RW
that the slot is Hot-Plug Capable. Writing this field changes the Slot
Capabilities field of the PCI Express Capabilities. The default value of
this field is Vendor Specific.
If Hot-Plug Hardware Present is 0b, this field is undefined.
14:7 Virtual Slot Power Limit Value – This field contains the value written RO
to the PCI Express Capability Slot Power Limit Value register by
software in the VH. There is no other effect.
If Hot-Plug Hardware Present is 0b or if Virtual Slot Implemented
is 0b, this field is undefined.
16:15 Virtual Slot Power Limit Scale – This field contains the value written RO
to the PCI Express Capability Slot Power Limit Scale register by
software in the VH. There is no other effect.
If Hot-Plug Hardware Present is 0b or if Virtual Slot Implemented
is 0b, this field is undefined.
18:17 Reserved RO
31:19 Virtual Slot Number – This field indicates the chassis slot number. It RW
is echoed to software in the VH via the PCI Express Capabilities field.
The default value of this field is Vendor Specific.
If Hot-Plug Hardware Present is 0b, this field is undefined.
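
As an illustration of this register's layout, the Python sketch below packs the fields that MR-PCIM writes and extracts the read-only Slot Power Limit fields that reflect writes by software in the VH. The function names are hypothetical; only the bit positions come from the table.

```python
def pack_signals_interface_1(power_controller_present, hot_plug_surprise,
                             hot_plug_capable, slot_number):
    """Assemble the RW fields of Virtual Hot-Plug Signals Interface 1."""
    if not 0 <= slot_number < (1 << 13):
        raise ValueError("Virtual Slot Number is a 13-bit field (bits 31:19)")
    return (int(power_controller_present) << 1
            | int(hot_plug_surprise) << 5
            | int(hot_plug_capable) << 6
            | slot_number << 19)


def decode_slot_power_limit(reg):
    """Return (Virtual Slot Power Limit Value, Virtual Slot Power Limit Scale)."""
    return (reg >> 7) & 0xFF, (reg >> 15) & 0x3
```

The power limit fields are left out of the packing helper because they are read only from the MR-PCIM side.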


Table 4-61: Virtual Hot-Plug Signals Interface 2

Bit Location Register Description Attributes


1:0 Virtual Power Indicator State – This field indicates the state of the RO
Power Indicator as set by software in the VH. This corresponds to the
Power Indicator Control field of the PCI Express Capability Slot
Control Register.
Changes to this value set the Power Indicator Changed bit.
If Hot-Plug Hardware Present is 0b, this field is undefined.
3:2 Virtual Attention Indicator State – This field indicates the state of RO
the Attention Indicator as set by software in the VH. This corresponds
to the Attention Indicator Control field of PCI Express Capability Slot
Control Register.
Changes to this value set the Attention Indicator Changed bit.
If Hot-Plug Hardware Present is 0b, this field is undefined.
4 Virtual Power Controller State – This field indicates the state of the RO
Power Controller as set by software in the VH. This corresponds to
the Power Controller Control bit of the Slot Control field of the PCI
Express Capabilities.
Changes to this value set the Power Controller Changed bit.
If Hot-Plug Hardware Present is 0b, this field is undefined.
7:5 Reserved RO
8 Virtual Slot Implemented – This field indicates to software in the VH RW
whether the P2P Bridge contains Hot-Plug support. Writing this field
changes the Slot Implemented bit in the PCI Express Capabilities
field.
If Hot-Plug Hardware Present is 0b, this field is Read Only Zero.
9 Hot Plug Signals Interrupt Enable – If Set, the Power Controller RW
Changed, Power Indicator Changed and Attention Indicator Changed
bits can trigger an interrupt. If Clear, these fields will not trigger an
interrupt.
If Hot-Plug Hardware Present is 0b, this field is Read Only Zero.
16:10 Reserved RO
17 Virtual Presence Detect State – This field indicates that the virtual RW
slot has a card in it. Defined Encodings are:
0b Virtual Slot Empty
1b Card Present in Virtual Slot
The value of this field affects both the Presence Detect State and the
Presence Detect Changed fields in the PCI Express Capability Slot
Status Register.
The Presence Detect State field contains the same value as this field.
The Presence Detect Changed field is Set whenever this field
changes state.
If Hot-Plug Hardware Present is 0b, this field is undefined.

18 Virtual Force Reset – For downstream ports, when this field is Set, RW
the Switch causes the Port / Port VHN mapped to this Bridge to enter
Reset. The effect is the same as if software running in the VH set the
Secondary Bus Reset bit in the associated Type 1 Header.
For upstream Ports, this field is Read Only Zero.
For downstream Ports, this field is always present, even if Hot-Plug
Hardware Present is 0b.
23:18 Reserved RO
19 Push Virtual Attention Button – Writing 1 to this field Sets the   Write 1 to Trigger, Read Zero
Attention Button Pressed bit in the PCI Express Capability Slot Status
Register. This simulates the press of the virtual Attention Button in
the Virtual Hot-Plug Controller.
Writing 0 to this field has no effect. If Hot-Plug Hardware Present is
0b, writing to this field has no effect. This field reads as Zero.
20 Signal Virtual Power Fault – Writing 1 to this field Sets the Power   Write 1 to Trigger, Read Zero
Fault Detected bit in the PCI Express Capability Slot Status Register.
This simulates a Power Fault condition in the Virtual Hot-Plug
Controller.
Writing 0 to this field has no effect. If Hot-Plug Hardware Present is
0b, writing to this field has no effect. This field reads as Zero.
31:21 Reserved RO
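
Implementation Note: The bit layout of Table 4-61 can be illustrated with a short decoding sketch. The Python below is informative only and not part of this specification. The function, constant and key names are hypothetical; only the bit positions are taken from the table.

```python
def field(value, hi, lo):
    """Return bits hi:lo (inclusive) of a 32-bit register value."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)

# Write-1-to-trigger bits from Table 4-61; both read as zero.
PUSH_VIRTUAL_ATTENTION_BUTTON = 1 << 19
SIGNAL_VIRTUAL_POWER_FAULT = 1 << 20

def decode_hot_plug_signals_2(reg):
    """Split a Hot-Plug Signals Interface 2 value into named fields."""
    return {
        "power_indicator_state": field(reg, 1, 0),
        "attention_indicator_state": field(reg, 3, 2),
        "power_controller_state": field(reg, 4, 4),
        "virtual_slot_implemented": field(reg, 8, 8),
        "signals_interrupt_enable": field(reg, 9, 9),
        "virtual_presence_detect_state": field(reg, 17, 17),
        "virtual_force_reset": field(reg, 18, 18),
    }
```

Software that wants to simulate an Attention Button press would write a value with only PUSH_VIRTUAL_ATTENTION_BUTTON set in addition to the current RW field values, since writing 0 to the trigger bits has no effect.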

4.3.7. Misc. Switch Configuration Space Requirements

4.3.7.1. ARI Support

Alternate RID Interpretation (ARI) support must be provided in all downstream PCI-to-PCI bridges
of MRA Switches. Specifically, the ARI Forwarding Supported bit located in the Device Capabilities
2 register must be set and the ARI Forwarding Enable bit located in the Device Control 2 register
must be implemented.

4.3.7.2. BIST (switch)

MR-IOV Switches shall not support BIST.


Implementation Note: In a virtualized environment, BIST would also have to be virtualized. Since
BIST is rarely used and not completely specified it was decided to remove it from MR-IOV.


4.4. VL Arbitration Table


The VL Arbitration Table is optional. It is identical in structure to the VC arbitration table in PCI
Express.
Switches and Devices use the same VL Arbitration Table structure. Switches and Devices differ in
the location of the fields used to locate the VL Arbitration table and to configure the arbitration
scheme used. For a switch, fields in the Port Table are used for this purpose. For a Device, fields in
the MR-IOV Capability are used instead.
The VL Arbitration Table is a read-write register array that is used to store the arbitration table for
VL Arbitration. This register array is valid for all Functions when the selected VL Arbitration uses a
WRR table. If it exists, the VL Arbitration Table is located by the VL Arbitration Table Offset field.
The VL Arbitration Table is a register array with fixed-size entries of 4 bits. Figure 4-12 depicts the
table structure of an example VL Arbitration Table with 32 phases. Each 4-bit table entry
corresponds to a phase within a WRR arbitration period. The definition of a table entry is depicted in
Table 4-62. The lower 3 bits (bits 0-2) contain the VL value, indicating that the corresponding phase
within the WRR arbitration period is assigned to the Virtual Channel indicated by the VL (must be a
valid VL that corresponds to an enabled VL).
The highest bit (bit 3) of the table entry is reserved. The length of the table depends on the selected
VL Arbitration as shown in Table 4-63.
When the VL Arbitration Table is used by the default VL Arbitration method, the default values of
the table entries must be all zero to ensure forward progress for the default VL (with VL of 0).

Figure 4-12: Example VL Arbitration Table with 32 Phases

Table 4-62: Definition of the 4-bit Entries in the VL Arbitration Table

Bit Location Register Description Attributes


2:0 VL RW
3 Reserved RsvdP


Table 4-63: Length of the VL Arbitration Table

VC Arbitration Select VC Arbitration Table Length (in # of Entries)


001b 32
010b 64
011b 128
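
Implementation Note: The following informative Python sketch shows how software might unpack the 4-bit entries of a VL Arbitration Table. It assumes the table is read as DWORDs with eight entries per DWORD and phase 0 in the least significant nibble, as in the PCI Express VC Arbitration Table; that packing is an assumption here because Figure 4-12 is the normative reference. The function and variable names are hypothetical.

```python
# Table length in entries, indexed by the arbitration select encoding (Table 4-63).
TABLE_LENGTH = {0b001: 32, 0b010: 64, 0b011: 128}

def unpack_vl_arbitration_table(dwords, arb_select):
    """Return the VL assigned to each WRR phase.

    dwords     -- table contents as a list of 32-bit integers
    arb_select -- encoding from Table 4-63 selecting the table length
    """
    length = TABLE_LENGTH[arb_select]
    if len(dwords) < length // 8:
        raise ValueError("not enough DWORDs for the selected table length")
    phases = []
    for phase in range(length):
        # Assumed packing: eight 4-bit entries per DWORD, phase 0 lowest.
        entry = (dwords[phase // 8] >> (4 * (phase % 8))) & 0xF
        phases.append(entry & 0x7)  # bits 2:0 hold the VL; bit 3 is reserved
    return phases
```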

4.5. Performance Monitoring and Statistics Collection

This section describes the registers and tables used to control the optional MR-IOV Performance
Monitoring and Statistics Collection Capability.
Switch and Device usage is identical except where noted.
An overview of the registers and tables associated with this capability is shown in Figure 4-13.
Detailed descriptions are provided in the following sections. The functional behavior is described in
Section 8.3.


[Figure 4-13, diagram not reproduced. Configuration Space holds the Statistics Capability (# Stat Blocks, # Stat Desc), the Statistics Block Start / Busy bits for Statistics Blocks 0 to 31, and the Offset / BIR fields for the Statistics Descriptor Table and the Statistics Block Table. These fields locate the following tables in Memory Space: the Statistics Descriptor Table (Descriptor 0 to Descriptor Max, each with Standard Counters, Vendor Counters 1 and Vendor Counters 2, the vendor groups each carrying a Vendor ID and Collection ID); the Statistics Block Table (Block 0 to Block Max, each with Number of Statistics in Block, Block Status, Block Control, Statistics Table Offset, Wait Time and Count Time); and the per-block Statistics (Statistics 0 to Statistics Max, each with Stats Select, E, Stats Width, Statistics Style, Port #, Filter Enable and Control, and Counter or Sampled Value Low / High).]


Figure 4-13: Performance Monitoring and Statistics Collection Tables

4.5.1. Configuration Space Fields


The following fields are located in the Switch and Device MR-IOV Capabilities. Starting offsets of
these fields within these two capabilities are different.

In addition to the fields described below, each Capability contains Statistics Interrupt Status and
Statistics Interrupt Enable bits.

4.5.1.1. Statistics Capability (+00h)

These fields define the sizes of the Statistics Tables pointed to by the MR-IOV Capability.
If the Performance Monitoring and Statistics Collection Capability is not implemented, these fields
are Read Only Zero.
Table 4-64: Statistics Table Sizes

Bit Location Register Description Attributes


7:0 Number of Statistics Descriptors – Indicates the number of entries RO
in the Statistics Descriptor Table.
Statistics support is optional for all components but strongly
encouraged for MRA Switches. If not supported, this field is Zero.
15:8 Number of Statistics Blocks – Indicates the number of entries in the RO
Statistics Block Table. This value must be less than or equal to 32.
Statistics support is optional for all components but strongly
encouraged for MRA Switches. If not supported, this field is Zero.
31:16 Reserved RO


4.5.1.2. Statistics Block Start / Busy (+04h)

These fields contain a bit for each supported Statistics Block.


Table 4-65: Statistics Start / Busy

Bit Location Register Description Attributes


0 Statistics Block 0 Start / Busy – If idle, writes of 1b initiate the RW
statistics collection processing for Statistics Block 0. The behavior of
initiating statistics collection for a Statistics Block that is not idle is
undefined.
If not idle, writes of 0b terminate the statistics collection process for
Statistics Block 0. Termination of the statistics collection process is
not immediate; therefore, following termination of the statistics
collection process, this field should be read to confirm that
termination has completed and that the Statistics Block is idle.
When read, indicates the busy status of the associated Statistics
Block. The value 1b indicates the Statistics Block is busy (i.e. either
waiting or counting). The value 0b indicates the Statistics Block is
idle.
If Number of Statistics Blocks is zero, this field is Read Only Zero.
1 Statistics Block 1 Start / Busy – Controls Statistics Block 1. If RW
Number of Statistics Blocks is 0 or 1, this field is Read Only Zero.
2 Statistics Block 2 Start / Busy – Controls Statistics Block 2. If RW
Number of Statistics Blocks is less than or equal to 2, this field is Read
Only Zero.
… … …
31 Statistics Block 31 Start / Busy – Controls Statistics Block 31. If RW
Number of Statistics Blocks is less than or equal to 31, this field is
Read Only Zero.
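
Implementation Note: Because termination of statistics collection is not immediate, software is expected to read the Start / Busy bit back until the block reports idle. The informative Python sketch below shows one way to do that. The callable read_start_busy stands in for whatever register access mechanism a platform provides and is hypothetical, as are the function names and the max_polls bound.

```python
def busy_blocks(start_busy, num_blocks):
    """List the Statistics Blocks whose Start / Busy bit reads 1b."""
    return [b for b in range(num_blocks) if (start_busy >> b) & 1]

def wait_until_idle(read_start_busy, block, max_polls=1000):
    """Poll the Start / Busy register until the given block reads idle.

    read_start_busy -- hypothetical callable returning the 32-bit register
    Returns True once the block's bit reads 0b, or False if it is still
    busy after max_polls reads.
    """
    for _ in range(max_polls):
        if not (read_start_busy() >> block) & 1:
            return True
    return False
```

A real driver would normally add a delay between reads; it is omitted here to keep the sketch self-contained.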


4.5.1.3. Statistics Descriptor Table Offset (+08h)

Table 4-66: Statistics Descriptor Table Offset

Bit Location Register Description Attributes


2:0 Statistics Descriptor Table BIR – Indicates which one of a RO
Function’s Base Address registers, located beginning at 10h in
Configuration Space, is used to map the Function’s Statistics
Descriptor Table into Memory Space.
BIR Value Base Address register
0 BAR0 10h
1 BAR1 14h
2 BAR2 18h (Device Only)
3 BAR3 1Ch (Device Only)
4 BAR4 20h (Device Only)
5 BAR5 24h (Device Only)
6..7 Reserved
For a 64-bit Base Address register, the BIR indicates the lower
DWORD.
For Switch usage, the values 2..5 are Reserved as well.
31:3 Statistics Descriptor Table Offset – Used as an offset from the RO
address contained by one of the Function’s Base Address registers to
point to the base of the Statistics Descriptor Table. The lower 3 BIR
bits are masked off (set to zero) by software to form a 32-bit
QWORD-aligned offset.


4.5.1.4. Statistics Block Table Offset (+0Ch)

Table 4-67: Statistics Block Table Offset

Bit Location Register Description Attributes


2:0 Statistics Block Table BIR – Indicates which one of a Function’s RO
Base Address registers, located beginning at 10h in Configuration
Space, is used to map the Function’s Statistics Block Table into
Memory Space.
BIR Value Base Address register
0 BAR0 10h
1 BAR1 14h
2 BAR2 18h (Device Only)
3 BAR3 1Ch (Device Only)
4 BAR4 20h (Device Only)
5 BAR5 24h (Device Only)
6..7 Reserved
For a 64-bit Base Address register, the BIR indicates the lower
DWORD.
For Switch usage, the values 2..5 are Reserved as well.
31:3 Statistics Block Table Offset – Used as an offset from the address RO
contained by one of the Function’s Base Address registers to point to
the base of the Statistics Block Table. The lower 3 BIR bits are
masked off (set to zero) by software to form a 32-bit QWORD-aligned
offset.
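
Implementation Note: The Offset / BIR registers of Tables 4-66 and 4-67 share one layout, so a single decoder can serve both. The informative Python sketch below is an illustration only; the function names and the is_switch parameter are hypothetical. The caller is assumed to supply the already-decoded base address of the selected BAR (including the upper DWORD for a 64-bit BAR).

```python
def decode_table_offset(reg, is_switch=False):
    """Decode a Statistics Descriptor/Block Table Offset register."""
    bir = reg & 0x7
    # BIR 6..7 are always Reserved; for Switch usage 2..5 are Reserved too.
    highest_bir = 1 if is_switch else 5
    if bir > highest_bir:
        raise ValueError(f"BIR value {bir} is Reserved")
    return {
        "bir": bir,
        "bar_config_offset": 0x10 + 4 * bir,  # BAR0 is at 10h
        "table_offset": reg & 0xFFFFFFF8,     # mask off the 3 BIR bits
    }

def table_address(bar_base, reg, is_switch=False):
    """Memory Space address of the table, given the BAR's base address."""
    return bar_base + decode_table_offset(reg, is_switch)["table_offset"]
```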

4.5.2. Statistics Descriptor Table


The Statistics Descriptor Table describes sets of statistics that may be recorded by Statistics
Counters.
This table contains up to 256 entries. Each entry is 256 bits. The entire Statistics Descriptor Table is
Read Only and constant.
A Descriptor Table Entry describes general statistics recording capabilities. Each Statistics Counter
contains the index of a Descriptor Table Entry that indicates the statistics recording capabilities
associated with that counter.
Each Descriptor Table entry contains 208 Supported bits (S bits). If an S bit is 1b, then Statistics
Counters that point to that Descriptor Table entry may be configured to record the statistic
associated with the S bit. This configuration is done by setting the Statistics Select field of the
counter to the bit number of the S bit.
S bits 127:0 represent statistics defined in this specification. S bits 167:128 and 231:192 represent
Vendor Specific statistics.


Table 4-68: Statistics Descriptor Table Entry

Bit Location Register Description Attributes


127:0 Standard S Bits – Indicates S bits whose meaning is defined by this RO
specification.
167:128 Group 1 Vendor Specific S Bits – Indicates S bits whose meaning RO
is defined by the Vendor listed in Group 1 Vendor ID further qualified
by Group 1 Collection ID. If no Group 1 S bits are supported, this field
is hardwired to zero.
175:168 Group 1 Collection ID – Vendor Specific identifier that defines the RO
meaning of bits 167:128. If no Group 1 S bits are supported, this field
is hardwired to zero.
191:176 Group 1 Vendor ID – Indicates the Vendor that defined the meaning RO
of bits 175:128. If no Group 1 S bits are supported, this field is
hardwired to zero.
231:192 Group 2 Vendor Specific S Bits – Indicates S bits whose meaning RO
is defined by the Vendor listed in Group 2 Vendor ID further qualified
by Group 2 Collection ID. If no Group 2 S bits are supported, this field
is hardwired to zero.
239:232 Group 2 Collection ID – Vendor Specific identifier that defines the RO
meaning of bits 231:192. If no Group 2 S bits are supported, this field
is hardwired to zero.
255:240 Group 2 Vendor ID – Indicates the Vendor that defined the meaning RO
of bits 239:192. If no Group 2 S bits are supported, this field is
hardwired to zero.

The following nomenclature is used to describe counters:


• Standard statistics are designated CSEL[n] where n is in the range [0..127] (inclusive).

• Vendor Specific statistics are designated CSEL[Vendor ID, Collection ID, n]. Vendor ID is
assigned by the PCI SIG. Collection ID is a Vendor defined value used to select a set of S bit
definitions. The value n is in the range [0..39] (inclusive) and the corresponding S bit number
is n + 128 (if mapped using Group 1) or n + 192 (if mapped using Group 2). The meaning of a
Vendor Specific statistic is not affected by whether it is mapped using Group 1 or Group 2.
This mechanism allows a single counter to support any mixture of standard events and vendor
defined events from up to two sets of S bit definitions (from either the same or different Vendors).
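
Implementation Note: The mapping from CSEL nomenclature to S bit numbers, and the layout of a 256-bit Descriptor Table entry, can be sketched as follows. This Python is informative only. The group base values 128 and 192 follow from the S bit ranges 167:128 and 231:192 in Table 4-68; all function and key names are hypothetical, and the entry is assumed to be supplied as a single 256-bit integer.

```python
GROUP_BASE = {1: 128, 2: 192}  # first S bit of each vendor group (Table 4-68)

def vendor_s_bit(group, n):
    """S bit number for CSEL[Vendor ID, Collection ID, n] in a group."""
    if group not in GROUP_BASE:
        raise ValueError("group must be 1 or 2")
    if not 0 <= n <= 39:
        raise ValueError("n must be in the range [0..39]")
    return GROUP_BASE[group] + n

def parse_descriptor_entry(entry):
    """Split a 256-bit Statistics Descriptor Table entry into its fields."""
    def bits(hi, lo):
        return (entry >> lo) & ((1 << (hi - lo + 1)) - 1)
    return {
        "standard_s_bits": bits(127, 0),
        "group1_s_bits": bits(167, 128),
        "group1_collection_id": bits(175, 168),
        "group1_vendor_id": bits(191, 176),
        "group2_s_bits": bits(231, 192),
        "group2_collection_id": bits(239, 232),
        "group2_vendor_id": bits(255, 240),
    }

def is_supported(entry, s_bit):
    """True if s_bit is a valid S bit position and is 1b in the entry."""
    valid = 0 <= s_bit <= 167 or 192 <= s_bit <= 231
    return valid and bool((entry >> s_bit) & 1)
```

The value passed to is_supported is the same value software would program into the Statistics Select field of a counter that points to this entry.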


4.5.2.1. Standard Statistics

Table 4-69 contains Standard Statistics defined by this
standard. Items marked Sample correspond to sampled values while items marked Count
correspond to counted values. Standard filters are defined in Section 4.5.2.2.
Table 4-69: Standard Statistics

S Bit Number | Description | Count / Sample | Applicable Filter(s)
0 | Transmitted TLPs – Counts non-nullified TLPs transmitted by the Port. Required to be implemented by at least two Statistics Counters per port. | Count | Optional TLP Filters: VH, VL, TLP Type
1 | Transmitted TLP DWORDs – Counts DWORDs of non-nullified TLPs transmitted by the Port. This includes framing symbols and all bytes sent (i.e. STP to END inclusive). Required to be implemented by at least two Statistics Counters per port. | Count | Optional TLP Filters: VH, VL, TLP Type
2 | Transmitted IDLE Symbols – Counts the number of IDLE symbols transmitted by the Port. Required to be implemented by at least two Statistics Counters per port. | Count | None
3 | Transmitted DLLPs – Counts the number of DLLPs sent by this Port. Required to be implemented by at least two Statistics Counters per port. | Count | Optional DLLP Filters: DLLP Type
5 | Any TLP Blocked by VL – Number of Symbol times a TLP is blocked from transmission due to lack of VL flow control credits. | Count | Optional Credit Filters: TLP Type. Required Credit Filters: VL
6 | Any TLP Blocked by {VH, VL} – Number of Symbol times a TLP is blocked from transmission due to the lack of {VH, VL} flow control credits. | Count | Optional Credit Filters: TLP Type. Required Credit Filters: VH, VL
7 | Any TLP Blocked by VL or {VH, VL} – Number of Symbol times a TLP is blocked from transmission due either to the lack of VL flow control credits or the lack of {VH, VL} flow control credits. | Count | Optional Credit Filters: TLP Type. Required Credit Filters: VH, VL
9 | All TLP Blocked by VL – Number of Symbol times all TLPs are blocked from transmission due to lack of VL flow control credits (i.e., the lack of VL flow control credits results in no TLP being transmitted on the wire). | Count | Optional Credit Filters: TLP Type. Required Credit Filters: VL
10 | All TLP Blocked by {VH, VL} – Number of Symbol times all TLPs are blocked from transmission due to the lack of {VH, VL} flow control credits (i.e., the lack of {VH, VL} flow control credits results in no TLP being transmitted on the wire). | Count | Optional Credit Filters: TLP Type. Required Credit Filters: VH, VL
11 | All TLP Blocked by VL or {VH, VL} – Number of Symbol times all TLPs are blocked from transmission due either to the lack of VL flow control credits or the lack of {VH, VL} flow control credits (i.e., the lack of VL or {VH, VL} flow control credits results in no TLP being transmitted on the wire). | Count | Optional Credit Filters: TLP Type. Required Credit Filters: VH, VL
32 | Available VL Transmit Credits – Number of available transmit credits associated with a VL computed as: (CREDIT_LIMIT – CREDITS_CONSUMED) mod 2^[Field Size] | Sample | Required Credit Filters: VL, Credit Type
33 | Available {VH, VL} Transmit Credits – Number of available transmit credits associated with a {VH, VL} computed as: (CREDIT_LIMIT – CREDITS_CONSUMED) mod 2^[Field Size] | Sample | Required Credit Filters: VL, VH, Credit Type
64 | Received TLPs – Counts non-nullified TLPs received by this Port. Required to be implemented by at least two Statistics Counters per port. | Count | Optional TLP Filters: VH, VL, TLP Type
65 | Received TLP DWORDs – Counts DWORDs of non-nullified TLPs received by this Port. This includes framing symbols and all bytes received (i.e. STP to END inclusive). Required to be implemented by at least two Statistics Counters per port. | Count | Optional TLP Filters: VH, VL, TLP Type
66 | Received Idle Symbols – Counts the number of IDLE symbols received by this Port. Required to be implemented by at least two Statistics Counters per port. | Count | None
67 | Received DLLPs – Counts the number of DLLPs received by this Port. Required to be implemented by at least two Statistics Counters per port. | Count | Optional DLLP Filters: DLLP Type
96 | Available VL Receive Credits – Number of available receive credits associated with a VL computed as: (CREDITS_ALLOCATED – CREDITS_RECEIVED) mod 2^[Field Size] | Sample | Required Credit Filters: VL, Credit Type
97 | Available {VH, VL} Receive Credits – Number of available receive credits associated with a {VH, VL} computed as: (CREDITS_ALLOCATED – CREDITS_RECEIVED) mod 2^[Field Size] | Sample | Required Credit Filters: VL, VH, Credit Type
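
Implementation Note: The sampled credit statistics (S bits 32, 33, 96 and 97) are modular differences of two free-running counters, so the result stays correct when the counters wrap. The informative Python sketch below illustrates the arithmetic. The function names are hypothetical; the field sizes used in the usage note (8 bits for header credits, 12 bits for data credits) are taken from the PCI Express Base Specification flow control rules and are assumed to carry over unchanged.

```python
def available_transmit_credits(credit_limit, credits_consumed, field_size):
    """(CREDIT_LIMIT - CREDITS_CONSUMED) mod 2^[Field Size]."""
    return (credit_limit - credits_consumed) % (1 << field_size)

def available_receive_credits(credits_allocated, credits_received, field_size):
    """(CREDITS_ALLOCATED - CREDITS_RECEIVED) mod 2^[Field Size]."""
    return (credits_allocated - credits_received) % (1 << field_size)
```

For example, with an 8-bit header credit field, a CREDIT_LIMIT of 5 that has wrapped past a CREDITS_CONSUMED of 250 still yields 11 available credits.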


4.5.2.2. Standard Filters

TLP Filtering consists of three filters: TLP Type, VL and VH.


Table 4-70: TLP Filters

Bits Filter Description


0 TLP Type Completion – If Set, Completion TLPs are not included (i.e., filtered
out).
If TLP Type filtering is not supported, this field is hardwired to zero.
1 TLP Type Non-Posted – If Set, Non-Posted TLPs are not included (i.e., filtered
out).
If TLP Type filtering is not supported, this field is hardwired to zero.
2 TLP Type Posted – If Set, Posted TLPs are not included (i.e., filtered out).
If TLP Type filtering is not supported, this field is hardwired to zero.
15:3 Reserved
23:16 VH VH Value – If VH filtering is enabled, this field contains the VH to
include (i.e., filtered in). If VH filtering is not enabled, this field is
ignored and all VHs are included.
If VH filtering is not supported, this field is hardwired to zero.
26:24 VL VL Value – If VL filtering is enabled, this field contains the VL to
include (i.e., filtered in). If VL filtering is not enabled, this field is
ignored and all VLs are included.
If VL filtering is not supported, this field is hardwired to zero.
29:27 Reserved
30 VH VH Filter Enable – If Set, VH filtering is enabled. If Cleared, VH
filtering is not enabled and all VHs are included.
If VH filtering is not supported, this field is hardwired to zero.
31 VL VL Filter Enable – If Set, VL filtering is enabled. If Cleared, VL filtering
is not enabled and all VLs are included.
If VL filtering is not supported, this field is hardwired to zero.
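
Implementation Note: A TLP filter value per Table 4-70 can be assembled as in the informative Python sketch below. The function and parameter names are hypothetical. Passing None for vh or vl leaves the corresponding Filter Enable bit Clear, so all VHs or VLs are included.

```python
def make_tlp_filter(exclude_completion=False, exclude_non_posted=False,
                    exclude_posted=False, vh=None, vl=None):
    """Build a 32-bit TLP filter value laid out as in Table 4-70."""
    value = 0
    if exclude_completion:
        value |= 1 << 0          # Completion TLPs filtered out
    if exclude_non_posted:
        value |= 1 << 1          # Non-Posted TLPs filtered out
    if exclude_posted:
        value |= 1 << 2          # Posted TLPs filtered out
    if vh is not None:
        if not 0 <= vh <= 0xFF:
            raise ValueError("VH Value is an 8-bit field")
        value |= (vh << 16) | (1 << 30)   # VH Value plus VH Filter Enable
    if vl is not None:
        if not 0 <= vl <= 0x7:
            raise ValueError("VL Value is a 3-bit field")
        value |= (vl << 24) | (1 << 31)   # VL Value plus VL Filter Enable
    return value
```

Because unsupported filters are hardwired to zero, software can write the desired value and read it back to discover which filters a given counter actually implements.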


Credit Filtering consists of three filters: Credit Type, VL and VH. Unsupported credit filters are
hardwired to zero.
Table 4-71: Credit Filters

Bits Filter Description


0 Credit Type Completion Header – If Set, Completion Header Credits are not
included (i.e., filtered out).
1 Credit Type Non-Posted Header – If Set, Non-Posted Header Credits are not
included (i.e., filtered out).
2 Credit Type Posted Header – If Set, Posted Header Credits are not included (i.e.,
filtered out).
3 Credit Type Completion Data – If Set, Completion Data Credits are not included
(i.e., filtered out).
4 Credit Type Non-Posted Data – If Set, Non-Posted Data Credits are not included
(i.e., filtered out).
5 Credit Type Posted Data – If Set, Posted Data Credits are not included (i.e.,
filtered out).
15:6 Reserved
23:16 VH VH Value – Contains the VH to include (i.e. filtered in).
26:24 VL VL Value – Contains the VL to include (i.e., filtered in).
31:27 Reserved


DLLP Filtering is optional and consists of a number of filters. Unsupported DLLP filters are
hardwired to zero.
Table 4-72: DLLP Filters

Bits Filter Description


0 DLLP Type Ack – If Set, Ack DLLPs are not included (i.e., filtered out).
1 DLLP Type Nak – If Set, Nak DLLPs are not included (i.e., filtered out).
2 DLLP Type Reset – If Set, Reset DLLPs are not included (i.e., filtered out).
3 DLLP Type MRInit – If Set, MRInit DLLPs are not included (i.e., filtered out).
4 DLLP Type Flow Control Initialization – If Set, the following DLLPs are not
included (i.e., filtered out):
InitFC1
InitFC2
MRInitFC1_VL
MRInitFC1_VH
MRInitFC2_VL
MRInitFC2_VH
5 DLLP Type Flow Control Update – If Set, UpdateFC and MRUpdateFC DLLPs are
not included (i.e., filtered out).
6 DLLP Type ASPM L1 – If Set, PM_Active_State_Request_L1 and
PM_Request_Ack DLLPs are not included (i.e., filtered out).
7 DLLP Type PM L1 L23 – If Set, PM_Enter_L1 and PM_Enter_L23 DLLPs are not
included (i.e., filtered out).
8 DLLP Type Vendor Specific – If Set, Vendor Specific DLLPs are not included (i.e.,
filtered out).
31:9 Reserved
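
Implementation Note: Each DLLP filter bit excludes a class of DLLPs, so counting only selected classes means setting every other bit. The informative Python sketch below builds such a value from Table 4-72. The dictionary keys and the function name are hypothetical labels chosen for this example.

```python
# Bit position of each DLLP Type filter in Table 4-72.
DLLP_FILTER_BITS = {
    "ack": 0, "nak": 1, "reset": 2, "mrinit": 3, "fc_init": 4,
    "fc_update": 5, "aspm_l1": 6, "pm_l1_l23": 7, "vendor_specific": 8,
}

def dllp_filter_including_only(*kinds):
    """Return a filter value that filters out everything except kinds."""
    unknown = set(kinds) - set(DLLP_FILTER_BITS)
    if unknown:
        raise ValueError(f"unknown DLLP kinds: {sorted(unknown)}")
    value = 0
    for name, bit in DLLP_FILTER_BITS.items():
        if name not in kinds:
            value |= 1 << bit    # Set means "not included"
    return value
```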

4.5.3. Statistics Block Table


This table contains an entry for each supported Statistics Block. Up to 32 Statistics Blocks may be
supported by a component.

4.5.3.1. Statistics Block Capability (00h)

These fields describe the capabilities of the associated Statistics Block.


Table 4-73: Statistics Block Capability

Bit Location Register Description Attributes


1:0 Statistics Block Status – Indicates whether the statistics block is RO
busy and if so, whether the block is waiting or counting. Values in this
field correspond to the states of the statistics collection process.
Values are:
00b Idle
10b Waiting
11b Counting
01b Reserved
Note that the upper bit of this field matches the value read from the
Statistics Block Start / Busy register.
15:2 Reserved RO
31:16 Statistics Table Size – Indicates the number of entries contained in RO
the Statistics Table associated with this Statistics Block.

4.5.3.2. Statistics Table Offset (04h)

Table 4-74: Statistics Table Offset

Bit Location Register Description Attributes


3:0 Reserved RO
31:4 Statistics Table Offset – Used as an offset from the address RO
contained by one of the Function’s Base Address registers to point to
the base of the Statistics Table. The Base Address register used is
selected by the Statistics Block Table BIR located in the MR-IOV Capability.

4.5.3.3. Statistics Wait Time (08h)

Table 4-75: Statistics Wait Time

Bit Location Register Description Attributes


15:0 Waiting Period – Indicates the time, in microseconds, of the waiting RW
period. The waiting period is defined as the time from statistics
collection initiation to the start of the counting period.
A value of zero indicates no waiting period.
31:16 Reserved RO


4.5.3.4. Statistics Count Time (0Ch)

Table 4-76: Statistics Count Time

Bit Location Register Description Attributes


23:0 Counting Period – Indicates the time, in microseconds, of the RW
counting period. The counting period is defined as the time during
which selected events are counted and is the time from the end of the
waiting period to the idle period.
Note: A value of all ones (i.e. FFFFFFh) is interpreted as infinite. An
infinite counting period ends when requested by software (i.e. the
corresponding Statistics Block Start bit is Cleared).
A value of zero corresponds to no counting period (this is useful for
sampled values).
31:24 Reserved RO
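
Implementation Note: The four registers of a Statistics Block Table entry (offsets 00h to 0Ch) can be decoded together, as in the informative Python sketch below. The function and key names are hypothetical, and the entry is assumed to be supplied as four 32-bit values in offset order.

```python
BLOCK_STATUS = {0b00: "Idle", 0b10: "Waiting", 0b11: "Counting"}
INFINITE_COUNT = 0xFFFFFF  # all ones in the 24-bit Counting Period

def decode_block_entry(dwords):
    """Decode the registers at 00h, 04h, 08h and 0Ch of one entry."""
    capability, table_offset, wait_time, count_time = dwords
    status = capability & 0x3
    counting_period = count_time & 0xFFFFFF
    infinite = counting_period == INFINITE_COUNT
    return {
        "status": BLOCK_STATUS.get(status, "Reserved"),
        "busy": bool(status & 0b10),   # upper status bit mirrors Start / Busy
        "table_size": (capability >> 16) & 0xFFFF,
        "table_offset": table_offset & 0xFFFFFFF0,  # bits 3:0 are Reserved
        "wait_us": wait_time & 0xFFFF,
        "count_infinite": infinite,
        "count_us": None if infinite else counting_period,
    }
```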

4.5.4. Statistics Counter Table


Associated with each Statistics Block Table Entry is a Statistics Counter Table. The Statistics
Counter Table contains one entry for each implemented Statistics Counter associated with a
Statistics Block. The registers described in this section form a Statistics Counter Table entry.


4.5.4.1. Statistics Capability and Control (00h)

Table 4-77: Statistics Capability and Control

Bit Location Register Description Attributes


7:0 Port Number – Indicates which port the counter is associated with. RO
For a Device, this field must be zero. For a switch, this value contains
an index into the Port Table.
15:8 Statistics Descriptor Index – Contains the index of the Statistics RO
Descriptor Table entry that describes the statistics recording
capabilities of this counter.
21:16 Counter Width – Indicates the width of the counter. Value is counter RO
width-1 (i.e. a 32 bit counter contains 31 in this field).
If the counter supports any standard statistics, the implemented
counter width must be 32 bits or greater. If the counter supports only
Vendor Specific statistics, the counter width can be any value.
22 Reserved RO
23 Counter Enable – When Set, the counter is enabled. When Cleared, RW
the counter is disabled and certain fields (defined below) are
undefined.
Software should disable unused counters to reduce power
consumption.
This bit is allowed to be hardwired to 1b if the counter is always
enabled.
The default value of this field is Vendor Specific.
31:24 Statistics Select –Determines what statistic software has selected RW
for this counter to record. This field contains the bit number of one of
the Supported bits of the Statistics Descriptor entry selected by
Statistics Descriptor Index.
The counter value is undefined if the value of this field indicates an
unsupported counter (i.e. the value does not correspond to an S bit or
the associated S bit is 0b).
Bits in this field may be Read Only if they are not needed. For
example, a Statistics Descriptor Index that supports S bits 16, 17 and
18 need only implement this field as 000100WWb where W
represents read/write bits. Following this rule to its extreme, if a
Statistics Descriptor Index supports exactly one S bit, this entire field
may be Read Only.
The default value of this field is Vendor Specific. This field is
undefined when the counter is disabled.
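
Implementation Note: The informative Python sketch below decodes the Statistics Capability and Control register of Table 4-77 and reproduces the 000100WWb example by computing which Statistics Select bits need to be writable for a given set of supported S bits. Function and key names are hypothetical. The mask calculation is one possible reading of "bits that are not needed": a bit is treated as needed only if it differs among the supported values.

```python
def decode_capability_and_control(reg):
    """Split the register at offset 00h of a Statistics Counter entry."""
    return {
        "port_number": reg & 0xFF,
        "descriptor_index": (reg >> 8) & 0xFF,
        "counter_width": ((reg >> 16) & 0x3F) + 1,  # field holds width - 1
        "enabled": bool((reg >> 23) & 1),
        "statistics_select": (reg >> 24) & 0xFF,
    }

def select_writable_mask(supported_s_bits):
    """Statistics Select bits that must be read/write for these S bits."""
    any_set = 0
    all_set = 0xFF
    for s_bit in supported_s_bits:
        any_set |= s_bit
        all_set &= s_bit
    # Bits that are 1 in some supported values but not in all of them.
    return any_set & ~all_set & 0xFF
```

For S bits 16, 17 and 18 the mask is 00000011b, matching the two W positions in the example, and for a single supported S bit the mask is zero, so the whole field may be Read Only.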


4.5.4.2. Statistics Filter Enable and Control (04h)

Table 4-78: Statistics Filter Enable and Control

Bit Location Register Description Attributes


31:0 Filter Enable and Control – The meaning of this field depends on RW
the Statistics Style and Statistics Select fields. The description of
each counter defines the meaning of the filters associated with it.
This field is undefined when the counter is disabled.
The default value of this field is Vendor Specific.

4.5.4.3. Statistics Counter Low (08h)

Table 4-79: Statistics Counter Low

Bit Location Register Description Attributes


Width:0 Count Value Low – Indicates the lower 32 bits of the counter value. RO
Unused bits, as indicated by Counter Width, are Read Only Zero.
The counter value is undefined when the counter is disabled or Busy
(i.e., in the waiting or counting period).
The default value of this field is Vendor Specific.
31:Width Reserved RO

4.5.4.4. Statistics Counter High (0Ch)

Table 4-80: Statistics Counter High

Bit Location Register Description Attributes


Width-32:0 Count Value High – Indicates the upper bits of the counter value. RO
Unused bits, as indicated by Counter Width, are Read Only Zero.
The counter value is undefined when the counter is disabled or Busy
(i.e., in the waiting or counting period).
The default value of this field is Vendor Specific.
31:Width-32+1 Reserved RO
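
Implementation Note: Software typically combines the Low and High registers into one value and masks it to the implemented width. The informative Python sketch below shows this; the function name is hypothetical, and width is the decoded counter width in bits (the Counter Width field plus one).

```python
def counter_value(low, high, width):
    """Combine Statistics Counter Low/High into a width-bit value."""
    if not 1 <= width <= 64:
        raise ValueError("width must be between 1 and 64 bits")
    raw = ((high & 0xFFFFFFFF) << 32) | (low & 0xFFFFFFFF)
    return raw & ((1 << width) - 1)   # unused bits are Read Only Zero
```

The result is only meaningful once the owning Statistics Block is idle, because the counter value is undefined while the block is waiting or counting.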


5. Error Handling

5.1. PCIe Error mapping to MR


The basic rules for error detection, logging and reporting are unchanged from PCIe. The only
change is which VH(s) should be affected.
Table 5-1: Physical Layer Error List

Error Name | Error Type | Detecting Agent Action (PCIe) | Detecting Agent Action (MR IOV)
Receiver Error | Correctable | Receiver (if checking): Send ERR_COR to Root Complex. | Receiver: Same as PCIe but send on all enabled VHs not in Reset.

Table 5-2: Data Link Layer Error List

Error Name | Error Type | Detecting Agent Action (PCIe) | Detecting Agent Action (MR IOV)
Bad TLP | Correctable | Receiver: Send ERR_COR to Root Complex. | Receiver: Same as PCIe but send to all enabled VHs not in Reset.
Bad DLLP | Correctable | Receiver: Send ERR_COR to Root Complex. | Receiver: Same as PCIe but send on all enabled VHs not in Reset.
Replay Timeout | Correctable | Transmitter: Send ERR_COR to Root Complex. | Transmitter: Same as PCIe but send on all enabled VHs not in Reset.
REPLAY NUM Rollover | Correctable | Transmitter: Send ERR_COR to Root Complex. | Transmitter: Same as PCIe but send on all enabled VHs not in Reset.
Data Link Layer Protocol Error | Uncorrectable (Fatal) | If checking, send ERR_FATAL to Root Complex. | Same as PCIe but send to all enabled VHs not in Reset.


Table 5-3: Transaction Layer Error List

Error Name | Error Type | Detecting Agent Action (PCIe) | Detecting Agent Action (MR IOV)
Poisoned TLP Received | Uncorrectable (Non-Fatal) | Receiver: Send ERR_NONFATAL to Root Complex, or ERR_COR for the Advisory Non-Fatal Error cases. Log the header of the Poisoned TLP. | Receiver: Same as PCIe. Error message sent only for affected VH. Log the header for the affected VH only.
ECRC Check Failed | Uncorrectable (Non-Fatal) | Receiver (if ECRC checking supported): Send ERR_NONFATAL to Root Complex, or ERR_COR for the Advisory Non-Fatal Error case. Log the header of the TLP that encountered the ECRC error. | Receiver (if ECRC checking supported): Same as PCIe. Error message sent only for affected VH. Log the header for the affected VH only.
Unsupported Request (UR) | Uncorrectable (Non-Fatal) | Request Receiver: Send ERR_NONFATAL to Root Complex, or ERR_COR for the Advisory Non-Fatal Error case. Log the header of the TLP that caused the error. | Request Receiver: Same as PCIe. Error message sent only for affected VH. Log the header for the affected VH only.
Completion Timeout | Uncorrectable (Non-Fatal) | Requester: Send ERR_NONFATAL to Root Complex, or ERR_COR for the Advisory Non-Fatal Error case. | Requester: Same as PCIe. Error message sent only for affected VH.
Completer Abort | Uncorrectable (Non-Fatal) | Completer: Send ERR_NONFATAL to Root Complex, or ERR_COR for the Advisory Non-Fatal Error case. Log the header of the Request that encountered the error. | Completer: Same as PCIe. Error message sent only for affected VH. Log the header for the affected VH only.
Unexpected Completion | Uncorrectable (Non-Fatal) | Receiver: Send ERR_COR to Root Complex. This is an Advisory Non-Fatal Error. Log the header of the Completion that encountered the error. | Receiver: Same as PCIe. Error message sent only for affected VH. Log the header for the affected VH only.
Receiver Overflow | Uncorrectable (Fatal) | Receiver (if checking): Send ERR_FATAL to Root Complex. | Receiver (if checking): Same as PCIe. Send ERR_FATAL to all Root Complexes that have a VC on the affected link mapped to the affected VL.
Flow Control Protocol Error | Uncorrectable (Fatal) | Receiver (if checking): Send ERR_FATAL to Root Complex. | Receiver (if checking): Same as PCIe. Send ERR_FATAL to all Root Complexes that have a VC on the affected link mapped to the affected VL.
Malformed TLP | Uncorrectable (Fatal) | Receiver: Send ERR_FATAL to Root Complex. Log the header of the TLP that encountered the error. | Receiver: Same as PCIe. Error message sent only for affected VH. Log the header for the affected VH only.
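
The VH scoping rules in Tables 5-1 through 5-3 fall into three groups, which the following sketch tries to capture. It is illustrative only: the `Scope` names, the `vh_targets` function, and the data structures (`enabled`, `in_reset`, `vl_map`) are assumptions made for this example and are not defined by this specification.

```python
from enum import Enum


class Scope(Enum):
    ALL_ENABLED_VHS = 1      # Physical and Data Link Layer errors
    AFFECTED_VH = 2          # most Transaction Layer errors
    VHS_ON_AFFECTED_VL = 3   # Receiver Overflow, Flow Control Protocol Error


def vh_targets(scope, enabled, in_reset, affected_vh=None,
               affected_vl=None, vl_map=None):
    """Return the sorted list of VHs that should receive the error message.

    enabled / in_reset: sets of VH numbers on the link.
    vl_map: dict mapping (vh, vc) -> vl for the affected link.
    """
    if scope is Scope.ALL_ENABLED_VHS:
        # "send on all enabled VHs not in Reset"
        return sorted(enabled - in_reset)
    if scope is Scope.AFFECTED_VH:
        # "Error message sent only for affected VH"
        return [affected_vh]
    if scope is Scope.VHS_ON_AFFECTED_VL:
        # every VH with a VC mapped to the affected VL
        return sorted({vh for (vh, _vc), vl in vl_map.items()
                       if vl == affected_vl})
    raise ValueError(scope)
```

For example, a Bad DLLP on a link with VHs 0, 1 and 2 enabled and VH 1 in Reset would be reported on VHs 0 and 2 under this model.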


5.2. MR Errors
Table 5-4: MR Error List

Error Name | Error Type | Detecting Agent Action (PCIe) | Detecting Agent Action (MR IOV)
Invalid TLP Prefix | Uncorrectable | N/A | Receiver: Signal MR Uncorrectable TLP Error, discard TLP, do not update flow control (VL / VH can’t be trusted).
TLP received on VH in reset (sender of TLP has Acked entering Reset) | Uncorrectable | N/A | Receiver: Signal MR Uncorrectable TLP Error, discard TLP, update flow control normally.
TLP Prefix with Global Key Mismatch: TLP at destination or forwarded on PCIe Link | Uncorrectable | N/A | Receiver: Signal MR Global Key Error, discard TLP, update flow control normally.
TLP Prefix with Global Key Mismatch: TLP being forwarded on MR Link | Correctable | N/A | Receiver: Signal MR Global Key Error, forward TLP normally.
TLP Prefix {VH, VL} that is invalid, not enabled or has not finished Flow Control Initialization | Uncorrectable | N/A | Receiver: Signal MR Uncorrectable TLP Error, discard TLP, do not update flow control (VL / VH can’t be trusted).
MRUpdateFC for {VH, VL} that is invalid, not enabled or has not finished Flow Control Initialization | Correctable | N/A | Receiver: Signal MR DLLP Error, discard DLLP.
Invalid VH Group in Reset DLLP | Correctable | N/A | Receiver: Signal MR DLLP Error, discard DLLP.
Out of range Assert bit set in Reset DLLP | Correctable | N/A | Receiver: Signal MR DLLP Error, ignore the offending Assert bit(s) and process remainder of Reset DLLP normally.
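
The receiver actions in Table 5-4 lend themselves to a lookup table. The sketch below is one hypothetical encoding: the dictionary keys and the `MrErrorAction` type are invented for illustration, and `update_fc` is left as `None` wherever the table does not state a flow control action.

```python
from typing import NamedTuple, Optional


class MrErrorAction(NamedTuple):
    error_type: str            # "Uncorrectable" or "Correctable"
    signal: str                # error signaled by the Receiver
    disposition: str           # what happens to the TLP or DLLP
    update_fc: Optional[bool]  # None where Table 5-4 does not say


MR_ERROR_ACTIONS = {
    "invalid_tlp_prefix": MrErrorAction(
        "Uncorrectable", "MR Uncorrectable TLP Error", "discard TLP", False),
    "tlp_on_vh_in_reset": MrErrorAction(
        "Uncorrectable", "MR Uncorrectable TLP Error", "discard TLP", True),
    "global_key_mismatch_at_destination_or_pcie_link": MrErrorAction(
        "Uncorrectable", "MR Global Key Error", "discard TLP", True),
    "global_key_mismatch_forwarded_on_mr_link": MrErrorAction(
        "Correctable", "MR Global Key Error", "forward TLP", None),
    "tlp_prefix_bad_vh_vl": MrErrorAction(
        "Uncorrectable", "MR Uncorrectable TLP Error", "discard TLP", False),
    "mrupdatefc_bad_vh_vl": MrErrorAction(
        "Correctable", "MR DLLP Error", "discard DLLP", None),
    "invalid_vh_group_in_reset_dllp": MrErrorAction(
        "Correctable", "MR DLLP Error", "discard DLLP", None),
    "out_of_range_assert_bit": MrErrorAction(
        "Correctable", "MR DLLP Error",
        "ignore offending Assert bit(s), process remainder", None),
}
```

Note that the two cases where the {VH, VL} in the prefix cannot be trusted are the only ones in which flow control is not updated.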


6. Hot Plug

6.1. MRA Switch


The Virtual Signals Interface registers in the VS Bridge Table provide the MR-PCIM end of the
“Virtual Hot-Plug Signals Interface”. This section describes the connection between those fields and
the PCI Express Slot registers of the corresponding Type 1 Header.
If the Bridge Controls Physical Link field in the VS Bridge Table is Clear, the VS Type 1 Header
slot registers are virtual and connect to the “Virtual Hot-Plug Signals Interface” described in Section
4.3.6.5.
If the Bridge Controls Physical Link field in the VS Bridge Table is Set, the VS Type 1 Header slot
registers are physical and connect to the external signals of the Switch (if implemented). Specifically,
in this mode, the Type 1 Header slot registers and the slot registers in the Port Table are identical:
changes to one register affect the other, external events are visible in both, there is a single set of
“changed” bits, and writing 1 to clear a bit in either register clears it in both.
The Type 1 Header associated with each virtual downstream Switch Port shall contain the PCIe Slot
Capabilities, Control and Status registers. These registers work exactly as in base PCIe as described
in sections 7.8.9, 7.8.10 and 7.8.11 of the PCI Express Base Specification.
For each Type 1 Header register, the following sections describe how the various bits interact with
the “Virtual Hot-Plug Signals Interface”. Figures are extracted for reference from the PCI Express
Base Specification.
Changes to the Virtual Hot-Plug Controller occur immediately. If Bridge Controls Physical Link is
Clear, the No Command Completed Support bit in the Type 1 Header is always Set.


6.1.1. PCI Express Capability: Slot Capabilities Register

Figure 6-1: Slot Capabilities Register (PCIe Figure 7-18)


Table 6-1: Virtual Mapping: PCIe Slot Capabilities Register

Bits | Field Name | If Virtual (Bridge Controls Physical Link = 0b) | If Physical (Bridge Controls Physical Link = 1b)
0 | Attention Button Present | | Base (HWInit)
1 | Power Controller Present | Bit 1, Signals Interface 1, Power Controller Present | Base (HWInit)
2 | MRL Sensor Present | 0b | Base (HWInit)
3 | Attention Indicator Present | 1b | Base (HWInit)
4 | Power Indicator Present | 1b | Base (HWInit)
5 | Hot-Plug Surprise | Bit 5, Signals Interface 1, Virtual Hot Plug Surprise | Base (HWInit)
6 | Hot Plug Capable | Bit 6, Signals Interface 1, Virtual Hot Plug Capable | Base (HWInit)
14:7 | Slot Power Limit Value | Bits 14:7, Signals Interface 1, Virtual Slot Power Limit Value | Base (HWInit)
16:15 | Slot Power Limit Scale | Bits 16:15, Signals Interface 1, Virtual Slot Power Limit Scale | Base (HWInit)
17 | Electromechanical Interlock Present | 0b | Base (HWInit)
18 | No Command Completed Support | 0b | Base (HWInit)
31:19 | Physical Slot Number | Bits 31:19, Signals Interface 1, Virtual Slot Number | Base (HWInit)
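
The virtual column of Table 6-1 can be read as a bit-level recipe for deriving the Slot Capabilities value seen by the VH from Signals Interface 1. The sketch below is a hedged illustration, not a definitive implementation: the function and constant names are hypothetical, and because the table gives no virtual value for bit 0 (Attention Button Present), that bit is exposed as a caller-supplied parameter.

```python
# Bits copied from Signals Interface 1 when virtual
# (bits 1, 5, 6, 14:7, 16:15 and 31:19 in Table 6-1).
COPIED_FROM_SI1 = ((1 << 1) | (1 << 5) | (1 << 6)
                   | (0xFF << 7) | (0x3 << 15) | (0x1FFF << 19))

# Bits fixed at 1b when virtual: Attention Indicator Present (3)
# and Power Indicator Present (4).
FIXED_ONES = (1 << 3) | (1 << 4)


def virtual_slot_capabilities(signals_interface_1: int,
                              attention_button_present: bool = False) -> int:
    """Return the 32-bit virtual Slot Capabilities value.

    Bits 2, 17 and 18 are always 0b when virtual, so they are simply
    never set. Bit 0 is an assumption supplied by the caller.
    """
    value = (signals_interface_1 & COPIED_FROM_SI1) | FIXED_ONES
    if attention_button_present:
        value |= 1 << 0
    return value
```

With Signals Interface 1 all zeros, the result is 18h: only the two indicator-present bits are Set.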


6.1.2. PCI Express Capability: Slot Control Register

Figure 6-2: Slot Control Register (PCIe Figure 7-19)


Table 6-2: Virtual Mapping: PCIe Slot Control Register

Bits | Field Name | If Virtual (Bridge Controls Physical Link = 0b) | If Physical (Bridge Controls Physical Link = 1b)
0 | Attention Button Pressed Enable | Implement | Base
1 | Power Fault Detected Enable | Implement | Base
2 | MRL Sensor Changed Enable | 0b | Base
3 | Presence Detect Changed Enable | Implement | Base
4 | Command Completed Interrupt Enable | 0b | Base
5 | Hot-Plug Interrupt Enable | Implement. Note: This bit controls interrupts delivered in the VH. It is unrelated to the similarly named Hot-Plug Signals Interrupt Enable bit. The latter governs interrupts delivered to MR-PCIM in the Management VS. | Base
7:6 | Attention Indicator Control | Bits 3:2, Signals Interface 2, Virtual Attention Indicator State | Base
9:8 | Power Indicator Control | Bits 1:0, Signals Interface 2, Virtual Power Indicator State | Base
10 | Power Controller Control | Bit 4, Signals Interface 2, Virtual Power Controller State. Turning off Virtual Power State also causes the associated link to see a VH Reset (this is the same effect as setting Secondary Bus Reset). | Base
11 | Electromechanical Interlock Control | 0b | Base
12 | Data Link Layer State Changed Enable | Implement | Base
15:13 | Reserved | 0b | Base


6.1.3. PCI Express Capability: Slot Status Register

Figure 6-3: Slot Status Register (PCIe Figure 7-20)


Table 6-3: Virtual Mapping: PCIe Slot Status Register

Bits | Field Name | If Virtual (Bridge Controls Physical Link = 0b) | If Physical (Bridge Controls Physical Link = 1b)
0 | Attention Button Pressed | Set if 1b written to Bit 24, Signals Interface 2, Push Virtual Attention Button | Base
1 | Power Fault Detected | Set if 1b written to Bit 25, Signals Interface 2, Signal Virtual Power Fault | Base
2 | MRL Sensor Changed | 0b | Base
3 | Presence Detect Changed | Set on transition of Bit 17, Signals Interface 2, Virtual Presence Detect State | Base
4 | Command Completed | 0b | Base
5 | MRL Sensor State | 0b | Base
6 | Presence Detect State | Bit 17, Signals Interface 2, Virtual Presence Detect State | Base
7 | Electromechanical Interlock Status | 0b | Base
8 | Data Link Layer State Changed | Set on transition of Bit 16, Signals Interface 2, Virtual Data Link State | Base
15:9 | Reserved | 0b | Base
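
To make the virtual Slot Status behavior more concrete, the following sketch models how MR-PCIM writes to Signals Interface 2 might surface in the Slot Status register seen by the VH. It is a simplified model under stated assumptions: the class and method names are hypothetical, MR-PCIM is modeled as writing the whole Signals Interface 2 value at once, and the "changed" bits are cleared by the VH writing 1 to them, as in base PCIe.

```python
# Signals Interface 2 bit positions, taken from Table 6-3.
SI2_DATA_LINK_STATE = 1 << 16
SI2_PRESENCE_DETECT_STATE = 1 << 17
SI2_PUSH_ATTENTION_BUTTON = 1 << 24
SI2_SIGNAL_POWER_FAULT = 1 << 25

# Slot Status bit positions.
SS_ATTENTION_BUTTON_PRESSED = 1 << 0
SS_POWER_FAULT_DETECTED = 1 << 1
SS_PRESENCE_DETECT_CHANGED = 1 << 3
SS_PRESENCE_DETECT_STATE = 1 << 6
SS_DLL_STATE_CHANGED = 1 << 8


class VirtualSlotStatus:
    def __init__(self):
        self.si2_state = 0  # latched bits 16 and 17 of Signals Interface 2
        self.changed = 0    # sticky event bits visible to the VH

    def mr_pcim_write_si2(self, value: int) -> None:
        """Model an MR-PCIM write to Signals Interface 2."""
        if value & SI2_PUSH_ATTENTION_BUTTON:
            self.changed |= SS_ATTENTION_BUTTON_PRESSED
        if value & SI2_SIGNAL_POWER_FAULT:
            self.changed |= SS_POWER_FAULT_DETECTED
        new_state = value & (SI2_DATA_LINK_STATE | SI2_PRESENCE_DETECT_STATE)
        toggled = new_state ^ self.si2_state
        if toggled & SI2_PRESENCE_DETECT_STATE:
            self.changed |= SS_PRESENCE_DETECT_CHANGED
        if toggled & SI2_DATA_LINK_STATE:
            self.changed |= SS_DLL_STATE_CHANGED
        self.si2_state = new_state

    def vh_read(self) -> int:
        """Slot Status value as read by software in the VH."""
        status = self.changed
        if self.si2_state & SI2_PRESENCE_DETECT_STATE:
            status |= SS_PRESENCE_DETECT_STATE
        return status

    def vh_write(self, value: int) -> None:
        """Write-1-to-clear handling for the sticky event bits."""
        self.changed &= ~value
```

In this model, pushing the virtual Attention Button while asserting Virtual Presence Detect State raises Attention Button Pressed, Presence Detect Changed and Presence Detect State together; the VH then clears the first two by writing 1 to them.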


6.1.4. PCI Express Capability: Device Capabilities Register

Figure 6-4: PCI Express Capabilities Register (PCIe Figure 7-11)


Table 6-4: Virtual Mapping: PCIe Capabilities Register

Bits | Field Name | If Virtual (Bridge Controls Physical Link = 0b) | If Physical (Bridge Controls Physical Link = 1b)
3:0 | Capability Version | Base | Base
7:4 | Device / Port Type | Base | Base
8 | Slot Implemented | Bit 8, Signals Interface 2, Virtual Slot Implemented | Base
13:9 | Interrupt Message Number | Base | Base
14 | Undefined | Base | Base
15 | Reserved | Base | Base

6.1.5. Virtual Hot-Plug Signals Interface Registers


Each virtual downstream Switch Port has a Virtual Signals Interface as defined in Section 4.3.6.5.
Registers controlling this hardware are located in the VS Bridge Table. They provide the interface
between MR-PCIM Hot-Plug software and Hot-Plug software running in the VH. This interface
allows MR-PCIM to:
1. Provide configuration and status information to the virtual slot registers
2. Push virtual buttons of the Virtual Hot-Plug controller (i.e., change bits in the Virtual Slot
Status register)
3. Detect Indicator and Control changes made to the Virtual Hot-Plug controller (i.e., detect
certain changes to the Virtual Slot Control and Virtual Slot Capabilities registers)
The following register fields are defined for this function:
Bit Field | MR-PCIM View | Purpose
Virtual Slot Number | Read / Write | 1
Virtual Slot Power Limit Scale / Value | Read Only | 3
Virtual Slot Implemented | Read / Write | 1
Virtual Hot-Plug Capable | Read / Write | 1
Virtual Hot-Plug Surprise | Read / Write | 1
Virtual Data Link State | Read / Write | 2
Virtual Power Controller State | Read Only | 3
Virtual Power Controller State Changed | Read / Write One to Clear | 3
Virtual Power Controller Present | Read / Write | 1
Virtual Power Indicator State | Read Only | 3
Virtual Power Indicator State Changed | Read / Write One to Clear | 3
Virtual Attention Indicator State | Read Only | 3
Virtual Attention Indicator State Changed | Read / Write One to Clear | 3
Virtual Presence Detect State | Read / Write | 2
Press Virtual Attention Button | Read Zero, Write One to Set | 2
Signal Virtual Power Fault | Read Zero, Write One to Set | 2

These fields are more precisely defined in Section 4.3.6.5.


In addition to the above fields, changes by VH software to the Attention Indicator, Power Indicator
and Power Controller Control (Purpose 3 above) can generate an interrupt to MR-PCIM.

6.1.6. Physical Slot Registers


Each physical slot has an associated set of registers for controlling the Physical Hot Plug Controller.
As in PCIe, the Physical Hot-Plug Controller is optional.
If present, the Physical Hot Plug Controller is managed via the Slot registers of the associated Port
Table entry.
In addition, when the Bridge Controls Physical Link field in the VS Bridge Table is Set, the Physical
Hot-Plug Controller for the Port mapped to that VS Bridge Table entry is also controlled using the
Slot Capabilities, Control, and Status registers of the associated Type 1 Header. This is useful
when an MRA Switch is being used as a Base PCIe Switch. It is also useful if MR-PCIM chooses to
delegate authority for managing the Physical Link to software running in a specific VH (typically
because the associated Port is attached to a Base PCIe Device).

6.1.7. Physical Hot-Plug Signals Interface


Hot-Plug support is optional in PCI Express. If provided, some means for communicating Hot-Plug
signals from the Switch is needed. In PCI Express, this mechanism is Vendor Specific.
Physical Hot-Plug remains optional in MRA Switches. The Physical Signals Interface also remains
Vendor Specific.
The presence or absence of Hot-Plug support is indicated using the Slot Implemented bit in the Port
Table. If present, various Hot-Plug signals are optional and their presence is indicated in the Slot
Capabilities Register also in the Port Table.
If the Bridge Controls Physical Link field in some VS Bridge Table entry is Set, the Physical Hot-
Plug information for the Port mapped to that VS Bridge is also reflected in the associated Type 1
Header.

6.2. Virtual Device Migration


Virtual Hot Plug can be used to support Device Migration from one VH to another VH.
To accomplish this, the losing VH gets a Hot Remove sequence. This sequence starts with a push of
the Virtual Attention Button and ends with the disabling of the Virtual Power Controller.
Software could then remap and reset the Virtual Device by:
1. Ensuring that the gaining VS Bridge is ready to receive the Virtual Device (i.e., the Port / Port
VHN fields are unmapped and the Data Link State is clear)
2. Clearing the Port / Port VHN fields in the losing VS Bridge Table entry
3. Setting the Force Reset bit in the gaining VS Bridge Table entry
4. Mapping the Port / Port VHN fields into the gaining VS Bridge Table entry
Software would then send the gaining VH a Hot Add sequence. This starts with the assertion of
Presence Detect and finishes with the Device being enumerated and used by software in the VH.
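
The remap-and-reset portion of the migration can be sketched as follows. This is a minimal model under stated assumptions: `VSBridgeEntry` and its field names are hypothetical stand-ins for a VS Bridge Table entry, the Hot Remove and Hot Add sequences are assumed to happen outside this function, and when (or whether) Force Reset is later cleared is not modeled because this section does not say.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class VSBridgeEntry:
    """Hypothetical model of one VS Bridge Table entry."""
    port_mapping: Optional[Tuple[int, int]] = None  # (Port, Port VHN)
    data_link_state: bool = False
    force_reset: bool = False


def migrate_virtual_device(losing: VSBridgeEntry,
                           gaining: VSBridgeEntry) -> None:
    """Remap a Virtual Device from the losing to the gaining VS Bridge."""
    # Gaining bridge must be unmapped with Data Link State clear.
    if gaining.port_mapping is not None or gaining.data_link_state:
        raise RuntimeError("gaining VS Bridge is not ready")
    if losing.port_mapping is None:
        raise RuntimeError("losing VS Bridge has no mapping to migrate")
    mapping = losing.port_mapping
    # Clear the Port / Port VHN fields in the losing entry.
    losing.port_mapping = None
    # Set Force Reset in the gaining entry.
    gaining.force_reset = True
    # Map the Port / Port VHN fields into the gaining entry.
    gaining.port_mapping = mapping
```

Checking readiness before touching the losing entry means a failed precondition leaves both entries unchanged.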

6.3. Base PCI Express Device Migration


Base PCIe components can also be attached to a Switch and assigned to a single VH. This
assignment can change over time through a Device Migration process.
The sequence of operations is similar to that described in Section 6.2. The exception is that since the
link is operating in PCIe mode, the Secondary Bus Reset field in the Port Table should be used
instead of the Force Reset field from the VS Bridge Table to cleanse Device state for the gaining
VH.


7. Power Management
MR systems continue to need power management capabilities. Two varieties of power management
are involved:
• Virtual power management allows software running in a VH to believe that it has turned off
power to one or more virtual functions.
• Actual power management allows MR-PCIM software to control device power.

7.1. Overview
ASPM and PCI-PM are expanded for MR-IOV. MR Components have both virtual and physical D-
states. Slots have virtual and physical power states. Virtual and physical ASPM controls also exist.

7.2. Virtual D-State


Every Function in every VH of every MR Device or Switch has a virtual D-state. This includes BFs,
PFs, VFs, and Functions as well as P2P Bridges.
Virtual D-state is controlled by software operating in each VH using the rules defined in the PCI
Bus Power Management Interface Specification, Revision 1.2, and in the PCI Express Specification.
Component D-state is an extension of the PCIe multi-function component rules. An MR Component
is treated as a multi-function component taking the Functions in all VHs into account. For example,
a shared component may not enter L1 state until all Functions in all VHs of the component have
been written to non-D0 states.
When software in a VH writes a Function to a lower power virtual D-state, the component acts as if
it were in a lower power state. Effects on actual power consumption are vendor specific. Vendors
are encouraged to use this mechanism to reduce power consumption whenever possible.
Because the component may not have powered down, lower virtual D-states may not result in actual
power savings.
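
The L1 example above amounts to a simple predicate over every Function in every VH. The sketch below illustrates it; the function name and the representation of D-states as strings keyed by (VH, Function) are assumptions made for this example, and the exceptions for management Functions and unused VHs mentioned in Section 7.3 are deliberately not modeled.

```python
def may_enter_l1(virtual_d_states: dict) -> bool:
    """Return True when no Function in any VH is still in D0.

    virtual_d_states maps (vh, function) -> D-state name,
    for example "D0", "D1", "D2" or "D3hot".
    """
    return all(state != "D0" for state in virtual_d_states.values())
```

A single Function left in D0 in any VH is enough to keep the shared component out of L1 under this rule.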

7.3. Link Power States


Link power states and transitions are unchanged from PCIe. L1 and L2/L3 handshakes are
unchanged. As in PCIe, link state is affected by the D-state of all Functions in the Component. In
MR, this includes all Functions in all VHs. There are exceptions to deal with D-state of Functions
used for managing the MR topology and D-states of Functions in VHs that are not being used.


7.4. Multi-Root ASPM


ASPM is a physical notion and is controlled by MR-PCIM using the Port Table (see Section 4.3.3.12
for details). ASPM may also be controlled by the Type 1 Header in each VH if the Bridge Controls
Physical Link bit is Set in the associated VS Bridge Table entry.
When the Bridge Controls Physical Link bit is Clear, virtual ASPM controls are provided for
software compatibility but perform no function. As in PCIe, virtual ASPM controls must support
L0s. L1 ASPM support is optional in PCIe and, for simplicity, virtual L1 ASPM is not supported.

7.5. Slot Clock and Common Clock Configuration


The Slot Clock Configuration and Common Clock Configuration bits in Type 1 Headers associated
with Base PCIe Ports reflect the physical Port.
The Slot Clock Configuration and Common Clock Configuration bits in Type 1 Headers associated
with MR Ports reflect the virtual environment and always indicate a common clock. The underlying
physical configuration is available in the Port table (see Section 4.3.3.12 for details).

7.6. Multi-Root Wake-Up


The wake-up model is mostly unchanged from PCIe. The exception is the need to deal with the
“power inversion” situations. In PCIe, power is turned off starting at the leaves of a PCIe hierarchy
so that power is removed from a root only after power was first removed from all components
below that root. In MR topologies, shared components have virtual power removed while the non-
shared components have actual power removed. This creates situations where a powered on shared
component located below a non-shared component (in some VH), needs to wake-up that non-
shared component. It also creates situations where a powered off component needs to wake-up a
powered on shared component.
Examples of these scenarios are shown in Figure 7-1 below.


(Figure panels: Root 1 on Aux; Device + Root on Aux; Device + Roots on Aux; Everything on Aux)
Figure 7-1: Multi-Root Wake-Up Scenarios
To address this, MR Switches implement a number of power management features:
• The ability to send Beacon / WAKE# on receipt of certain PM_PME messages.
• The ability to detect Beacon / WAKE# and convert it into a PM_PME message.
• The ability to detect Beacon / WAKE# and signal an interrupt.

7.6.1. PME Triggers Beacon / WAKE#


The PME Triggers Beacon / WAKE# bit in the Port Table instructs an MR Switch to issue Beacon
or WAKE# when a PM_PME Message is headed out a Port that is DL_Down.
Scenario A: This mechanism can be used to wake up Root 1 in Figure 7-1 Scenario A. The MR
Device detects a wake-up event in one of its Functions. The link from the Device to the MR Switch
is in DL_Active and the Device is being used by Root 2. The Device sends a PM_PME Message
upstream in the VH associated with Root 1. MR Switch Port 3 receives the PM_PME and,
because it is associated with Root 1, forwards it to Port 1.
Port 1 is DL_Down and has its PME Triggers Beacon / WAKE# bit Set. This causes the PM_PME
Message to be queued and a wake-up event to be sent to Root 1 (as in PCIe, whether this involves
Beacon or WAKE# is platform specific).

7.6.2. Beacon / WAKE# Triggers MSI


The Beacon / WAKE# Triggers MSI bit in the Port Table instructs an MR Switch to generate an
MSI interrupt to MR-PCIM when it detects a Beacon or WAKE# indication from the Port.
Scenario B: This mechanism is used to power up the Device in Figure 7-1 Scenario B. The
Device detects a wake-up event in one of its Functions. The Device is operating on Aux power and
generates a Beacon or WAKE# to the MR Switch. The MR Switch interrupts MR-PCIM, which
notices the Beacon / WAKE# event and powers up the Device using the Physical Power Controller
associated with Port 3. Once the link comes up, Scenario A applies.
Scenario C: This mechanism is also used to power up the Device in Figure 7-1 Scenario C. The
Device is powered up as in Scenario B. When the PM_PME Message is sent, it is addressed to either
Root 1 or Root 2 and Scenario A applies to the addressed Root.

7.6.3. Beacon / WAKE# Triggers Beacon / WAKE#


An MR Switch that is operating on Aux power will broadcast a Beacon or WAKE# indication
received on a Downstream Port to all Authorized Upstream Ports.
Scenario D: This mechanism is used to power up everything in Figure 7-1 Scenario D. The Port
containing MR-PCIM is Authorized. The Device is operating on Aux power and generates a Beacon
or WAKE# to the MR Switch. The MR Switch generates Beacon or WAKE# out the Authorized
Port headed to MR-PCIM. The Root hosting MR-PCIM is powered on, which, in turn, powers on the
MR Switch. After the MR Switch is powered on and configured, Scenario C applies.
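
The three mechanisms in Sections 7.6.1 through 7.6.3 can be summarized as two event handlers. The sketch below is a hedged illustration: the `Port` fields, the handler names, and the action tuples are all hypothetical, and cases that these sections do not describe (for example, a DL_Down Port without the PME Triggers Beacon / WAKE# bit Set) simply return no action.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Port:
    number: int
    dl_down: bool = False
    pme_triggers_wake: bool = False    # PME Triggers Beacon / WAKE# bit
    wake_triggers_msi: bool = False    # Beacon / WAKE# Triggers MSI bit
    authorized_upstream: bool = False  # Authorized Upstream Port


def on_pm_pme(egress: Port) -> List[Tuple[str, int]]:
    """PM_PME Message headed out the given Port (Section 7.6.1)."""
    if egress.dl_down:
        if egress.pme_triggers_wake:
            return [("queue_pm_pme", egress.number),
                    ("send_wake", egress.number)]
        return []  # not described in these sections
    return [("forward_pm_pme", egress.number)]


def on_wake_detected(ingress: Port, ports: List[Port],
                     switch_on_aux: bool) -> List[Tuple[str, int]]:
    """Beacon or WAKE# detected on a Downstream Port."""
    if switch_on_aux:
        # Section 7.6.3: broadcast to all Authorized Upstream Ports.
        return [("send_wake", p.number) for p in ports
                if p.authorized_upstream]
    if ingress.wake_triggers_msi:
        # Section 7.6.2: interrupt MR-PCIM.
        return [("msi_to_mr_pcim", ingress.number)]
    return []
```

In terms of Figure 7-1, Scenario A corresponds to `on_pm_pme` with Port 1 DL_Down, Scenario B to `on_wake_detected` with the Switch powered, and Scenario D to `on_wake_detected` with the Switch on Aux power.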

7.7. Multi-Root PME Turn Off


MR Devices respond to PME_Turn_Off messages with PME_TO_Ack within each VH. In VH0,
PME_Turn_Off messages will cause a Downstream Component to request a Link transition to
L2/L3 Ready using the PM_Enter_L23 DLLP. In non-zero VHs, PME_Turn_Off has no effect on
link state. Devices must cleanse state in all Functions of a VH before sending PME_TO_Ack. This
ensures that virtual Device state disappears when the device is “powered off”.
MR Switches process PME_Turn_Off messages using PCIe rules within each VS. When a
PME_Turn_Off message is received at the Upstream Bridge of a VS, the message is broadcast to all
Downstream Bridges of the VS. When a Downstream Bridge sees its downstream component
respond with PME_TO_Ack, the response is recorded in a scoreboard. When the last
Downstream Bridge responds, a PME_TO_Ack message is sent Upstream.
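
The scoreboard described above could be modeled as a set of Downstream Bridges that have not yet responded. The following is a minimal sketch; the class and method names are hypothetical and bridges are identified by arbitrary integers.

```python
class PmeTurnOffScoreboard:
    """Tracks PME_TO_Ack responses for one VS."""

    def __init__(self, downstream_bridges):
        # All Downstream Bridges of the VS receive the broadcast
        # PME_Turn_Off and are initially pending.
        self.pending = set(downstream_bridges)

    def record_ack(self, bridge) -> bool:
        """Record a PME_TO_Ack seen at a Downstream Bridge.

        Returns True exactly when this was the last outstanding
        response, i.e. when PME_TO_Ack should be sent Upstream.
        """
        if bridge not in self.pending:
            return False  # duplicate or unexpected response
        self.pending.remove(bridge)
        return not self.pending
```

Returning True only on the final outstanding response ensures that a single PME_TO_Ack is sent Upstream per handshake.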
For Upstream Links operating in Base PCIe mode, MR Switches will request the Link to transition
to L2/L3 Ready using the PM_Enter_L23 DLLP following completion of the PME_Turn_Off /
PME_TO_Ack handshake.
For Upstream Links operating in MR mode, MR Switches will never automatically transition the link
to the L2/L3 Ready state. Instead the entry into L2/L3 Ready state is controlled using the Send
PM_Enter_L23 DLLP bit in the Port Table. See Section 4.3.3.2 for details.
The Port PME Interrupt can be used to determine when all VHs on a Port have completed the
PME Turn Off handshake. Bridges that are not mapped or are associated with an Authorized VS do
not participate in this determination. See Section 4.3.3.3 for details.

7.8. Multi-Root Power Controller


MR-PCIM may enable a virtual power controller for each virtual slot by setting the Virtual Power
Controller Present bit in the Hot-Plug Virtual Signals Interface (see Section 4.3.6.5). If this bit is Set
and the Bridge Controls Physical Link bit is Clear, the VH is sent a reset when it turns off power
using the PCIe Hot-Plug controller.
The virtual power controller has no effect on physical power. Turning off virtual power causes the
MR Switch to send Reset DLLPs to cleanse state from the affected Components.
A form factor can allow MR-PCIM to control the physical power to a slot. As in PCIe, doing so is
optional. This control occurs through the Port Table (see Section 4.3.3.12 for details).

7.9. Multi-Root Power Budgeting


Power Budgeting is optional in the PCI Express Specification. Certain form factors may require it.
Power Budgeting is required for Devices supporting Hot-Plug. See Section 7.15 of the PCI Express
Specification.
Power Budgeting remains optional in MR under the same conditions. If provided, power values
reflect the power consumed by the BF and all associated PFs and VFs in all VHs.


8. Congestion Management
Exceeding the bandwidth of a link or capacity of a buffer can lead to congestion in a topology. In a
Multi-Root Topology, congestion may affect the performance of unrelated VHs and lead to
Completion Timeout errors. This chapter defines mechanisms for detecting and controlling
congestion in a Multi-Root Topology.

8.1. Overview
There are three possible causes of congestion in a PCIe topology.
• A fault in hardware or software configuration of a device in the topology.
• A static rate mismatch in the capacity of the path from a component injecting traffic into the
topology (e.g., a Device) to the ultimate destination (e.g., a Root Port). Congestion due to a
static rate mismatch would occur even if the topology were otherwise idle.
• Traffic merging of multiple flows, none of which individually suffers from a static rate
mismatch, causing the capacity of an element in the topology to be exceeded.
While the causes of congestion are the same in both single and multi-root PCIe topologies, it is
desirable to provide the ability to manage, limit and contain the congestion caused by one VH on
other VHs in the system. The congestion management mechanisms outlined in this section allow
management of congestion that is unique to MR topologies (i.e., due to traffic merging from
different VHs). These mechanisms do not address congestion within a VH since this congestion
would have been present in an equivalent PCIe Base topology.
MR congestion management mechanisms provide the following benefits.
• Preserve the behavior of Virtual Channels (VCs) defined by the PCIe Base specification within
a VH.
• Allow systems to be constructed where a fault in one VH does not result in errors (e.g.,
Completion Timeouts) in another VH.
• Allow systems to be constructed that support forward progress guarantees on a VH or groups
of VHs when congestion exists on an unrelated VH.
• Support a wide range of implementation options. At one extreme, they allow creation of MR
Devices through incremental changes to SR Devices. At the other extreme, they allow
implementations that support complete isolation between virtual hierarchies.


8.2. Congestion Isolation


The Virtual Link (VL) mechanism together with Bypass Queues, a logical queuing structure
associated with a VL at a Receiver, provide the foundation for isolating congestion and supporting
differentiated services within a Multi-Root PCI Express Topology. This section defines these
mechanisms and describes them from a system perspective.

8.2.1. Virtual Links


The Virtual Link (VL) mechanism provides the means to support multiple independent logical data
flows over a single physical Multi-Root PCIe Link. VLs play the same role in an MR topology as
Virtual Channels (VCs) in a PCIe Base topology. VLs are associated with independent fabric
resources (queues/buffers and associated control logic) that are used to move information across an
MR Link with independent flow control. As with VCs in a PCIe Base topology, VLs are associated
with a Link and are not end-to-end. Different Links in an MR topology may implement different
numbers of VLs.
Multi-Root Aware Components support one or more Virtual Hierarchies (VHs). As defined by the
PCIe Base specification, each VH associated with a port of an MRA component may support one
to eight VCs. The notation (VHx, VCy) is used to denote VC y associated with VH x. Each VC of a
VH associated with an MRA component port represents an independent logical data flow that in an
MR topology must be mapped to a physical VL resource in order for data to be transferred across a
Link.
(VH, VC)s may be mapped to VLs in a flexible manner (e.g., (VH 0, VC 0), (VH 1, VC 2), and
(VH 2, VC 1) all map to VL 0) or on a VC-consistent basis (e.g., (VHany, VC 0) all map to VL 0).
While not a requirement, systems will generally map (VH, VC) data flows with similar QoS
characteristics onto the same VL.
A graphical example of mapping VCs associated with VHs to VLs is illustrated in xx, where an
MRA device connects to an MRA downstream Switch Port. In this example, the MRA device
implements two BFs, each with two virtual channels. One BF in this device has been assigned to
VH A while the other to VH B. In this example VC 0 from both VHs has been mapped to VL 0
(i.e., (A, 0) and (B, 0) both map to VL 0) while VC 1 from VH A maps to VL 1 and VC 1 from VH
B maps to VL 2. The mapping of (VH,VC)s associated with a port of an MRA component onto
VLs is described in section xx.
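
The example mapping in the preceding paragraph can be written down as a small lookup table. The sketch below is illustrative only: `VL_MAP` and the helper names are hypothetical, and VHs are labeled "A" and "B" as in the example rather than by VHN.

```python
# (VH, VC) -> VL, following the example: VC 0 of both VHs shares VL 0,
# VC 1 of VH A uses VL 1, and VC 1 of VH B uses VL 2.
VL_MAP = {
    ("A", 0): 0,
    ("B", 0): 0,
    ("A", 1): 1,
    ("B", 1): 2,
}


def vl_for(vh, vc, vl_map=VL_MAP):
    """Return the VL that carries the given (VH, VC) flow."""
    return vl_map[(vh, vc)]


def flows_sharing_vl(vl, vl_map=VL_MAP):
    """Return the sorted (VH, VC) flows mapped onto one VL."""
    return sorted(flow for flow, mapped in vl_map.items() if mapped == vl)
```

Here VL 0 carries two flows from different VHs, which is the kind of sharing where congestion in one VH could affect another.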


Figure 8-1: (VH, VC) to VL Mapping

8.2.1.1. Virtual Link and Virtual Hierarchy Identification

A port of an MRA component may support from one to eight VLs. Virtual Links are uniquely
identified using a Virtual Link Identification (VL ID). There is a fixed one-to-one mapping between
VL IDs and VL resources (e.g., VL resource 0 always has a fixed ID of zero (i.e., VL0)).
A port of an MRA component may support from one to 256 Virtual Hierarchies. Virtual Hierarchies
associated with a port are uniquely identified using a Virtual Hierarchy Number (VHN). VHNs are
Link specific and do not represent a global VH identifier.
Each Port is independently configured and managed, allowing implementations to vary the number
of VLs and VHs supported per Port based on usage-model-specific requirements.
MR DLLPs used for flow control accounting contain VHN and VL ID information. Unlike PCIe
Base TLPs that contain only TC and no VC information in the header, the MR TLP prefix tag
contains both VHN and VL ID information simplifying the Flow Control accounting done at each
Port of a Link.
Rules for allocating VL IDs to VL hardware resources associated with a Port are as follows:
‰ VL ID assignment must be one-to-one

‰ The same VL ID cannot be assigned to different VL hardware resources within the same Port.

‰ VL ID 0 (VL0) is assigned and fixed to the default VL.

Rules for assigning (VH, VC)s to VL hardware resources associated with a Port are as follows:
‰ (VH, VC) assignment must be the same (matching) for the two Ports on both sides of a Link.

‰ (VH0, VC0) is assigned at initialization, but not fixed, to the default VL.

MR-PCIM is responsible for configuring ports on both sides of an MR link in a consistent manner.
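
The allocation rules above lend themselves to a simple consistency check. The sketch below is a hypothetical helper that management software might use; the function names and the dictionary representation are assumptions made for illustration and are not defined by this specification.

```python
def check_vl_id_assignment(vl_ids, default_resource=0):
    """Check the VL ID rules for one Port.

    vl_ids maps a VL hardware resource index to its VL ID (assumed model).
    Returns a list of rule violations; the list is empty when valid.
    """
    errors = []
    ids = list(vl_ids.values())
    if len(set(ids)) != len(ids):
        errors.append("same VL ID assigned to more than one VL resource")
    if vl_ids.get(default_resource) != 0:
        errors.append("VL ID 0 is not assigned to the default VL")
    return errors


def check_link_consistency(port_a_map, port_b_map):
    """(VH, VC) to VL assignments must match on both Ports of a Link."""
    return port_a_map == port_b_map
```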


8.2.1.2. VL and VC Configuration

Support for VLs beyond the default VL0 is optional. VL0 is always enabled. Although the mapping is
not fixed or “hardwired,” by default there is a one-to-one mapping between VCs and VLs. Therefore, MR
topology initialization may proceed using (VH 0, VC 0) mapped to VL 0 and does not require any
specific hardware or software configuration.
MR-PCIM is responsible for enabling VLs and configuring the mapping of VCs associated with VHs
to VLs.
‰ VL0 is always enabled

‰ For VLs 1-7, a VL is considered enabled when the corresponding VL Enable bit in the MR-
IOV Control register has been set to 1b in the BF, and once FC negotiation for that VL has
exited the MRFC_INIT2_VL state.
For VLs 1-7, MR-PCIM must use the VL Negotiation Pending bit in the MR-IOV Status register to
determine when a VL is enabled.
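
The enable conditions above can be summarized as a predicate. This is a minimal sketch under two assumptions that the text does not state: that bit n of each value corresponds to VL n, and that a cleared VL Negotiation Pending bit means flow control negotiation for that VL has exited the MRFC_INIT2_VL state. The function name is hypothetical.

```python
def vl_is_enabled(vl, vl_enable_bits, vl_negotiation_pending_bits):
    """Return True if the VL is considered enabled.

    Assumed layout: bit n of each argument corresponds to VL n.
    """
    if not 0 <= vl <= 7:
        raise ValueError("VL ID must be in the range 0-7")
    if vl == 0:
        return True  # VL0 is always enabled
    enable = (vl_enable_bits >> vl) & 1
    pending = (vl_negotiation_pending_bits >> vl) & 1
    # Enabled only once the enable bit is set and negotiation has completed.
    return bool(enable) and not pending
```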
Every VC resource of a VH associated with a Port visible to software operating in the VH must be
mapped to an enabled VL. Since the number of VLs supported by components on a Link is
implementation specific, and only one VC of any VH may be mapped to a given VL, the number of
VC resources advertised to software operating in the VH must not exceed the number of enabled
VLs associated with the Port.
‰ If a function only implements the default VC0 resource, then no configuration is necessary.

‰ For Devices this is managed through the Base Function as follows:

• For Virtual Hierarchies that do not have an MFVC Capability structure associated with
the Port, the VC Extended VC Count field in the Function Control 2 register must
be initialized to a value such that the number of VC resources advertised to software
operating in the VH is less than or equal to the number of enabled VLs. As a result of
this configuration, the VC Low Priority Extended VC Count field in the Function
Control 2 register may need to be initialized to a value consistent with the VC Extended
VC Count field.
• For Virtual Hierarchies that have an MFVC Capability structure associated with a Port, the
MFVC Extended VC Count field in the Function Control 1 register must be initialized
to a value such that the number of VC resources advertised to software operating in the
VH is less than or equal to the number of enabled VLs. As a result of this initialization,
the MFVC Low Priority Extended VC Count field in the Function Control 1 register
may need to be initialized to a value consistent with the MFVC Extended VC Count
field.
‰ For Switches this is managed through the corresponding Virtual Switch (VS) Bridge Table
Entry as follows:

• The VC Extended VC Count field in the Switch VS Bridge Control 2 register must be
initialized to a value such that the number of VC resources advertised to software
operating in the VH is less than or equal to the number of enabled VLs. As a result of
this configuration, the VC Low Priority Extended VC Count in the Function Control 2
register may need to be initialized to a value consistent with the VC Extended VC
Count field.
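
As a worked illustration of the constraint above, the sketch below derives an Extended VC Count value from the number of VC resources a function implements and the number of enabled VLs. It relies on the PCIe Base convention that Extended VC Count is the number of VCs in addition to VC0. The function names are hypothetical, and clamping the Low Priority value is only one possible reading of "consistent with".

```python
def extended_vc_count(implemented_vcs, enabled_vls):
    """Largest Extended VC Count that keeps advertised VCs <= enabled VLs."""
    if not 1 <= implemented_vcs <= 8:
        raise ValueError("a VH supports one to eight VCs")
    if not 1 <= enabled_vls <= 8:
        raise ValueError("a Port has one to eight enabled VLs (VL0 always)")
    advertised = min(implemented_vcs, enabled_vls)
    return advertised - 1  # the field counts VCs beyond the default VC0


def low_priority_extended_vc_count(requested, ext_vc_count):
    """Assumed interpretation: never exceed the Extended VC Count."""
    return min(requested, ext_vc_count)
```

For example, a function with four VC resources behind a Port with two enabled VLs would advertise two VCs, giving an Extended VC Count of 1.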
TLPs generated by a VH but not specifically associated with a BF, PF or VF (e.g., interrupts enabled
in the Device MR-IOV register) require a mapping to an associated VL. Mapping for these TLPs is
performed by the Default VL field in the Device MR-IOV Control register.
TLPs generated by a BF require a mapping to an associated VL. Mapping for these TLPs is
performed by the BF VL field in the Device MR-IOV Control register.

8.2.1.3. VC to VL Mapping

A Virtual Link is established when one or more VC IDs from different VHs are associated with the
physical resources designated by a VL ID.
Components with Ports that implement VLs beyond the default VL must also implement an
associated VC to VL mapping capability. The VC to VL mapping capability is optional for Ports that
implement only the default VL0. Ports that do not have an associated VC to VL mapping capability
must map VC0 from all supported VHs to VL0.
In order to preserve the semantics of a VC defined in the PCIe Base specification, only one VC of a
given VH may be mapped to a VL. The behavior when two or more VCs from the same VH are
mapped to a single VL is undefined.
Given the above requirement, knowledge that a VH is mapped to a VL together with the VC to VL
mapping function associated with that VH is sufficient to determine the (VH, VC). Thus, indicating
that a VL has a mapped VH, written (VL, VH), is synonymous with specifying the (VH, VC).
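
The equivalence between (VL, VH) and (VH, VC) follows from the one-VC-per-VH-per-VL rule. The sketch below inverts a mapping and rejects configurations that violate the rule; build_vl_index is a hypothetical name, and raising an error is a choice made for illustration, since the specification leaves the behavior undefined.

```python
def build_vl_index(vc_to_vl):
    """Invert a {(vh, vc): vl} map into {(vl, vh): vc}.

    Raises ValueError if two VCs of the same VH share a VL, a case whose
    hardware behavior the specification leaves undefined.
    """
    index = {}
    for (vh, vc), vl in vc_to_vl.items():
        key = (vl, vh)
        if key in index:
            raise ValueError(
                f"VH {vh}: VC {index[key]} and VC {vc} both map to VL {vl}"
            )
        index[key] = vc
    return index
```

With a valid mapping, looking up (VL, VH) in the returned index yields the unique VC of that VH carried on the VL.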
VC to VL mapping is from virtual VC IDs to VLs and is controlled as follows:
‰ For Devices, the VC to VL mapping is controlled by fields in the Function VC to VL Map
register associated with the BF of each VH. This map is from Virtual Hierarchy VC resources
as they would have appeared on the Link in an equivalent PCIe Base component. Thus, if
Function 0 in the VH contains an MFVC Capability structure, then this mapping is from VC IDs
managed by the MFVC Capability structure. Otherwise, this mapping is from VC IDs
managed by the BF VC capability structure.
‰ For Switches the VC to VL mapping is controlled by fields in the VC to VL Map register in a
VS Bridge Table entry.
‰ A VC ID x is mapped to a VL y when the VCx VL Map field has been initialized with a value
of y and the corresponding VCx VL Map Enable bit has been set.
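
The last rule above can be modeled without committing to a register layout. In the following sketch the class VcVlMapEntry and the function mapped_vl are hypothetical; each entry simply pairs a VCx VL Map value with its VCx VL Map Enable bit.

```python
from dataclasses import dataclass


@dataclass
class VcVlMapEntry:
    vl_map: int = 0           # models the VCx VL Map field
    map_enable: bool = False  # models the VCx VL Map Enable bit


def mapped_vl(entries, vc_id):
    """Return the VL that VC ID vc_id is mapped to, or None if unmapped."""
    entry = entries[vc_id]
    return entry.vl_map if entry.map_enable else None
```

A VL Map value is ignored in this model until the corresponding enable bit is set.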
VC to VL mapping may be performed during MR-PCIM initialization or on an as-needed basis as
dictated by software operating in the VH. The PCIe Base specification supports the arbitrary
mapping of VC IDs to VC resources. Thus, in general, MR-PCIM has no a priori knowledge of
which VC IDs will be used or how they will be allocated by software operating in the VH. In
systems where MR-PCIM possesses this knowledge or in which MR-PCIM can communicate the desired
allocation to software operating in the VH, VC IDs may be mapped to VLs during MR-PCIM
initialization (i.e., prior to the instantiation of software operating in the VH). Otherwise, this
allocation must be performed by MR-PCIM on an as-needed basis as VC IDs are allocated by
software operating in the VH to VC resources.


The VC ID to VC resource allocation and TC to VC map configuration performed by software
operating in the VH may be determined by MR-PCIM via polling or interrupts as follows:
‰ For Devices this information may be determined by examining the Function Table Resource
Status and Function Table Multi-Function Resource Status registers. Any modification of
these fields can be used to trigger a BF interrupt to MR-PCIM.
‰ For Switches this information may be determined by examining the VC Resource Fields
register. Any modification of these fields can be used to trigger a VS Bridge interrupt to MR-
PCIM.
If software operating in the VH enables a VC that has not been mapped to a VL, then the VC
Negotiation Pending bit in the VC Resource Status Register remains set until the VC ID has been
mapped by MR-PCIM to a VL. Once mapped to a VL, the VC Negotiation Pending bit is cleared.
Once mapped, VC to VL mapping may only be modified when a VL and all associated VCs are
disabled. Prior to disabling a VL, VCs mapped to that VL from all virtual hierarchies must be
disabled. The behavior of disabling a VL with mapped and enabled VCs is unspecified.
Mapping a VC to a disabled VL results in the VC Negotiation Pending bit in the VC Resource
Status Register remaining set (i.e., the VC resource does not complete the process of negotiation due
to an invalid mapping). If at some later point the disabled VL becomes enabled, the state of the VC
Negotiation Pending bit is undefined.
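
The two cases in which the VC Negotiation Pending bit remains set can be captured in a small predicate. This is a simplified sketch: it applies only to a VC that software in the VH has already enabled, it ignores any other negotiation activity, and it does not model the undefined case in which a disabled VL later becomes enabled. The function name is hypothetical.

```python
def vc_negotiation_stays_pending(mapped_vl, enabled_vls):
    """For an enabled VC, report whether VC Negotiation Pending remains set.

    mapped_vl   -- VL the VC ID is mapped to, or None if not yet mapped
    enabled_vls -- set of currently enabled VL IDs
    """
    if mapped_vl is None:
        return True   # not yet mapped by MR-PCIM
    if mapped_vl not in enabled_vls:
        return True   # mapped to a disabled VL (invalid mapping)
    return False      # mapped to an enabled VL
```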
Once enabled, MR flow control information is tracked by Receivers and Transmitters for all
configured VLs and (VH, VC)s. The enabling or disabling of a VC within a VH represents a logical
event that does not affect the operation of a physical MR Link or flow control information tracked
on that Link. For example, (VH,VC) flow control continues to be tracked during a hot-reset of a
VH.

8.2.1.4. Arbitration

The objectives of MR arbitration are to provide the following:
‰ Guaranteed forward progress on all supported data flows.

‰ Differentiated service characteristics for data flows associated with a VL within an MR
topology.
‰ The ability to tune bandwidth and end-to-end latency between components in an MR
topology.
MRA components that support multiple VHs require an arbitration mechanism associated with each
supported VL at egress Ports to select the VH from which the next TLP will be transmitted on that
VL. This arbitration is referred to as VH to VL arbitration.
MRA components that support multiple VLs require an arbitration mechanism at egress Ports to
select the VL from which the next TLP will be transmitted on the physical Link. This arbitration is
referred to as VL to Link arbitration or just VL arbitration.


Figure 8-2: MRA Arbitration Model

Figure tbd illustrates both VH to VL and VL to Link
arbitration associated with an egress Port of an MRA Enabled PCIe Component such as an MR
Root Port, Switch, Bridge, or Device. The component in this example implements two VHs, with
both VHs implementing two VCs. Flows (A, VC 0) and (B, VC 0) are mapped to VL 0 and require
VH to VL arbitration to control the multiplexing of TLPs onto this VL. VL 1 and VL 2 each have
only a single mapped flow and therefore require only trivial arbitration. The physical link
associated with the egress Port in this example implements three VLs. VL to Link arbitration is
required to control the multiplexing of VLs onto the physical link.

8.2.1.4.1. VH to VL Arbitration

MRA components that implement multiple Virtual Hierarchies must implement a VH to VL
arbitration mechanism associated with each VL supported by a Port.
VH to VL arbitration is not configurable. All implementations must support a hardwired-fixed VH
to VL arbitration scheme (e.g., Round-Robin) that guarantees forward progress on all VHs
associated with a VL at an egress Port.
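
Round-Robin is given above only as an example of an acceptable hardwired scheme. The following sketch shows one way such an arbiter could behave for a single VL; the class name and the use of a set of ready VHNs are assumptions for illustration, not a required implementation.

```python
class RoundRobinVhArbiter:
    """Round-robin selection among the VHs mapped to one VL (sketch)."""

    def __init__(self, num_vhs):
        self.num_vhs = num_vhs
        self.last = num_vhs - 1  # so that VH 0 is considered first

    def select(self, ready):
        """ready: set of VHNs with a TLP eligible for transmission.

        Returns the next VHN in round-robin order, or None if none is ready.
        """
        for offset in range(1, self.num_vhs + 1):
            vh = (self.last + offset) % self.num_vhs
            if vh in ready:
                self.last = vh
                return vh
        return None
```

Because the search always starts just after the most recently served VH, every ready VH is selected within num_vhs selections, which is the forward progress property the text requires.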

8.2.1.4.2. VL to Link Arbitration

MRA components with ports that support multiple Virtual Links must implement a VL to Link
arbitration mechanism.
A component may support a hardwired-fixed arbitration algorithm or optional software-configurable
algorithm selection. Support for the optional software-configurable arbitration algorithm selection
is indicated by the state of the VL Arbitration Table Present bit in the MR-IOV Capabilities register.
If an implementation does not support software configurable algorithm selection, then it must
implement a hardwired-fixed arbitration scheme (e.g., Round Robin) that guarantees forward
progress on all enabled VLs.


The remainder of this section describes the behavior and requirements of software configurable VL
arbitration algorithm selection.
VL arbitration algorithm selection is controlled as follows:
‰ For Devices, registers associated with VL arbitration are located in the MR-IOV Capability.

‰ For Switches, registers associated with VL arbitration are located in the Port Table Entry of
the corresponding port.
‰ The VL arbitration table in all components, if present, is located in BAR memory space.

VLs may be partitioned into two priority groups – a lower and an upper group. VLs in the upper
group are arbitrated using strict priority based on VL number while VLs in the lower group are
arbitrated only when there are no packets to process in the upper group. Arbitration within the
lower group may be configured to one of the supported arbitration algorithms described below.
Membership of a VL in the low or high priority group is determined by the state of the
corresponding bit in the VL Strict Priority Arbitration field in the VL Arbitration Control register.
Since the VL Strict Priority Arbitration field represents a bit vector, VL-to-group assignment is
flexible and need not be allocated sequentially based on VL ID.
Among VLs configured for strict priority, priority is based on increasing VL number. VL0 has the
lowest priority while VL 7 has the highest.
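
The two-group scheme can be sketched as follows. The sketch assumes that a set bit n in the VL Strict Priority Arbitration field places VL n in the upper (strict priority) group; the text does not state the bit polarity. The function name and the low_group_arbiter callback are hypothetical.

```python
def select_vl(pending_vls, strict_priority_mask, low_group_arbiter):
    """Pick the VL to service next on the Link.

    pending_vls          -- set of VL IDs with a TLP ready to transmit
    strict_priority_mask -- assumed: bit n set means VL n is in the upper group
    low_group_arbiter    -- callable choosing among lower-group VLs
    """
    upper = [vl for vl in pending_vls if (strict_priority_mask >> vl) & 1]
    if upper:
        return max(upper)  # higher VL number means higher priority
    lower = [vl for vl in pending_vls if not (strict_priority_mask >> vl) & 1]
    if lower:
        return low_group_arbiter(lower)
    return None
```

The lower group is consulted only when no upper-group VL has a packet pending.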
The arbitration algorithm for VLs in the low priority group is selected by the VL Arbitration Select
field in the VL Arbitration Control register. Arbitration algorithms supported by an implementation
are advertised in the VL Arbitration Capability field in the VL Arbitration Capability and Status
register and may include the following architected schemes.
‰ Hardware-fixed arbitration, e.g. Round-Robin

‰ Weighted Round Robin (WRR) arbitration scheme with 32, 64, 128 or 256 phases

‰ Time-Based Weighted Round Robin (time-based WRR) arbitration scheme with 128 phases

‰ Vendor defined arbitration

This specification establishes a standard framework within which vendors may specify their own
vendor specific arbitration scheme. The definition of vendor-defined arbitration is outside the
scope of this document.
VL arbitration algorithms, e.g., WRR and time-based WRR, operate in a manner analogous to the
schemes defined for VC arbitration in the PCI Express Base specification.
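
By analogy with the VC arbitration table in the PCI Express Base specification, a WRR scheme can be pictured as a table of phases, each naming a VL, with a VL's weight given by the number of phases that name it. The sketch below is an assumption-laden illustration: the class name is hypothetical, and skipping a phase whose VL has nothing pending is modeled as taking no time.

```python
class WrrVlArbiter:
    """Weighted round robin over a phase table of VL IDs (sketch)."""

    VALID_PHASES = (32, 64, 128, 256)

    def __init__(self, table):
        if len(table) not in self.VALID_PHASES:
            raise ValueError("table must have 32, 64, 128 or 256 phases")
        self.table = list(table)
        self.phase = 0

    def select(self, pending):
        """Advance through the phases; return the first VL that is pending."""
        for _ in range(len(self.table)):
            vl = self.table[self.phase]
            self.phase = (self.phase + 1) % len(self.table)
            if vl in pending:
                return vl
        return None
```

With a 32-phase table in which VL 0 occupies three of every four phases, VL 0 receives three quarters of the selections when both VLs are continuously pending.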

8.2.2. Bypass Queues


Associated with each VL at a receiver are dedicated physical resources (queues/buffers and control
logic) that allow independent traffic flows to proceed on the Port. This section describes the logical
queuing structure associated with each VL at a Receiver. This logical description does not imply or
require a particular implementation and is used solely to clarify requirements.


Figure 8-3: Logical Queuing Structure Associated with a VL Receiver


As illustrated in Figure 8-3, associated with each VL at a receiver is a Virtual Link Queue (VLQ) and
one or more optional Bypass Queues (BQ). Within a VL, the PCIe ordering rules defined in the
Base specification are maintained with respect to TLPs from a VH across both the VLQ and the
associated BQ (if any). TLPs associated with a VL received on a Link are queued in the VLQ. In the
absence of congestion, TLPs for non-congested VHs are dequeued from the VLQ for receiver
processing.
When a TLP at the head of the VLQ experiences congestion, a free BQ is allocated to the VH
associated with the TLP, if one does not already exist, and the TLP is moved to the BQ. Subsequent
TLPs associated with the VH from the VLQ are moved to the same BQ until the congestion
condition clears and the (now empty) BQ is freed. This scheme allows TLPs associated with a
congested flow to be bypassed and isolates congestion to a VH. TLPs are dequeued from a BQ as
flow control credits of the required type for the VH associated with the BQ become available. An
implementation must ensure that forward progress is guaranteed on all flows.
If at any point the number of congested VHs associated with a VL exceeds the number of BQs
implemented by a Receiver for that VL, then TLPs in the VLQ experience congestion. Thus,
congestion is managed within a VL at the receiver by dynamically mapping congested VH flows
onto BQs. The degree of congestion isolation between flows mapped to the same VL is dictated by
the number of BQs implemented for that VL by the receiver. A receiver that implements only the
required VLQ provides no congestion isolation between data flows mapped to the VL, while a
receiver that implements n optional BQs provides complete congestion isolation for up to n flows.
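
The logical queuing behavior described in this section can be modeled in a few lines. Like the text, the sketch describes logical behavior only and does not imply an implementation; the class VlReceiver, its methods, and the representation of congestion as a set of VHNs are all assumptions for illustration.

```python
from collections import deque


class VlReceiver:
    """Logical VLQ plus optional BQs for one VL at a Receiver (sketch)."""

    def __init__(self, num_bqs):
        self.vlq = deque()   # TLPs as (vh, tlp) tuples, in arrival order
        self.bqs = {}        # vh -> deque of bypassed TLPs
        self.num_bqs = num_bqs

    def receive(self, vh, tlp):
        self.vlq.append((vh, tlp))

    def service(self, congested):
        """Process queues given the set of currently congested VHs."""
        delivered = []
        # Drain and free the BQs of VHs whose congestion has cleared.
        for vh in [v for v in self.bqs if v not in congested]:
            delivered.extend(self.bqs.pop(vh))
        while self.vlq:
            vh, tlp = self.vlq[0]
            if vh in self.bqs:
                self.bqs[vh].append((vh, tlp))     # keep per-VH ordering
            elif vh not in congested:
                delivered.append((vh, tlp))
            elif len(self.bqs) < self.num_bqs:
                self.bqs[vh] = deque([(vh, tlp)])  # allocate a free BQ
            else:
                break  # no free BQ: the VLQ itself experiences congestion
            self.vlq.popleft()
        return delivered
```

With one BQ, traffic from a non-congested VH bypasses a congested one; with no BQs, a congested TLP at the head of the VLQ blocks all flows on the VL.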

8.2.3. Flow Control Rules


‰ Components must implement independent Flow Control of all supported VLs.

‰ As in the PCI Express Base specification, flow control distinguishes between TLP types
(Posted, Non-Posted, and Completion) and between Header and Data. Thus, there are six types
of tracked flow control information.
‰ The unit of Flow Control credit is 4 DW for data


‰ The unit of Flow Control credit for headers is one maximum-size header plus TLP prefix and
TLP digest
‰ Flow Control is initialized autonomously by hardware only for the default virtual link (VL0)
and virtual hierarchy (VH0).
‰ When other Virtual Links are enabled by MR-PCIM, each newly created VL will follow the
flow control initialization protocol.
‰ (VL, VH) credits are negotiated using the flow control initialization protocol outlined in
Section 2.1.2 whenever MR-PCIM increases NumVH.
‰ A Receiver must never cumulatively issue more than 2047 outstanding unused credits to the
Transmitter for data and 127 for header.
‰ If an Infinite VL and (VL, VH) credit advertisement has been made during initialization, no
Flow Control updates are required for that VL following initialization.
‰ A Receiver that advertises non-infinite VH credits must utilize MRUpdateFC DLLPs for that
VL.

• Independent MRUpdateFC DLLPs are used to track header and data credits associated
with VLs and (VH,VC)s.
• As described in Section tbd, Receivers and Transmitters track independent flow control
information for each VL and for each supported VH. For each VL and (VL, VH), the
six types of flow control information outlined above are tracked.
• A TLP in a Receiver’s VLQ or BQ consumes both VL and corresponding VH credits.
• Both VL and corresponding VH credits are released when a TLP is processed and
removed from the logical queuing structure associated with the Receiver for a VL.
• MRUpdateFC DLLPs are only associated with VH credits related to a VL. VL credits
are implicitly computed using state information and MRUpdateFC DLLPs as outlined in
Section 2.4.1.
‰ A Receiver that advertises non-infinite VL credits and infinite VH credits must utilize PCIe
Base UpdateFC DLLPs for that VL.

• UpdateFC DLLPs are used to explicitly track VL header and data credits.
• The VH gating function is unconditionally satisfied for all Credit Types associated with
that VL.
• Receivers and Transmitters track independent flow control information for each VL. For
each VL, the six types of flow control information are tracked.
• A Receiver that advertises infinite VH credits may only implement a VLQ for that VL.
• VL credits are released when a TLP is processed and removed from the logical queuing
structure associated with the Receiver for a VL


‰ If a receiver advertises infinite VH credits on one VL, then it must advertise infinite VH
credits on all VLs.
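
The two-level credit check implied by these rules (a TLP consumes both VL credits and the corresponding VH credits) can be sketched as below for a single credit type. This is a deliberately simplified model: it uses plain counters rather than the limit and consumed counters a real Transmitter tracks, it treats one TLP as one header credit, and all names (CreditPool, try_send, INFINITE) are hypothetical.

```python
INFINITE = None  # marker for an infinite credit advertisement


def data_credits(payload_dw):
    """Data credits needed for a payload; one credit covers 4 DW."""
    return (payload_dw + 3) // 4


class CreditPool:
    """Available header and data credits for one credit type (sketch)."""

    def __init__(self, hdr, data):
        self.hdr, self.data = hdr, data

    def has(self, hdr, data):
        return ((self.hdr is INFINITE or self.hdr >= hdr) and
                (self.data is INFINITE or self.data >= data))

    def consume(self, hdr, data):
        if self.hdr is not INFINITE:
            self.hdr -= hdr
        if self.data is not INFINITE:
            self.data -= data


def try_send(vl_pool, vh_pool, payload_dw):
    """Send only if both the VL and the (VL, VH) pools have enough credits."""
    hdr, data = 1, data_credits(payload_dw)
    if vl_pool.has(hdr, data) and vh_pool.has(hdr, data):
        vl_pool.consume(hdr, data)
        vh_pool.consume(hdr, data)
        return True
    return False
```

When VH credits are advertised as infinite, the VH check is always satisfied and only VL credits gate transmission, matching the case in which PCIe Base UpdateFC DLLPs are used.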

8.3. Performance Monitoring and Statistics Collection

Congestion in a Multi-Root Topology may result in traffic associated with one Virtual Hierarchy
affecting the performance of an unrelated Virtual Hierarchy. This section defines a set of optional
performance monitoring and statistics collection capabilities that may be used to diagnose, plan and
tune the performance of a multi-root system.
The objective of standardizing these capabilities is to allow component vendor independent
software to monitor performance and manage congestion. It is not a goal of these capabilities to
monitor or count errors.
This capability is optional for all MR-IOV components, but its implementation is strongly
encouraged for all MRA Switch Ports.
Registers and tables associated with the Performance Monitoring and Statistics Collection
capability are described in Section 4.5. This section describes functional behavior.
A component that implements the optional Performance Monitoring and Statistics Collection
Capability is required to implement basic features outlined in this section and in Section 4.5. These
features ensure minimum functionality and interoperability with software that utilizes these
capabilities.
Support for the Performance Monitoring and Statistics Collection Capability is indicated by a non-
zero value in the Statistics Capability register located in the MR-IOV Extended Capability.

Figure 8-4: Statistics Collection Process


Performance statistics are recorded by Statistics Counters and may be either captured values or
counted values. Captured values correspond to sampled system state while counted values correspond to the
number of occurrences of a selected event over a counting period. The statistics collection process is
illustrated in Figure 8-4.


Completion of the statistics collection process (i.e., the end of the counting period) may be signaled
via an interrupt.
A component that implements the Performance Monitoring and Statistics Collection Capability
must implement a Statistics Block Table and a Statistics Descriptor Table. These tables are located
in memory space and their location is specified by the Statistics Descriptor Table and Statistics
Block Table registers in the MR-IOV Extended Capability structure.
A set of Statistics Counters that share a common initiation mechanism and statistics collection
process periods is referred to as a Statistics Block. A component that implements the Performance
Monitoring and Statistics Collection Capability may implement one to 32 Statistics Blocks. Each
Statistics Block has an associated Statistics Block Table entry that contains a pointer to a Statistics
Counter Table that holds the Statistics Counters associated with the Statistics Block. The Statistics
Block Table entry also specifies the statistics collection process state (i.e., Idle, Waiting, Counting),
number of entries in the Statistics Counter Table, waiting period, and counting period.
Statistics Counters associated with a Statistics Block may have different characteristics and be
associated with different Ports. Associated with each Statistics Counter is a Statistics Descriptor
Index that points to the Statistics Descriptor Table entry that describes statistics that may be
recorded by the Statistics Counter. The actual statistic recorded by a Statistics Counter is selected by
the Statistics Select field.
Associated with each Statistics Counter is a 64-bit counter that is used to report the captured statistic.
The counter is formed by the Statistics Counter Low and Statistics Counter High registers. While the
field is specified as 64 bits, an implementation is free to implement fewer bits. The number of
implemented counter bits is specified by the Statistics Width field and, for standard counters, must
be 32 bits or greater.
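
Software reading a Statistics Counter would typically combine the two registers and discard unimplemented bits. The sketch below assumes that Statistics Counter Low holds bits 31:0 and Statistics Counter High holds bits 63:32, and that bits above the Statistics Width are not meaningful; neither detail is stated in this section. The function name is hypothetical.

```python
def read_statistics_counter(low, high, width):
    """Combine the Low and High registers into one counter value.

    low, high -- 32-bit register values (assumed: Low = bits 31:0)
    width     -- number of implemented counter bits (Statistics Width)
    """
    if not 1 <= width <= 64:
        raise ValueError("width must be between 1 and 64 bits")
    value = ((high & 0xFFFFFFFF) << 32) | (low & 0xFFFFFFFF)
    return value & ((1 << width) - 1)  # keep only implemented bits
```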
Associated with each Statistics Counter is an optional filter specified by the Statistics Filter Enable
and Control register. Filters allow refinement in a recorded statistic. For example, rather than count
all transmitted TLPs on a Port, a filter may be used to only count transmitted TLPs from a particular
VH on a particular VL. Each entry in a Statistics Descriptor (i.e., an S-bit) defines required filters
that must be implemented and optional filters that may be implemented.
The format of Statistics Descriptors, standard statistics, filters and requirements are specified in
Section 4.5.2.
A component that implements the Performance Monitoring and Statistics Collection Capability
must implement at least one Statistics Block and at least two Statistics Counters per Port. At least
two Statistics Counters per Port must implement Statistics Descriptor standard statistics specified as
required in Table 4-69.
