MDS Hardware Architecture

The document summarizes an upcoming MDS Storage Summit to take place May 8-11, 2023 in San Jose, California. It provides an agenda for presentations on MDS hardware and architecture, including the MDS modular and fixed switch designs, ASIC components, and product evolution over 20 years. Details are given on MDS modular and fixed switch hardware specifications and building blocks. Fibre Channel speed and encoding standards are also outlined.


MDS Storage Summit 2023

May 8 - 11, San Jose, California

MDS Hardware/Architecture
A 20-year journey of continuous innovation

Harsha Bharadwaj
Principal Engineer
San Jose, California May 8-11, 2023
© 2023 Cisco and/or its affiliates. All rights reserved. MDS, NDFC, SAN Analytics
Agenda
• MDS HW Introduction
• MDS ASIC Architecture
• MDS Product Evolution
• Day in life of a packet inside MDS
• MDS ASIC 15 unique features
• Wrap up

© 2023 Cisco and/or its affiliates. All rights reserved.


MDS Modular – Architected for Longevity

[Timeline: Cisco MDS 9500 Directors -> Cisco MDS 9700 Directors, spanning the Fibre Channel generations 1G, 2G, 4G, 8G, 16G, 32G, 64G, 128G]


© 2023 Cisco and/or its affiliates. All rights reserved.
MDS Modular (aka Director)

[Chassis views, front and rear]
• Fabric Modules (aka Xbar): 6 per chassis, located behind the Fan Trays (rear side)
• Fan Trays: 3 per chassis, 4 fans each
• Line Cards: with front-panel multiprotocol ports
• Supervisor: with Mgmt port, Console port and USB ports + NX-OS + Arbitration ASIC
• Power Supplies: with grid redundancy
© 2023 Cisco and/or its affiliates. All rights reserved.
MDS Modular HW Specification
                        M9706      M9710      M9718
Line Cards              4          8          16
Line-rate 64G Ports     192        384        768
Power Supplies          4          6          12
Chassis Height          9 RU       14 RU      26 RU
Chassis Depth           32"        34"        35"
Chassis Width           17.3" (all models)
Supervisor Modules      2 (1+1) (all models)
Fabric Modules          3-6 (different PID for each switch)
Fan Trays               3 (2+1*)
Airflow                 Front-to-Back

*replacement mode only
© 2023 Cisco and/or its affiliates. All rights reserved.
Fixed Switch (aka Fabric Switch)

• Chassis: 1 or 2 RU
• Supervisor and Fabric Module: built in
• Front: Mgmt port, Console port, USB port, switch ports (built-in)
• Rear: Fans (modular), Power Supplies (modular)
© 2023 Cisco and/or its affiliates. All rights reserved.
Fixed Switch HW Specification (64G)
                        M9124V     M9148V     M9396V
Line-rate 64G Ports     24         48         96
Chassis Height          1 RU       1 RU       2 RU
Chassis Depth           20.1"      20.1"      23"
Chassis Width           17.3" (all models)
Fans                    4 (2+2) or 3 (2+1), depending on model
Power Supplies          2 (1+1) (all models)
Supervisor Module       1 (built-in)
Fabric Module           1 (built-in)
Airflow                 Front-to-Back (or) Back-to-Front

© 2023 Cisco and/or its affiliates. All rights reserved.


Agenda
• MDS HW Introduction
• MDS ASIC Architecture
• MDS Product Evolution
• Day in life of a packet inside MDS
• MDS ASIC 15 unique features
• Wrap up

© 2023 Cisco and/or its affiliates. All rights reserved.


MDS Switch Building Blocks
• Supervisor: ARB ASIC
• Fabric Module: XBAR ASIC
• Linecard/Port: FC ASIC
  • Ingress path: PHY -> FC-MAC -> Forwarding -> Queuing -> XBAR Interfacing
  • Egress path: XBAR Interfacing -> Reorder/Scheduler -> Egress ACL -> FC-MAC -> PHY
  • Analytics engine with read-only access to the data path
• FC stack handled on the port:
  • FC0: Physical (SFP)
  • FC1: Data Encode/Decode
  • FC2: Framing, Routing
  • FC3/4: Protocol Map (FCP, FC-NVMe)
  • ULP: SCSI/NVMe

© 2023 Cisco and/or its affiliates. All rights reserved.
FC MAC: Fibre Channel speeds/features

• All ports support full line rate (non-oversubscribed)
• Only single-lane FC speeds supported

FC Speed   Signaling Rate      Data Rate    Encoding    Modulation   FEC   BB_SC
1G         1.0625G             100 MB/s     8b/10b      PAM2         No    No
2G         2.125G              200 MB/s     8b/10b      PAM2         No    No
4G         4.25G               400 MB/s     8b/10b      PAM2         No    Yes
8G         8.5G                800 MB/s     8b/10b      PAM2         No    Yes
16G        14.025G             1600 MB/s    64b/66b     PAM2         Yes   Yes
32G        28.05G              3200 MB/s    64b/66b     PAM2         Yes   Yes
64G        28.9G x2 (57.8G)    6400 MB/s    256b/257b   PAM4         Yes   Yes
128G*      56.1G x2 (112.2G)   12800 MB/s   256b/257b   PAM4         Yes   Yes

*Under investigation for MDS

© 2023 Cisco and/or its affiliates. All rights reserved.
FC MAC: Feature overview

T11-defined standard features:
• 4 FC speeds
• Link speed auto-negotiation
• B2B crediting (Tx/Rx)
• Frame encode/decode
• FEC
• BB_SC credit/frame loss recovery
• FC encryption/decryption
• Loopback

MDS additional features:
• CRC check/drop at line rate
• Internal header for metadata
• Timestamping and timeout check
• VSAN classification
• Slow N_Port (NPIV)/VM detection
• FC traffic generator
• Credit pacing (IRL)
• Frame stats

[Figure: Ingress MAC (buffer, parser/framer, frame check, decode, timestamp, stats, crediting, loopback) and Egress MAC (buffer, timestamp check, encode, stats, crediting)]
© 2023 Cisco and/or its affiliates. All rights reserved.


FC-MAC: Buffering

• Buffers are required due to inherent switching delays
• MDS is an ingress-buffered switch built using SRAMs
• Ingress buffers
  • Flexible carving on a per-Port/VL basis
  • Link B2B credit accounting (Rx credits)
  • Larger ingress buffers help span longer distances and absorb bursts better
• Egress buffers
  • Minimal buffering
  • Drive ARB credit replenishment per egress port

[Figure: switch buffers, logical view – per-ingress-port buffers plus small egress buffers.
64G ASIC ingress buffer partitioning: 24K buffers carved across Port0..Port23, each into VL0..VL7]

Default buffers per port:
ASIC    E_Port B2B default    F_Port B2B default
16G*    500                   32
32G     500                   32
64G     1000                  100
*except 9148S

© 2023 Cisco and/or its affiliates. All rights reserved.
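The note that larger ingress buffers let links go longer distances follows directly from B2B crediting: a sender may only have as many frames in flight as it holds credits, so the credit count must cover the link's round-trip time at line rate. A rough back-of-the-envelope sketch (Python; the ~5 us/km propagation figure and the helper name are illustrative assumptions of mine, and encoding/FEC overhead is ignored):

import math

def credits_for_distance(distance_km, line_rate_gbps, frame_bytes=2148,
                         fibre_us_per_km=5.0):
    # A credit comes back only after the frame has crossed the link and the
    # R_RDY has returned, so the in-flight window must cover one round trip.
    round_trip_us = 2 * distance_km * fibre_us_per_km            # propagation only
    frame_time_us = (frame_bytes * 8) / (line_rate_gbps * 1000)  # serialization time
    return math.ceil(round_trip_us / frame_time_us) + 1

# Example: a 32G ISL over 50 km with full-size frames -> roughly 930 credits
print(credits_for_distance(50, 32))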
Forwarding: ACL and FIB

• A fundamental function of a switch
• Frame forwarding uses high-speed TCAMs (match) and SRAMs (result/rewrite) for line-rate traffic at the ingress port
• VSAN is a mandatory key in all lookups
• Ingress: parallel lookups determine the final result/destination for each frame

ACL – Ingress (Hard Zoning/IVR/32G Analytics)
  MATCH:   VSAN/FC Hdr/SCSI/NVMe/Other
  RESULT:  Permit/Deny/To NPU/To SUP
  REWRITE: VSAN/SID/DID/QoS/Other

ACL – Egress (FICON/HBA Diagnostics)
  MATCH:   VSAN/FC Hdr/SCSI/NVMe/Other
  RESULT:  Permit/Deny/To Ingress

FIB – Ingress only (FSPF Routes)
  MATCH:   <VSAN/DID>, Other
  RESULT:  Next Hop (Dest Port)
  REWRITE: VSAN/SID/DID/QoS/Other

ACL/FIB Evolution:
ASIC    IACL#    FIB#    Deep SCSI/NVMe header inspection
16G*    96K      96K     No
32G     96K      96K     No
64G     128K     96K     Yes
*except 9148S

© 2023 Cisco and/or its affiliates. All rights reserved.
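To make the ingress decision concrete, here is a minimal software model (Python; the table contents, FCID values and helper names are invented for illustration, not taken from the deck) of a hard-zoning ACL that permits or denies, and a FIB keyed on (VSAN, domain of D_ID) that returns the next-hop port. In hardware both lookups run in parallel; the sketch simply combines their results:

from dataclasses import dataclass

@dataclass
class Frame:
    vsan: int
    sid: int   # 24-bit FCID of the source
    did: int   # 24-bit FCID of the destination

# Hard-zoning style ACL: (vsan, sid, did) -> permitted?
acl = {(10, 0x010101, 0x020101): True}

# FSPF-derived FIB: (vsan, domain byte of the D_ID) -> next-hop port
fib = {(10, 0x02): "fc1/5"}

def ingress_lookup(frame):
    permitted = acl.get((frame.vsan, frame.sid, frame.did), False)  # default deny
    next_hop = fib.get((frame.vsan, frame.did >> 16))               # route on domain
    if not permitted or next_hop is None:
        return None                      # frame dropped: zoning deny or no route
    return next_hop

print(ingress_lookup(Frame(vsan=10, sid=0x010101, did=0x020101)))   # fc1/5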
Forwarding: FIB with ECMP/Port Channels

• FSPF: the lowest-cost (highest-speed) path is chosen among the possible options
• Host -> Target forwarding at MDS1 has 4 cases:
  #1: Physical Port – Host - MDS1 - (1x64G) - MDS2 - Target
  #2: ECMP (Equal Cost Multipathing) – Host - MDS1 - (1x64G via MDS2, 1x64G via MDS3) - Target
  #3: PC (Port Channel) – Host - MDS1 - (2x64G PC) - MDS2 - Target
  #4: ECMP and PC – Host - MDS1 - (2x64G PC via MDS2, 2x64G PC via MDS3) - Target
• FIB resolves the adjacency (next-hop port) in the following order:
  1) ECMP  2) Port Channel  3) Physical Port

© 2023 Cisco and/or its affiliates. All rights reserved.
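A minimal model (Python; group names, ports and the hash function are invented for illustration) of the resolution order listed above: the FIB result may point at an ECMP group, the chosen ECMP member may itself be a port channel, and the port channel finally resolves to a physical port. Reusing the same (SID, DID, OXID) key at each stage keeps all frames of one exchange on one physical link:

ecmp_groups   = {"ecmp1": ["pc1", "pc2"]}           # equal-cost FSPF paths
port_channels = {"pc1": ["fc1/1", "fc1/2"],         # port-channel members
                 "pc2": ["fc1/3", "fc1/4"]}

def pick(members, sid, did, oxid):
    # Stand-in for the hardware hash: deterministic per exchange.
    return members[(sid + did + oxid) % len(members)]

def resolve_adjacency(fib_result, sid, did, oxid):
    # Resolution order: 1) ECMP  2) Port Channel  3) Physical Port
    if fib_result in ecmp_groups:
        fib_result = pick(ecmp_groups[fib_result], sid, did, oxid)
    if fib_result in port_channels:
        fib_result = pick(port_channels[fib_result], sid, did, oxid)
    return fib_result

# Every frame of this exchange resolves to the same physical port.
print(resolve_adjacency("ecmp1", sid=0x010101, did=0x020101, oxid=0x100))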
Queuing: Virtual Output Queuing (VoQ)

• Ingress port buffers are organized as a Virtual Output Queue for every other egress port
• Prevents Head-of-Line (HoL) blocking

Switch without VoQ: a single input queue at port A holds frames for ports B and C in arrival order; when port C is congested, a frame to port C at the top of the queue blocks all frames behind it, including frames to the uncongested port B.

VoQ model: the input buffer at port A keeps a separate virtual output queue per egress port; congestion at port C only holds back the port-C queue, while frames to port B continue to be scheduled through the crossbar by the arbiter.

[Logical view: per-ingress-port buffers on each linecard carved into VoQ-A/VoQ-B/VoQ-C, feeding the crossbar and arbiter, with small egress buffers per egress port.]

© 2023 Cisco and/or its affiliates. All rights reserved.
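A toy model (Python; names invented) of why per-egress-port VoQs remove HoL blocking: with a single FIFO per ingress port a frame for a congested egress port blocks everything behind it, while with one queue per (ingress port, egress port) pair the arbiter can still drain frames for uncongested ports:

from collections import defaultdict, deque

class IngressPort:
    def __init__(self):
        self.voq = defaultdict(deque)        # one virtual output queue per egress port

    def enqueue(self, frame, egress_port):
        self.voq[egress_port].append(frame)

    def schedulable(self, egress_has_buffer):
        # Head frames whose egress port has buffer space can be requested to the
        # arbiter immediately, regardless of what sits at the head of other VoQs.
        return [q[0] for egress, q in self.voq.items()
                if q and egress_has_buffer(egress)]

port_a = IngressPort()
port_a.enqueue("frame1 -> C", "C")
port_a.enqueue("frame2 -> B", "B")

# Port C congested, port B free: frame2 is not stuck behind frame1.
print(port_a.schedulable(lambda p: p == "B"))   # ['frame2 -> B']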
Queuing: QoS/Scheduling

• Ingress:
  • Frames are classified using the incoming FC Header Priority or using the Zone-QoS/VL policy of the switch
  • VoQ (and Arbitration) is per-port, per-QoS level
• Egress:
  • Every QoS level can be scheduled with either Strict Priority (SP) or DWRR (3 levels – 50%, 30%, 20%)

[Figure: the ingress classifier feeds the ACL/FIB lookup and per-(port, QoS) VoQs and arbitration; egress queues are drained by a scheduler combining Strict Priority and DWRR before the port logic. The QoS marking travels with the frame in the internal (EISL/Vegas) header and may optionally be rewritten toward the next switch on an EISL port.]

QoS levels evolution:
ASIC        #QoS levels
16G, 32G    4
64G         8 (VLs)

© 2023 Cisco and/or its affiliates. All rights reserved.
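A minimal sketch (Python; only the 50/30/20 weights come from the slide, queue contents and helper names are invented) of deficit weighted round robin over three QoS classes; over time each class receives roughly its configured share of egress bandwidth, and a strict-priority class would simply be drained before the DWRR classes:

from collections import deque

WEIGHTS = {"high": 50, "med": 30, "low": 20}        # DWRR weights (%) from the slide

def dwrr(queues, rounds=3, quantum_scale=30):
    """queues: {class: deque of (frame, size_bytes)}; returns the service order."""
    deficit = {q: 0 for q in queues}
    served = []
    for _ in range(rounds):
        for cls in queues:                           # visit each class once per round
            deficit[cls] += WEIGHTS[cls] * quantum_scale        # add weighted quantum
            while queues[cls] and queues[cls][0][1] <= deficit[cls]:
                frame, size = queues[cls].popleft()
                deficit[cls] -= size
                served.append(frame)
        if not any(queues.values()):
            break
    return served

queues = {cls: deque((f"{cls}-{i}", 1000) for i in range(6)) for cls in WEIGHTS}
print(dwrr(queues))   # "high" gets roughly half of the service slots, "med" ~30%, "low" ~20%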
Arbiter: Theory of Operation

• Ensures a frame is sent from an ingress port to an egress port only when the egress port has a buffer to accept it
• One Arbiter per switch (Centralized ARB) – a dedicated ASIC (on the SUP) in Modular switches, integrated into the FC ASIC in Fixed switches

Arbiter control loop (frame from Port-A to a DID behind Port-C):
0. Port-C informs the Arbiter it has (N) egress buffers; the Arbiter accounts for them
1. Frame arrives on Port-A destined to a DID behind Port-C
2. Frame is enqueued into the VoQ for Port-C
3. Request sent to the Arbiter for Port-C from Port-A
4. Arbiter checks buffer availability at Port-C and grants the request
5. Grant given to Port-A; Port-C accounting updates to (N-1) egress buffers
6. Port-A sends the frame to Port-C via the Crossbar
7. Frame transmitted out of Port-C
8. Port-C informs the Arbiter that it now has a free buffer
9. Arbiter accounts that Port-C again has (N) egress buffers

© 2023 Cisco and/or its affiliates. All rights reserved.
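A compact model (Python; names invented) of the arbiter bookkeeping in the control loop above: it tracks free egress buffers per (port, QoS level) and grants an ingress request only while a buffer is available, which is why frames are never dropped inside the switch:

class CentralArbiter:
    def __init__(self, egress_buffers):
        self.free = dict(egress_buffers)            # free buffers per (egress port, qos)

    def request(self, egress_port, qos):
        """An ingress port asks permission to send one frame."""
        if self.free.get((egress_port, qos), 0) > 0:
            self.free[(egress_port, qos)] -= 1      # grant: reserve one egress buffer
            return True
        return False                                # no grant: the frame waits in its VoQ

    def credit_return(self, egress_port, qos):
        """The egress port transmitted a frame and freed a buffer."""
        self.free[(egress_port, qos)] += 1

arb = CentralArbiter({("Port-C", 0): 2})
print(arb.request("Port-C", 0), arb.request("Port-C", 0), arb.request("Port-C", 0))
# True True False -> the third frame stays queued until a credit returns
arb.credit_return("Port-C", 0)
print(arb.request("Port-C", 0))                     # True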
Crossbar Overview

• Performs the port-to-port switching function
• Establishes a temporary connection between input and output port for the duration of the frame transmit
• Frames are transmitted once the connection is made and a path is available
• Crossbars have a 3x internal speedup that makes them non-blocking
• The crossbar may be external (MDS 97xx, MDS 9396S) or integrated into the ASIC (MDS 9250i and MDS 9148S)

[Figure: crossbar switch with control and scheduling logic between the ingress and egress ports]
© 2023 Cisco and/or its affiliates. All rights reserved.


MDS Modular Multi-stage Crossbar

• MDS 97xx is a 3-stage crossbar switch fabric
• Every FC frame has an FPoE (Fabric Port of Exit) based on the egress port
• Flow control between the stages

[Figure: 1st stage – Fabric ASICs on the ingress linecard (32G); 2nd stage – Fabric ASICs 1-6 on the six Fabric Modules (FAB1), 256G per fabric module and 1.5T aggregate per linecard; 3rd stage – Fabric ASICs on the egress linecard (32G)]

© 2023 Cisco and/or its affiliates. All rights reserved.


Agenda
• MDS HW Introduction
• MDS ASIC Architecture
• MDS Product Evolution
• Day in life of a packet inside MDS
• MDS ASIC 15 unique features
• Wrap up

© 2023 Cisco and/or its affiliates. All rights reserved.


MDS Architecture Layout

Modular:
• Midplane interconnects the Supervisors, Linecards and Fabric Modules
• SUP (Active) and SUP (Standby): CPU running NX-OS Kickstart (kernel) and NX-OS System (Control + Mgmt Plane), plus the Arb ASIC; EOBC to the linecards
• Linecards: Switching ASICs (data plane), a local CPU/NPU running the NX-OS kernel and NX-OS drivers, connected to the SUPs via EOBC
• Fabric Modules: XBAR ASICs (data plane)

Fixed:
• CPU running NX-OS Kickstart (kernel) and NX-OS (Control Plane + Drivers)
• Switching ASIC (data plane) with ARB* and XBAR* integrated
• NPU (data plane)

* In some architectures (eg: 2RU) ARB and XBAR are separate ASICs

© 2023 Cisco and/or its affiliates. All rights reserved.
MDS Modular Switch evolution

<=8GFC:
• MDS 9506/9509: XBAR and ARB on the Supervisor; Active/Active XBAR; 50% less bandwidth on XBAR failure
• MDS 9513: ARB on the Supervisor; XBAR on 2 dedicated fabric modules

>=16GFC:
• MDS 9706/9710/9718: ARB on the Supervisor; XBAR on up to 6 fabric modules (N+1 redundant XBAR modules)
  • Same bandwidth upon fabric module failure
  • Easy expansion for linecards with higher port speeds
  • Crossbar BW upgrade without upgrading the SUP

© 2023 Cisco and/or its affiliates. All rights reserved.

MDS Modular SUP evolution

                            SUP 2/2A          SUP1*              SUP4
Chassis                     MDS 95xx          MDS 9706/9710      MDS 9706/9710/9718
Width                       Full              Half               Half
Linecard speeds
(Arb capacity)              <=8G              16/32G             16/32/64G
Memory                      2G                8G DDR3            16G DDR4
CPU cores                   1                 4                  8
Mgmt port (Ethernet)        1x10/100/1000     1x10/100/1000      1x10/100/1000 and 1x1/10GE**
External slots              2xUSB2.0          2xUSB2.0           2xUSB3.0
Power (Typical/Max)         126W              110/190W           100/120W

*9718 uses SUP1E, a beefier version of SUP1 (eg: 16G RAM)
**Not enabled yet

© 2023 Cisco and/or its affiliates. All rights reserved.
MDS Modular Fabric (Xbar) evolution

                            FAB (MDS 9513)    FAB1 (MDS 9700)    FAB3 (MDS 9700)
Max no. of FABs             2                 6                  6
BW per LC (with 6 FABs)     256G              1.5T               3T
Linecards supported         <=8G              16G, 32G           32G, 64G FC
Power Typical (W)           63                135 (9710)         135 (9710)

FC front-panel BW per slot* by number of fabric modules:

No. of Fabric Modules       FAB-1             FAB-3
1                           256 Gbps          512 Gbps
2                           512 Gbps          1024 Gbps
3                           768 Gbps          1536 Gbps
4                           1024 Gbps         2048 Gbps
5                           1280 Gbps         2560 Gbps
6                           1536 Gbps         3072 Gbps

*BW per slot = sum of the operating speeds of all ports in the linecard
(A fully populated 48-port 32G linecard needs 1536 Gbps; a fully populated 48-port 64G linecard needs 3072 Gbps.)

© 2023 Cisco and/or its affiliates. All rights reserved.
MDS Modular Linecard (48P) architecture evolution

• 16G linecard: cARB-16 (on SUP1); XBAR-16 fabric modules (x6, FAB1); on the linecard an XBAR-16 (x1), packet memory (x6) and lARB-16; six 8x16G port ASICs
• 32G linecard: cARB-32/64 (on SUP1/4); XBAR-32/64 fabric modules (x6, FAB1/3); on the linecard XBAR-32 (x2) and lARB-32; three 16x32G port ASICs with internal packet memory
• 64G linecard: cARB-32/64 (on SUP4); XBAR-32/64 fabric modules (x6, FAB3); on the linecard XBAR-64 and lARB; one 48x64G dual-die ASIC with internal packet memory

Consolidation of functions with every new generation FC ASIC

cARB = Centralized Arbiter
lARB = Local Arbiter (Aggregator)

© 2023 Cisco and/or its affiliates. All rights reserved.
MDS Switching ASIC capability evolution

ASIC Capability                                         16G ASIC     32G ASIC       64G ASIC
FC Speeds                                               2/4/8/16G    4/8/16/32G     8/16/32/64G
No. of ports                                            8            16             24/48
No. of VLs                                              NA           4              8
Avg. buffers/credits per port                           500          500            1000
Analytics Engine                                        NA           NPU software   On chip
XBAR/ARB aggregation on LC                              External     External       On chip
# Ingress ACLs (Hard Zoning) per ASIC                   96K          96K            128K
Back-to-back connection (local ARB + local switching)   No           Yes            Yes
Typical power (Gbps/W)                                  1.6          5.9            10
Ingress Rate Limit (IRL)                                No           Yes            Yes
No. of Port Channel members (max)                       16           16             16
Encryption (4 ports/ASIC)                               AES128       AES128         AES256

© 2023 Cisco and/or its affiliates. All rights reserved.
MDS 1RU (24/32/48P) Fixed Switch evolution

• 16G: MDS 9148S – one 16G SOC (48x16G) with internal XBAR, ARB and packet memory
• 32G: MDS 9132T – two 32G ASICs (16x32G each) with internal XBAR and ARB
       MDS 9148T – three 32G ASICs (16x32G each) plus XBAR-32 and ARB-32
• 64G: MDS 9124V – one 64G single-die ASIC (24x64G) with internal XBAR and ARB
       MDS 9148V – two 64G single-die ASICs (24x64G each) with internal XBAR and ARB

Packet memories are internal in the 32G and 64G ASICs

© 2023 Cisco and/or its affiliates. All rights reserved.


MDS 2RU (96P) Fixed Switch evolution

• 16G: MDS 9396S – twelve 16G ASICs (8x16G each), with ARB-16, XBAR-16 and external packet memory (x12)
• 32G: MDS 9396T – six 32G ASICs (16x32G each), with XBAR-32/64 and ARB-32
• 64G: MDS 9396V – four 64G single-die ASICs (24x64G each), LEM1/LEM2/LEM3, two XBAR-32/64 ASICs and ARBs

HW architecture looks more like the Modular!
Packet memories are internal in the 32G and 64G ASICs

© 2023 Cisco and/or its affiliates. All rights reserved.


MDS SAN Analytics evolution

32G linecard:
• Ingress and Egress ACL taps copy FC & SCSI/NVMe headers (+ zone) of frames of interest out of the 32G ASICs
• NPU 1.0 on the linecard runs the SW Analytics Engine, with a persistent DB and a temporary aggregation DB
• An encoder/streamer produces TLVized I/O metrics, pulled/pushed to the SUP and exported as streaming telemetry via the mgmt port

64G linecard:
• An on-chip ACL Tap Engine (ingress/egress) feeds six HW Analytics Engines with their own DBs (DB1-DB6) inside the 64G ASIC
• The HW DBs are pulled/pushed and flushed to the SW Analytics Engine on NPU 2.0 (persistent DB + temporary aggregation DB)
• Encoder/streamer exports streaming telemetry via the SUP mgmt port; streaming telemetry directly from the LC is optional

© 2023 Cisco and/or its affiliates. All rights reserved.
Agenda
• MDS HW Introduction
• MDS ASIC Architecture
• MDS Product Evolution
• Day in life of a packet inside MDS
• MDS ASIC 15 unique features
• Wrap up

© 2023 Cisco and/or its affiliates. All rights reserved.


Packet Flow

(Example: ingress port fc1/20 on LC 1, egress port fc2/25 on LC 2, three 32G ASICs per linecard, six Fabric Modules, Central Arbiter on the Supervisor.)

Ingress (LC 1, fc1/20):
1.  Rx packet from the wire
2.  CRC check; add internal header with timestamp and VSAN
3.  Ingress packet parsing
4.  Packet headers sent to the forwarding (lookup) pipeline
5.  Copy of the header to the Analytics engine (ingress)
6.  Zoning lookups, FIB lookups, load-balancing decision
7.  Final lookup result: destination port + priority
8.  Packet stored in the ingress buffer based on incoming port and priority/VL
9.  Packet descriptor queued in the VoQ (destination port + priority)
10. Request buffer credit for the destination port + priority from the Central Arbiter
11. Buffer credit granted; buffer accounting done
12. Transmit to the fabric (Super Framing) across the Fabric Modules
13. Tx R_RDY on the ingress link

Egress (LC 2, fc2/25):
14. Receive from the fabric
15. Copy of the header to the Analytics engine (egress)
16. Buffer on egress based on destination port + priority; QoS scheduler (SP/DWRR)
17. Schedule for transmission
18. Remove internal header; timestamp check
19. Return buffer credit (destination port + priority) to the Central Arbiter

© 2023 Cisco and/or its affiliates. All rights reserved.
Agenda
• MDS HW Introduction
• MDS ASIC Architecture
• MDS Product Evolution
• Day in life of a packet inside MDS
• MDS ASIC 15 unique features
• Wrap up

© 2023 Cisco and/or its affiliates. All rights reserved.


#1: Store and Forward Architecture
• Forwarding decision after entire frame received and integrity verified

• Slightly higher per frame switching latency hardly matters compared to overall I/O completion times
• Storage Access times = 100+us (even on NVMe flash storage); Per frame switching latency ~ 2us
• Ensures data integrity

[Figure: ASIC ingress pipeline – PHY -> FC-MAC -> Forwarding. The FC-MAC moves a packet to Forwarding only after verifying its integrity (eg: CRC); the DID lookup and switching toward the output port are performed for integrity-verified frames only.
Standard FC frame (up to 2148 bytes): SoF | FC Header (w/ DID) | Payload | CRC | EoF]

© 2023 Cisco and/or its affiliates. All rights reserved.
#2: Consistent switching Latency
• In all architectures latency between any two ports of the switch is the same (same/different LC/ASIC)

• Predictability in I/O performance

[Figure: Modular switch – two linecards (ASIC1/ASIC2 each) connected through the ARB and XBAR; Fixed switch – two ASICs connected through the integrated ARB and XBAR. In both, the latency between any two front-panel ports is the same.]
© 2023 Cisco and/or its affiliates. All rights reserved.


#3: Fabric In-order delivery (IOD)
• In-order delivery of frames within an exchange is important for hosts/storage
• (SID, DID, OXID) hashing by default with ECMP/Port Channel forwarding ensures all frames of an exchange (I/O) are bound to one port when multiple ports are available
• Even during topology changes (eg: ECMP/PC member add/delete) IOD is guaranteed
• Fabric IOD guarantee is important for error-free I/O transactions between host and storage systems

Example: Host 1 (FCID 10.1.1, exchange OXID=100) and Host 2 (FCID 10.1.2, exchange OXID=200) exchange I/O with a Target (FCID 20.1.1) across a 2-member ISL port channel between MDS1 and MDS2 (members: PortA/PortC = #1, PortB/PortD = #2).

  I/O Requests – PC forwarding at the host F_Port ingress on MDS1:
    hash(10.1.2, 20.1.1, 200) -> 1 (PortA)      hash(10.1.1, 20.1.1, 100) -> 2 (PortB)
  I/O Responses – PC forwarding at the target F_Port ingress on MDS2:
    hash(20.1.1, 10.1.2, 200) -> 1 (PortC)      hash(20.1.1, 10.1.1, 100) -> 2 (PortD)

All frames of a given exchange use the same port-channel member in each direction.

© 2023 Cisco and/or its affiliates. All rights reserved.
#4: Arbiter Fairness
• In requesting credits, FC ASIC does RR across all the VoQs and then RR across all the Ingress Ports
• Limits number of outstanding Arbiter requests from a given port to a Destination Port
• Fairness among all the ports of switch

Example (config: max 25 outstanding requests per port; at T0 Port-C registers 50 buffers/credits with the Arbiter):
  T1: Port-A has 100 frames queued for Port-C
  T2: Port-A sends requests and receives grants for Port-C (capped at 25 outstanding)
  T3: 25 grants for Port-C -> Port-A sends 25 frames
  T4: Port-B has 1 frame queued for Port-C
  T5: Port-B sends 1 request for Port-C
  T6: 1 grant for Port-C -> Port-B sends its frame (it does not wait behind Port-A's backlog)
  T7-T9: Port-A sends the remaining frames, 25 at a time

© 2023 Cisco and/or its affiliates. All rights reserved.
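A small model (Python; names invented, the 25-request cap taken from the example above) of the fairness mechanism: requests toward a destination are issued round-robin across ingress ports, and each port is capped in how many requests it may have outstanding, so a port with a single frame is served in the first pass rather than waiting behind a 100-frame backlog:

from itertools import cycle

MAX_OUTSTANDING = 25                    # per-port cap toward one destination

def issue_requests(backlogs, grants_available):
    """backlogs: {ingress_port: frames queued for one destination port}."""
    outstanding = {p: 0 for p in backlogs}
    order, idle = [], 0
    ports = cycle(list(backlogs))       # round-robin across ingress ports
    while grants_available and idle < len(backlogs):
        p = next(ports)
        if backlogs[p] > 0 and outstanding[p] < MAX_OUTSTANDING:
            backlogs[p] -= 1
            outstanding[p] += 1
            grants_available -= 1
            order.append(p)
            idle = 0
        else:
            idle += 1
    return order

# Port-A has 100 frames queued, Port-B has 1: Port-B is served on the first pass.
print(issue_requests({"Port-A": 100, "Port-B": 1}, grants_available=50)[:4])
# ['Port-A', 'Port-B', 'Port-A', 'Port-A']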
#5: Crossbar Superframing
• Fibre Channel frames are of varying sizes
• Command_IU (Read/Write), XRDY_IU, RSP_IU frames are small size (<200Bytes)
• DATA_IU frames are usually large size (~2K Bytes)
• FC Control frames (eg: RSCN, Zone DB distribution ) are small to large (100-2K Bytes)
• Superframing allows packing small frames destined to the same output port (VoQ) into one large frame of programmable size (4K-6K). If only 1 frame is available, a single-frame superframe is created
• Egress Port disassembles Superframe to its constituent frames for transmission out of the port
• Improves the switching throughput through the fabric – More frames can be sent per credit
• Improves Avg frame latency

[Figure: a superframe packing 3 frames destined to the same port – CMD (Read), DATA, ELS (RSCN)]

© 2023 Cisco and/or its affiliates. All rights reserved.
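A toy packer (Python; frame names and sizes invented, the 4K-6K programmable limit is from the slide) showing the idea: frames waiting in the same VoQ are concatenated into one superframe up to the size limit, and a lone frame still goes out as a single-frame superframe:

def build_superframes(frames, max_bytes=4096):
    """frames: list of (name, size_bytes) queued for the same output port (same VoQ)."""
    superframes, current, used = [], [], 0
    for name, size in frames:
        if current and used + size > max_bytes:     # close this superframe, start a new one
            superframes.append(current)
            current, used = [], 0
        current.append(name)
        used += size
    if current:                                     # a single frame still forms a superframe
        superframes.append(current)
    return superframes

voq = [("CMD(Read)", 64), ("ELS(RSCN)", 120), ("DATA", 2048), ("XRDY", 64), ("DATA", 2048)]
print(build_superframes(voq))
# [['CMD(Read)', 'ELS(RSCN)', 'DATA', 'XRDY'], ['DATA']]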


#6: Crossbar Auto-Spreading
• Between any two switch ports multiple paths exist via the 3-stage crossbar
• FPoE is the destination Linecard’s Fabric ASIC# and carried in the Superframe header used for routing in Stage2
• Preserves IOD of frames in an exchange despite using multiple paths
• Every Superframe assigned a sequence number by Ingress LC ASIC
• Superframes in sequence may take different paths and may arrive in any order at Stage3
• Egress Fabric ASIC will wait for all lower sequence numbers to arrive before disassembling Superframes and transmitting frames

• Uniform use of all available crossbar BW with IOD guarantee

[Figure: superframes F1..F6 leave Stage1 (ingress LC fabric ASIC), take multiple paths across the Stage2 fabric ASICs, and are put back into sequence at Stage3 (egress LC fabric ASIC).]
© 2023 Cisco and/or its affiliates. All rights reserved.
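A small sketch (Python; names invented) of the Stage3 behaviour described above: superframes may arrive out of order over the parallel fabric paths, and the egress fabric ASIC releases them only once every lower sequence number has arrived, preserving IOD:

class ReorderBuffer:
    def __init__(self):
        self.next_seq = 0
        self.pending = {}                  # seq -> superframe held until its turn

    def receive(self, seq, superframe):
        """Returns the superframes that can now be released, in sequence order."""
        self.pending[seq] = superframe
        released = []
        while self.next_seq in self.pending:
            released.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return released

rob = ReorderBuffer()
print(rob.receive(1, "SF1"))   # []              -- SF0 not here yet, hold SF1
print(rob.receive(0, "SF0"))   # ['SF0', 'SF1']
print(rob.receive(2, "SF2"))   # ['SF2']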
#7: Ensuring Data Integrity
Data Integrity is paramount in Storage
• Multistage CRC error checking
• CRC checks on incoming frame(Ingress MAC)
• ASIC to Crossbar Superframe (Inside xbar module)
• Crossbar to ASIC Superframe (Egress Path)
• Error correction
• FEC on every incoming frame
• FEC inside Crossbar
• ECC protected packet memories

• Accounting every frame error/drop


• Per Port/VL: CRC, Bad Words, ITWs, FEC, Timeout Drops, ECC errors etc.
• Per FWD/ASIC: ACL, FIB, XBAR drops

• Collectively ensure integrity of data to/from storage media as it transits through the MDS
© 2023 Cisco and/or its affiliates. All rights reserved.
#8: VSAN
• Logically partition physical SAN to multiple virtual SANs
• Classified at Ingress MAC and carried via internal header from Ingress to Egress Port.
• Carried across ISL if the other end is a Cisco MDS (Trunking E_Port)
• Traffic Isolation, Scalability, Redundancy

VSAN-aware blocks:
[Figure: the FC ASIC blocks on the ingress and egress paths (FC-MAC, Forwarding, Queuing/Scheduler, Egress ACL, Analytics) are VSAN-aware. The VSAN travels with the frame in the internal header: SoF | Internal Hdr (VSAN) | FC Hdr | Payload | CRC | EoF]
© 2023 Cisco and/or its affiliates. All rights reserved.


#9: Virtual Links (VLs)
• Virtualizes a FC link into multiple VLs
• Per-VL B2B Crediting using ER_RDY (Enhanced R_RDY) primitive
• VLs supported on:
• All 32G/64G MDS ISL ports
• Certain 32/64G Marvell HBA ports (F_Ports)
• Traffic segregation on FC links: Congestion Isolation, Prioritization (QoS)

[Figure: an FC link without VLs credits a single pool of Rx/Tx buffers with R_RDY; an FC link with VLs splits the buffers per VL and credits each VL separately with ER_RDY.]


© 2023 Cisco and/or its affiliates. All rights reserved.
#10: VM Tagging
• VM Tag is a per-frame VM identifier in the FC fabric
• No additional overhead to carry the VMID
• Backwards compatible
• S_ID + VM Tag mapping to VM UUID/Name maintained in the VEID Server
• All HBAs support VM Tagging from initiators
• VM Tag in MDS ASICs:
  • Per-VM Analytics on-chip (eg: IOPS/ECT per VM)
  • Frame stats per VM (eg: # of frames Tx/Rx per VM)
  • Identify a slow VM behind an FC port

The VM Tag is carried inside the standard Frame_Header (INCITS 545-20xx Rev 0.4, Table 30), so no extra bytes are added to the frame:

Word    Bits 31..24         23..16      15..08      07..00
0       R_CTL               D_ID
1       CS_CTL/Priority     S_ID
2       TYPE                F_CTL
3       SEQ_ID              DF_CTL      SEQ_CNT
4       OX_ID                           RX_ID
5       Parameter

© 2023 Cisco and/or its affiliates. All rights reserved.
#11: Credit Pacing (aka Ingress Rate Limiting)
• Paces the rate of B2B credit return to link partner to dictate incoming frame rate (other direction)
• DIRL builds dynamism on top of credit pacing to throttle up/down the incoming rate on the port
• Pushes back oversubscription/credit stall type congestions to misbehaving devices

Three stages on a Host <-> Switch link:
• Normal case (no IRL): the switch port returns a credit (R_RDY) as soon as an Rx buffer is free.
• Switch Tx congestion: credit return toward the host is programmed down to a trickle using IRL (independent of switch Rx buffer availability), so the Tx rate from the host slows down.
• Congestion subsides: credit return toward the host resumes gradually, and the Tx rate from the host slowly picks up.

© 2023 Cisco and/or its affiliates. All rights reserved.
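A minimal sketch (Python; names and numbers invented) of credit pacing: instead of returning an R_RDY as soon as a buffer frees up, the port meters credit returns to a target rate, which caps how fast the attached device can transmit. DIRL would sit on top of this and move the target rate down and back up as congestion appears and subsides:

class CreditPacer:
    def __init__(self, target_credits_per_sec):
        self.rate = target_credits_per_sec
        self.owed = 0              # Rx buffers freed but credit not yet returned
        self.budget = 0.0          # fractional credits accumulated over time

    def buffer_freed(self):
        self.owed += 1

    def tick(self, dt_sec):
        """Called periodically; returns how many R_RDYs to send now."""
        self.budget += self.rate * dt_sec
        send = min(self.owed, int(self.budget))
        self.owed -= send
        self.budget -= send
        return send

pacer = CreditPacer(target_credits_per_sec=1000)   # throttled value during congestion
for _ in range(5):
    pacer.buffer_freed()
print(pacer.tick(0.001), pacer.tick(0.004))        # 1 credit now, 4 more a little later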
#12: Onchip, Realtime, Programmable Analytics
• 64G ASIC can compute 70+ I/O metrics on chip
  • Full visibility into I/O metrics on all ports at line rate
  • DATA IUs are not inspected (the data could also be encrypted)
  • No impact to switching latency
• The NPU is also connected to the ASIC data path
  • ACLs can be programmed (ingress/egress) to copy headers of frames of interest out to the NPU
  • The Software Analytics Engine on the NPU can compute custom metrics not computed on chip

[Figure: SCSI or NVMe CMD/XRDY/RSP frames feed the on-chip HW Analytics Engine (I/O state and I/O metrics) alongside the MAC/FWD/QUE pipeline; any frame matching an ACL can additionally be copied to the SW Analytics Engine on the NPU.]

© 2023 Cisco and/or its affiliates. All rights reserved.
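One example of a metric derived from the tapped CMD/RSP headers is exchange completion time (ECT). A simplified model (Python; field names, the flow key and the timestamps are invented for illustration, and the real engines track far more state) of pairing a command with its response and aggregating per initiator-target flow:

cmd_time = {}                    # (sid, did, oxid) -> command timestamp
ect_sum, ect_count = {}, {}      # aggregation per (initiator, target) flow

def on_cmd(sid, did, oxid, ts):
    cmd_time[(sid, did, oxid)] = ts

def on_rsp(sid, did, oxid, ts):
    start = cmd_time.pop((did, sid, oxid), None)    # the RSP flows target -> initiator
    if start is None:
        return
    flow = (did, sid)                               # (initiator, target)
    ect_sum[flow] = ect_sum.get(flow, 0.0) + (ts - start)
    ect_count[flow] = ect_count.get(flow, 0) + 1

on_cmd(sid=0x010101, did=0x020101, oxid=0x100, ts=0.0)
on_rsp(sid=0x020101, did=0x010101, oxid=0x100, ts=0.0021)
flow = (0x010101, 0x020101)
print(f"avg ECT: {ect_sum[flow] / ect_count[flow] * 1000:.2f} ms")   # avg ECT: 2.10 ms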
#13: Congestion Management
Detection:
• TxWait, RxWait, TBBZ, RBBZ per port (credit stall)
• Egress buffer occupancy / outstanding credits per port (oversubscription)
• HW Creditmon

Notification:
• Congestion Signals (FPIN is software)

Avoidance/Mitigation:
• Non-oversubscribed switching fabric
• No-credit drop, timeout drop
• Congestion Isolation with VLs
• VLs with HBAs
• Credit Pacing

Combination of T11 standard and MDS unique solutions for congestion mgmt
© 2023 Cisco and/or its affiliates. All rights reserved.
#14: Non-disruptive HW upgrade
• SUP1, SUP4, FAB1, FAB3 are compatible with all the Linecards of the MDS system
• SUP4 ARB ASIC is backwards compatible to SUP1 ARB ASIC
• FAB3 XBAR ASIC is backwards compatible with FAB1 XBAR ASIC
• Mix-n-match allowed only during migration procedure

• SUP1 -> SUP4 migration via a user-friendly migration script
  • migrate sup kickstart <sup4-kickstart-image> system <sup4-system-image>
  • Switch config automatically migrated
• FAB1 -> FAB3 migration is just a replacement in sequence
  • Insert FAB3 into an empty slot, then power down and remove a FAB1 after a few minutes (OR)
  • Power down a FAB1, wait a few minutes, and replace it with FAB3

• Non-disruptive migration to new hardware on production MDS chassis

© 2023 Cisco and/or its affiliates. All rights reserved.


#15: Non-disruptive In-Service Software Upgrade (ISSU)
• Software migration with Zero Packet Loss

MDS# install all kickstart kickstart-y.y.y.bin system system-y.y.y.bin

ISSU sequence (upgrading from x.x.x to y.y.y):
1. Pre-checks for ISSU
2. Bring up the Standby SUP with y.y.y
3. Initiate SUP switchover
4. Bring up the old active SUP with y.y.y
5. Upgrade linecards in batches

Downtime during ISSU:
           Control Plane   Data Plane
Modular    ~Zero           Zero
Fixed      ~2 mins         Zero

Zero Packet Loss

© 2023 Cisco and/or its affiliates. All rights reserved.
Agenda
• MDS HW Introduction
• MDS ASIC Architecture
• MDS Product Evolution
• Day in life of a packet inside MDS
• MDS ASIC 15 unique features
• Wrap up

© 2023 Cisco and/or its affiliates. All rights reserved.


Key Advantages of MDS Modular
• Resilient to failure by design
• Dual Supervisor (NX-OS) – Software fault resilient
• N+1, N+N XBAR
• Redundant Power Supply and Fan
• Hot swap of Supervisor/Linecards/XBAR (upgrade or replace)
• Zero downtime ISSU
• No impact to Data/Control Plane
• High Density/Bandwidth – up to 768 ports in a switch
• Multiprotocol (FC, FCoE, FCIP) connectivity from a single box (one mgmt. domain)
• More resources and higher scale
• Large No of Logins, No of ITLs for SAN Analytics etc.
• Ideal for NPV uplink (NPIV Core Switch)
• Typically used in Edge-Core, Collapsed Core designs for connecting storage systems
• Reliability/Redundancy comparable to that on high end storage arrays
© 2023 Cisco and/or its affiliates. All rights reserved.
Key Advantages of Fixed Switch
• Compact (ToR)
• Low Cost and Power
• Zero downtime for data path during ISSU
• Feature parity with Modular, but at a smaller scale
  • Eg: SAN Analytics supported, but with 1/5th the number of ITLs
• BiDi air flow (based on Fan, Power supply – except 9250i)
• NPV mode operation
• Interop
• Large scale fabric

© 2023 Cisco and/or its affiliates. All rights reserved.


Unique architectural advantages of MDS (ONLY IN MDS)

• Non blocking architecture inside the switch for efficient use of switch resources
• Central Arbitration to ensure frames are never dropped inside of switches
• Source Fairness to ensure all ports are given a proportionate share of switch
resources
• Consistent Latency to ensure predictable application I/O performance
• In-Order frame delivery even during fabric changes to ensure end-devices don’t
unnecessarily spend cycles reassembling frames of I/O
• Onchip, Real time Analytics for unprecedented visibility to all I/O transiting the
fabric

More Reading: https://gblogs.cisco.com/in/cisco-mds-directors-top-five-architectural-advantages/


© 2023 Cisco and/or its affiliates. All rights reserved.
Fibre Channel and Storage: Made for each other!
• Investment Protection: All FC Ports mandatorily support 4 speeds. FC SFPs support 3 speeds
• MDS 97xx additionally supports a wide speed range from 2G to 64G on the same modular chassis

• No Drop Fabric: B2B crediting ensures no packet drop on the link


• MDS additionally uses a credited crossbar ensuring no-drop inside switches as well
• Rich fabric centric services: Name Server, Zone Server, RSCN etc. make host and storage connectivity
almost plug-n-play.
• MDS also adds rich visibility with fabric centric SAN Analytics and SAN Insights
• Advanced Congestion Management: Notification via Congestion Signals, FPIN
• MDS also offers Congestion Isolation with VLs and DIRL with/without end device support
• Error detection/recovery: Very low BER (<10e-15), FEC, CRC, LR, BB_SC, ABTS, SLER
• MDS additionally provides early CRC drop (first instance)

Other storage transports have a lot to catch up!


© 2023 Cisco and/or its affiliates. All rights reserved.
Thank you

© 2023 Cisco and/or its affiliates. All rights reserved.


The MDS acronym Quiz!

MDS: Multilayer Director Switch!


(“MDS Switch” , “MDS Fixed Switch” technically incorrect)

ISSU: In Service Software Upgrade

S/T/V: SixteenG, ThirtytwoG, V

© 2023 Cisco and/or its affiliates. All rights reserved.
