IBM b-type
Data Center Networking
Design and Best Practices Introduction
Jon Tate
Andrew Bernoth
Ivo Gomilsek
Peter Mescher
Steven Tong
ibm.com/redbooks
International Technical Support Organization
June 2010
SG24-7786-00
Note: Before using this information and the product it supports, read the information in
“Notices” on page ix.
This edition applies to the supported products in the IBM b-type portfolio in September 2009.
Note: This book is based on a pre-GA version of a product and might not apply when the
product becomes generally available. Consult the product documentation or follow-on versions
of this book for more current information.
Notices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix
Trademarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . x
Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xi
The team who wrote this book . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xii
Now you can become a published author, too! . . . . . . . . . . . . . . . . . . . . . . . . xiii
Comments welcome. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiv
Stay connected to IBM Redbooks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiv
5.3 Power utilization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
5.3.1 VLAN. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
5.3.2 VSRP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
5.3.3 MRP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
5.4 Device reduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
5.4.1 Common routing tricks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
5.4.2 VRF . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
7.4.3 FastIron traffic policies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177
7.4.4 NetIron m-series QoS implementation . . . . . . . . . . . . . . . . . . . . . . 181
7.4.5 NetIron m-series traffic policies . . . . . . . . . . . . . . . . . . . . . . . . . . . . 190
7.4.6 NetIron c-series QoS implementation . . . . . . . . . . . . . . . . . . . . . . . 194
7.4.7 NetIron c-series traffic policies . . . . . . . . . . . . . . . . . . . . . . . . . . . . 200
9.6.1 System security with ACLs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231
9.6.2 Remote access . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231
9.6.3 Telnet / SSH . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 232
9.6.4 HTTP / SSL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233
9.6.5 SNMP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233
11.5 Other tiers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 260
11.5.1 DCN core tier . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 260
11.5.2 DCN connectivity tier. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 262
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 299
Notices
This information was developed for products and services offered in the U.S.A.
IBM may not offer the products, services, or features discussed in this document in other countries. Consult
your local IBM representative for information on the products and services currently available in your area.
Any reference to an IBM product, program, or service is not intended to state or imply that only that IBM
product, program, or service may be used. Any functionally equivalent product, program, or service that
does not infringe any IBM intellectual property right may be used instead. However, it is the user's
responsibility to evaluate and verify the operation of any non-IBM product, program, or service.
IBM may have patents or pending patent applications covering subject matter described in this document.
The furnishing of this document does not give you any license to these patents. You can send license
inquiries, in writing, to:
IBM Director of Licensing, IBM Corporation, North Castle Drive, Armonk, NY 10504-1785 U.S.A.
The following paragraph does not apply to the United Kingdom or any other country where such
provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION
PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR
IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT,
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer
of express or implied warranties in certain transactions, therefore, this statement may not apply to you.
This information could include technical inaccuracies or typographical errors. Changes are periodically made
to the information herein; these changes will be incorporated in new editions of the publication. IBM may
make improvements and/or changes in the product(s) and/or the program(s) described in this publication at
any time without notice.
Any references in this information to non-IBM websites are provided for convenience only and do not in any
manner serve as an endorsement of those websites. The materials at those websites are not part of the
materials for this IBM product and use of those websites is at your own risk.
IBM may use or distribute any of the information you supply in any way it believes appropriate without
incurring any obligation to you.
Information concerning non-IBM products was obtained from the suppliers of those products, their published
announcements or other publicly available sources. IBM has not tested those products and cannot confirm
the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on
the capabilities of non-IBM products should be addressed to the suppliers of those products.
This information contains examples of data and reports used in daily business operations. To illustrate them
as completely as possible, the examples include the names of individuals, companies, brands, and products.
All of these names are fictitious and any similarity to the names and addresses used by an actual business
enterprise is entirely coincidental.
COPYRIGHT LICENSE:
This information contains sample application programs in source language, which illustrate programming
techniques on various operating platforms. You may copy, modify, and distribute these sample programs in
any form without payment to IBM, for the purposes of developing, using, marketing or distributing application
programs conforming to the application programming interface for the operating platform for which the
sample programs are written. These examples have not been thoroughly tested under all conditions. IBM,
therefore, cannot guarantee or imply reliability, serviceability, or function of these programs.
The following terms are trademarks of the International Business Machines Corporation in the United States,
other countries, or both:
InfiniBand, and the InfiniBand design marks are trademarks and/or service marks of the InfiniBand Trade
Association.
Microsoft, and the Windows logo are trademarks of Microsoft Corporation in the United States, other
countries, or both.
Linux is a trademark of Linus Torvalds in the United States, other countries, or both.
Other company, product, or service names may be trademarks or service marks of others.
Preface
IBM and Brocade have entered into an agreement to provide expanded network
technology choices with the new IBM b-type Ethernet Switches and Routers,
delivering an integrated end-to-end resiliency and security framework.
Combined with the vast data center design experience of IBM and the
networking expertise of Brocade, this portfolio represents the ideal convergence
of strength and intelligence. For organizations striving to transform and virtualize
their IT infrastructure, such a combination can help reduce costs, manage
risks, and prepare for the future.
In this book, we introduce the products and the highlights of the IBM b-type
portfolio from a viewpoint of design and suggested practices.
This book is meant to be used in conjunction with: IBM b-type Data Center
Networking: Product Introduction and Initial Setup, SG24-7785
Be sure to let us know of any additions you want to see in this book, because we
always welcome fresh ideas.
Jon Tate is a Project Manager for IBM System Storage™ Networking and
Virtualization Solutions at the International Technical Support Organization, San
Jose Center. Before joining the ITSO in 1999, he worked in the IBM Technical
Support Center, providing Level 2 support for IBM storage products. Jon has 25
years of experience in storage software and management, services, and support,
and is both an IBM Certified IT Specialist and an IBM SAN Certified Specialist.
He is also the UK Chairman of the Storage Networking Industry Association.
Andrew Bernoth is the IBM Network Services Lead Architect for the Asia Pacific
region based out of Melbourne, Australia. Prior to this, Andrew worked on global
architecture and security standards for the IBM services extranet environment.
He has 20 years of experience in computing, over 15 of which have been
focused on networking and security. Andrew holds GSEC and CISSP security
certifications and is an IBM Certified IT Architect. His work on a security
checking program for communication between networks was awarded a patent in
2008.
Peter Mescher is a Product Engineer on the SAN Central team within the IBM
Systems and Technology Group in Research Triangle Park, North Carolina.
He has seven years of experience in SAN Problem Determination and SAN
Architecture. Before joining SAN Central, he performed Level 2 support for
network routing products. He is a co-author of the SNIA Level 3 FC Specialist
Exam. This is his sixth Redbooks publication.
Thanks to the following people for their contributions to this project:
Brian Steffler
Marcus Thordal
Kamron Hejazi
Mike Saulter
Jim Baldyga
Brocade
Holger Mueller
Pete Danforth
Casimer DeCusatis
Doris Konieczny
Aneel Lakhani
Mark Lewis
Tom Parker
Steve Simon
IBM
Emma Jacobs
International Technical Support Organization, San Jose Center
Find out more about the residency program, browse the residency index, and
apply online at:
ibm.com/redbooks/residencies.html
Comments welcome
Your comments are important to us!
The network is becoming the new backplane. In this chapter we describe the
history of the data center network and examine the multiple forces that are
driving a transformation of the data center network.
The development of standardized industry hardware and different technology
innovations, such as client-server technology or e-business on demand, has led
us to the environment that we find ourselves in today, a distributed infrastructure
that is difficult to re-provision. Most services and applications are running on
dedicated environments, which means that multiple instances of raised floor,
wiring, LAN equipment, and servers are deployed in the data centers. The
positive effect of this has been an explosion of applications and access,
particularly through the Internet. However, all of this has caused a fragmentation
of the enterprise and the computing capacity associated with it. We now have
islands of computing resources and puddles of information, which become very
costly and inefficient to manage.
We have now started to enter into the next phase driven by technology
innovations such as virtualization, Web 2.0, Software as a Service (SaaS),
Service Oriented Architecture (SOA), Cloud Computing, converged networks
and so on. This is leading us into a new enterprise model that will again be
characterized by re-centralization and a high degree of sharing, but will be built
on an extremely flexible infrastructure: one that can be quickly re-provisioned to
respond to changing business needs and market demands.
On one side, we have a set of daily operational challenges around cost, service
delivery, business resilience, security, and green IT initiatives that bring many IT
data centers to a breaking point. On the other side, there are business and
technology innovations that can drive competitive advantage. Never before has
the Enterprise Data Center faced such a “perfect storm” of forces that drives the
need for true data center transformation.
Core Network:
This component is where the whole switching infrastructure resides and
connects all data center networks within and across data centers.
Network Services:
This component provides WAN acceleration, intrusion prevention, firewall
services, and other network services.
Applications and Data Services:
This component connects to the Core Network and hosts all of the servers,
databases, and storage.
Other services supporting these four basic blocks are deployment tools and
management services. These encompass the entire data center network. They
must be considered in the deployment of any basic component.
In the following sections, we explore the different data center network tiers in
greater detail.
Depending on the requirements, there are pros and cons for either a tier-2 or
tier-3 data center design.
1.2.3 Network Services tier
Network Services are closely aligned to the network protocols that support data
center applications. They are generally divided into two categories:
Security services, such as firewalls and intrusion prevention
Application front-end services, such as server load balancing and content
distribution.
Load balancing services are important parts of data center architectures.
These services can be divided into two categories:
– Local server load balancers distribute content requests (from Layer 4 to
Layer 7) sourced by remote clients across several systems within a single
data center.
– Global site selectors optimize multi-site deployments that involve globally
distributed data centers. These selectors are the cornerstone of multi-site
disaster recovery plans.
The Network Services tier must extend to any of the server networks hosted in
the data center, and apply a network-specific policy and set of configurations to
appropriately interact with the traffic in that particular network section. For
example, a security service, such as SYN checking or sequence number
checking, might be required only for servers available to the outside
world. Therefore, the architecture must support the application of these features
only to those systems or networks. Most importantly, key characteristics are
enabled by direct logical attachment to the data center’s network core.
Virtual IP:
A load balancer is put in place on the application front end with a unique IP
address that handles requests to specific servers.
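
As an illustration of this virtual IP concept, the following minimal Python sketch (the addresses and names are invented for the example, and this is not how a particular product implements it) distributes successive client requests across a pool of real servers in round-robin fashion:

from itertools import cycle

class VirtualIP:
    """Minimal round-robin dispatcher behind a single virtual IP (illustrative only)."""

    def __init__(self, vip, real_servers):
        self.vip = vip                      # address clients connect to
        self._pool = cycle(real_servers)    # rotate through the real servers

    def pick_server(self):
        """Return the real server that handles the next request."""
        return next(self._pool)

balancer = VirtualIP("192.0.2.10", ["10.0.0.11", "10.0.0.12", "10.0.0.13"])
for request in range(5):
    print(f"request {request} -> {balancer.pick_server()}")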
A Storage Area Network (SAN) connects servers and storage devices across a
packet-switched network. SANs allow arbitrary block- level access from servers
to storage devices, and storage devices to each other. Multiple servers can
therefore share storage for clustering and HA applications. In addition, the
storage devices themselves can implement data protection services (such as
synchronous data replication, asynchronous data replication or data snapshots)
by directly moving data to another storage device. SANs also provide a set of
configuration, directory, discovery and notification services to attached devices.
iSCSI SANs
An iSCSI SAN can be based upon any network supporting the IP protocols.
In practice, this means iSCSI SANs are built from Ethernet switches. In principle,
because iSCSI is based upon TCP/IP, it can run on any switching infrastructure.
However, in practice, depending upon the features of the Ethernet switches, the
performance characteristics of TCP/IP in the face of dropped frames can limit
iSCSI deployments to low-performance SANs. In addition, most iSCSI
deployments presently only use 1 Gigabit Ethernet with software drivers, and the
resulting performance does not compare favorably to FC at 2 Gbps, 4 Gbps or
8 Gbps with an offload HBA. However, iSCSI SANs can be considerably less
expensive than FC SANs. The Internet Storage Name Service (iSNS) server
provides all fabric services in an iSCSI SAN.
The iSCSI traffic can directly traverse the WAN connection without requiring a
gateway, but iSCSI implementations do not generally provide sufficient buffering
to fully utilize high-speed connections, nor do they include compression or
other WAN optimization features. Therefore, iSCSI WAN traffic
can often benefit from a WAN acceleration device. The iSCSI traffic also can
benefit from a data security gateway providing IPSec and VPN tunnels.
1.3.1 Availability
Availability means that data or information is accessible and usable upon
demand by an authorized person. Two of the major factors that affect availability
are redundancy and convergence.
Redundancy
Redundant data centers involve complex solution sets depending on a client’s
requirements for backup and recovery, resilience, and disaster recovery. Most
inter-data center connectivity involves private optical networking solutions for
network and storage.
Convergence
Convergence is the time required for a redundant network to recover from a
failure and resume traffic forwarding. Data center environments typically include
strict uptime requirements and therefore need fast convergence.
1.3.2 Backup and recovery
Although the ability to recover from a server or storage device failure is beyond
the scope of network architecture requirements, potential failures such as the
failure of a server network interface card (NIC) will be taken into consideration. If
the server has a redundant NIC, then the network must be capable of redirecting
traffic to the secondary network as needed.
As for network devices, the backup and recovery ability typically requires the use
of diverse routes and redundant power supplies and modules. It also requires
defined processes and procedures for ensuring that current backups exist in
case of firmware and configuration failures.
Another possible solution uses three data centers. The first is the active data
center which is synchronized with the second or standby data center. The third
site becomes a back-up site to which data is copied asynchronously according to
specific policies. Geographically Dispersed Parallel Sysplex™ (GDPS®) and
related technologies are used.
1.3.6 Environment
There are environmental factors such as the availability of power or air
conditioning and maximum floor loading that influence the average data center
today. Network architecture and architects must take these factors into
consideration.
1.3.9 Performance
Network performance is usually defined by the following terms:
Capacity:
Capacity refers to the amount of data that can be carried on the network at
any point of time. A network architecture must take into account anticipated
minimum, average, and peak utilization of traffic patterns.
Throughput:
Throughput is related to capacity, but focuses on the speed of data transfer
between session pairs versus the utilization of links.
Delay:
Delay, also known as “lag” or “latency” is defined as a measurement of
end-to-end propagation times. This requirement is primarily related to
isochronous traffic, such as voice and video services.
Jitter:
Jitter is the variation in the time between packets arriving, caused by network
congestion, timing drift, or route changes. It is most typically associated with
telephony and video-based traffic.
Quality of Service:
Quality of Service (QoS) requirements include the separation of traffic into
predefined priorities. QoS helps to arbitrate temporary resource contention. It
also provides an adequate service level for business-critical administrative
functions, as well as for delay-sensitive applications such as voice, video, and
high-volume research applications.
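
To make the delay and jitter terms concrete, the following small Python sketch computes inter-arrival gaps and a simple jitter figure from a set of invented packet timestamps (the 20 ms nominal spacing is an assumption for the example):

from statistics import mean

# Hypothetical one-way packet arrival times in milliseconds
arrivals = [0.0, 20.1, 40.3, 59.8, 80.9, 100.2]

# Inter-arrival gaps between consecutive packets
gaps = [b - a for a, b in zip(arrivals, arrivals[1:])]

# Jitter here is the mean absolute deviation of the gaps from the nominal 20 ms spacing
nominal = 20.0
jitter = mean(abs(g - nominal) for g in gaps)

print("gaps (ms):", [round(g, 2) for g in gaps])
print(f"jitter: {jitter:.2f} ms")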
1.3.10 Reliability
Reliability is the time a network infrastructure is available to carry traffic. Because
today’s data center houses critical applications and services for the enterprise,
outages are becoming less and less tolerable.
1.3.11 Scalability
In networking terms scalability is the ability of the network to grow incrementally
in a controlled manner.
For enterprises that are constantly adding new servers and sites, architects
might want to specify something more flexible, such as a modular-based system.
Constraints that might affect scalability, such as defining spanning trees across
multiple switching domains or additional IP addressing segments to
accommodate the delineation between various server functions, must be
considered.
1.3.12 Security
Security in a network comprises the definitions and levels of permission needed to access
devices, services, or data within the network. We consider the following
components of a security system:
Security policy:
Security policies define how, where, and when a network can be accessed.
An enterprise will normally develop security policies related to networking as
a requirement. The policies will also include the management of logging,
monitoring, and auditing events and records.
Network segmentation:
Network segmentation divides a network into multiple zones. Common zones
include various degrees of trusted and semi-trusted regions of the network.
Firewalls and inter-zone connectivity:
Security zones are typically connected with some form of security boundary,
such as firewalls or access-control lists. This might take the form of either
physical or logical segmentation, or a combination of both.
Access controls:
Access controls are used to secure network access. All access to network
devices will be by user-specific login credentials; there must be no
anonymous or generic logins.
Security monitoring:
To secure a data center network, a variety of mechanisms are available
including Intrusion Detection System (IDS), Intrusion Protection System
(IPS), content scanners, and so on. The depth and breadth of monitoring will
depend upon both the customer’s requirements as well as legal and
regulatory compliance mandates.
External regulations:
External regulations will often play a role in network architecture and design
due to compliance policies such as Sarbanes-Oxley (SOX), Payment Card
Industry Data Security Standards (PCI DSS), The Health Insurance
Portability and Accountability Act (HIPAA); and a variety of other industry and
non-industry-specific regulatory compliance requirements.
1.3.13 Serviceability
Serviceability refers to the ability to service the equipment. Several factors can
influence serviceability, such as modular or fixed configurations or requirements
of regular maintenance.
1.3.15 Standards
Network standards are key to the smooth and ongoing viability of any network
infrastructure:
Hardware configuration standards
Physical infrastructure standards
Network security standards
Network services standards
Infrastructure naming standards
Port assignment standards
Server attachment standards
Wireless LAN standards
IP addressing standards
Design and documentation standards
Network management standards
Network performance measurement and reporting standards
Usage metering and billing standards
Configuration management:
This process facilitates the discovery and maintenance of device software
configurations.
Performance management:
This process provides for monitoring and reporting network traffic levels and
device utilization.
Incident management:
This process addresses the goal of incident management, which is to recover
standard service operation as quickly as possible. The incident management
process is used by many functional groups to manage an individual incident.
The process includes minimizing the impact of incidents affecting the
availability or performance, which is accomplished through analysis, tracking,
and solving of incidents that have impact on managed IT resources.
Problem management:
This process includes identifying problems through analysis of incidents that
have the same symptoms, finding the root cause and fixing it, in order to
prevent malfunction reoccurrence.
User and accounting management:
This process is responsible for ensuring that only those authorized can
access the needed resources.
Security management:
This process provides secure connections to managed devices and
management of security provisions in device configurations.
Global integration is changing the corporate model and the nature of work itself.
The problem is that most IT infrastructures were not built to support the explosive
growth in computing capacity and information that we see today.
IBM has developed a strategy known as the IBM Dynamic Infrastructure®, which
is an evolutionary model for efficient IT delivery that helps to drive business
innovation. This approach allows organizations to be better positioned to adopt
integrated new technologies, such as virtualization and Cloud computing, to help
deliver dynamic and seamless access to IT services and resources. As a result,
IT departments will spend less time fixing IT problems and more time solving real
business challenges.
IBM has taken a holistic approach to the transformation of IT and developed the
Dynamic Infrastructure, which is a vision and strategy for the future of enterprise
computing. The Dynamic Infrastructure enables you to leverage today’s best
practices and technologies to better manage costs, improve operational
performance and resiliency, and quickly respond to business needs. Its goal is to
deliver the following benefits:
Improved IT efficiency:
Dynamic Infrastructure helps transcend traditional operational issues and
achieve new levels of efficiency, flexibility, and responsiveness. Virtualization
can uncouple applications and business services from the underlying IT
resources to improve portability. It also exploits highly optimized systems and
networks to improve efficiency and reduce overall cost.
Rapid service deployment:
The ability to deliver quality service is critical to businesses of all sizes.
Service management enables visibility, control, and automation to deliver
quality service at any scale. Maintaining user satisfaction by ensuring cost
efficiency and return on investment depends upon the ability to see the
business (visibility), manage the business (control), and leverage automation
(automate) to drive efficiency and operational agility.
High responsiveness and business goal-driven infrastructure:
A highly efficient, shared infrastructure can help businesses respond quickly
to evolving demands. It creates opportunities to make sound business
decisions based on information obtained in real time. Alignment with a
service-oriented approach to IT delivery provides the framework to free up
resources from more traditional operational demands and to focus them on
real-time integration of transactions, information, and business analytics.
Now, equipped with a highly efficient, shared and Dynamic Infrastructure along
with the tools needed to free up resources from traditional operational demands,
IT can more efficiently respond to new business needs. As a result, organizations
can focus on innovation and aligning resources to broader strategic priorities.
Decisions can now be based on real-time information. Far from the “break/fix”
mentality gripping many data centers today, this new environment creates an
infrastructure that provides automated, process-driven service delivery and is
economical, integrated, agile, and responsive.
Energy efficiency
As IT grows, enterprises require greater power and cooling capacities. In fact,
energy costs related to server sprawl alone may rise from less than 10 percent to
30 percent of IT budgets in the coming years.3 These trends are forcing
technology organizations to become more energy efficient in order to control
costs while developing a flexible foundation from which to scale.
1. Virtualization 2.0: The Next Phase in Customer Adoption. IDC Doc. 204904, Dec. 2006
2. IDC, Jan. 2008
3. The data center power and cooling challenge. Gartner, Nov. 2007
Harnessing new technologies
If you are spending most of your time on day-to-day operations, it is difficult to
evaluate and leverage new technologies available that can streamline your IT
operations and help keep a company competitive and profitable. Yet the rate of
technology adoption around us is moving at breakneck speed, and much of it is
disrupting the infrastructure status quo.
Ultimately, all of these new innovations need to play an important role in the
enterprise data center.
For example:
Google’s implementation of their MapReduce method is an effective way to
support Dynamic Infrastructures.
The delivery of standardized applications by the Internet using Cloud
Computing is bringing a new model to the market.
Today, the power of information, and the sharing of that information, rests firmly
in the hands of the end user while real-time data tracking and integration will
become the norm.
IT departments require special virtualization software, firmware or a third-party
service that makes use of virtualization software or firmware in order to virtualize
some or all of a computing infrastructure’s resources. This software/firmware
component, called the hypervisor or the virtualization layer, as shown in
Figure 1-3, performs the mapping between virtual and physical resources. It is
what enables the various resources to be decoupled, then aggregated and
dispensed, irrespective of the underlying hardware and, in some cases, the
software OS. Virtualization reduces complexity and management overhead
by creating large pools of like resources that are managed as server ensembles.
Figure 1-3 Virtualization layer: software and operating system instances are decoupled from the underlying compute, memory, storage, and network resources, with mobility between systems, optimized for availability, performance, and power
Historically, there has been a 1:1 ratio of server to application. This has left many
CPU cycles sitting unused much of the time, dedicated to a particular application
even when there are no requests in progress for that application. Now, with these
virtualization capabilities, we can run more than one OS and application or
service on a single physical server.
Business continuity:
Instead of requiring a 1:1 ratio of primary device to backup device in addition
to the 1:1 software-to-hardware ratio described earlier, in the virtualized
environment, multiple servers can fail over to a set of backup servers. This
allows a many-to-one backup configuration ratio, which increases service
availability. An additional example is the decoupling of software applications,
operating systems and hardware platforms, where fewer redundant physical
devices are needed to serve primary machines.
However, in situations where spike computing is needed, the workloads can
be redistributed onto multiple physical servers without service interruption as
shown in Figure 1-5.
High availability:
Virtual devices are completely isolated and decoupled from each other, as
though they were running on different hardware. With features like VMotion,
Live Partition Mobility (LPM) and Live Application Mobility (LAM), planned
outages for hardware/firmware maintenance and upgrades can be a thing of
the past.
Figure 1-6 shows how, during planned maintenance, partitions can be relocated
from one server to another and moved back when the maintenance is
complete. In other words, changes can be made in the production
environment without having to schedule downtime.
The hypervisor layer is installed on top of the hardware. This hypervisor
manages access to hardware resources. The virtual servers (guest systems) are
then installed on top of the hypervisor, enabling each operating system of the
guest system to access necessary resources as needed; see Figure 1-7. This
solution is also sometimes referred to as “bare-metal” virtualization. This
approach allows the guest operating system to be unaware that it is operating in
a virtual environment and no modification of the operating system is required.
Logical NIC sharing allows each operating system to send packets to a single
physical NIC. Each operating system has its own IP address. The server
manager software generally has an additional IP address for configuration and
management. A requirement of this solution is that all guest OSs have to be in
the same Layer 2 domain (subnet) with each guest OS assigned an IP address
and a MAC address. Because the number of guest OSs that can live on one
platform was relatively small, the MAC address could be a modified version of the
NIC’s burned-in MAC address, and the IP addresses consisted of a small block of
addresses in the same IP subnet. One additional IP address was used for the
management console of the platform.
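
As a hedged illustration of how per-guest MAC addresses can be derived from the NIC's burned-in address (the derivation scheme shown here is an assumption for the example; each hypervisor uses its own scheme), the following Python sketch sets the locally administered bit and varies the last octet per guest:

def guest_mac(burned_in: str, guest_index: int) -> str:
    """Derive an illustrative per-guest MAC: set the locally administered bit, vary the last octet."""
    octets = [int(part, 16) for part in burned_in.split(":")]
    octets[0] |= 0x02                        # mark the address as locally administered
    octets[5] = (octets[5] + guest_index) & 0xFF
    return ":".join(f"{octet:02x}" for octet in octets)

physical_nic = "00:1a:64:9c:10:20"           # hypothetical burned-in MAC address
for guest in range(1, 4):
    print(f"Guest OS{guest}: {guest_mac(physical_nic, guest)}")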
Features to manage QoS and load balancing to the physical NIC from the guest
OSs were limited. In addition as shown in Figure 1-8, any traffic from Guest OS1
destined to Guest OS2 traveled out to a connected switch and then returned
along the same physical connection. This had the potential of adding extra load
on the Ethernet connection.
vNIC: Virtual switching
The advent of virtual NIC (vNIC) technology enables each server to have a
virtual NIC that connects to a virtual switch (VSWITCH). This approach allows
each operating system to exist in a separate Layer 2 domain. The connection
between the virtual switch and the physical NIC then becomes an 802.1q trunk.
The physical connection between the physical NIC and the physical switch is also
an 802.1q trunk, as shown in Figure 1-9.
A Layer 3 implementation of this feature allows traffic destined for a server that
resides on the same platform to be routed between VLANs totally within the host
platform, and avoids the traffic traversing the Ethernet connection both outbound
and inbound.
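
To make the 802.1q trunking concrete, the following short Python sketch builds the 4-byte 802.1Q tag that a virtual switch inserts after the source MAC address, carrying the VLAN ID and 802.1p priority; the VLAN and priority values are arbitrary examples:

import struct

def dot1q_tag(vlan_id: int, priority: int = 0, dei: int = 0) -> bytes:
    """Build a 4-byte 802.1Q tag: TPID 0x8100 followed by PCP, DEI, and VLAN ID."""
    if not 0 <= vlan_id <= 4095:
        raise ValueError("VLAN ID must be 0-4095")
    tci = (priority & 0x7) << 13 | (dei & 0x1) << 12 | vlan_id
    return struct.pack("!HH", 0x8100, tci)

# Example: VLAN 100 with 802.1p priority 5 (for example, voice traffic on a trunk)
tag = dot1q_tag(vlan_id=100, priority=5)
print(tag.hex())   # 8100a064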
In the chapters that follow, we discuss how the IBM b-type networking portfolio
helps to meet the needs of the business.
2.1 Product overview
In the sections that follow, we describe the IBM Networking b-type family of IBM
networking products. For the most up-to-date information, see the website:
http://www-03.ibm.com/systems/networking/hardware/ethernet/b-type/
IBM family name   Brocade family name   IBM product name   IBM machine type and model   Brocade name
With superior 1 GbE and 10 GbE port densities, the m-series switching routers
are well suited for large-scale high performance cluster computing. By combining
superior data capacity with ultra-low latency, m-series switching routers can
accelerate application performance in high performance computing clusters,
thereby increasing processing power and productivity.
Comprehensive hardware redundancy with hitless management failover and
hitless software upgrades for Layer 2/Layer 3 with BGP and OSPF graceful
restart
All m-series models can be installed only in a rack; non-rack installation is not
supported.
Operating system
All m-series systems run Brocade Multi-Service IronWare R4.0.00 or higher
operating system.
Note: The Clos architecture is named after the groundbreaking work by
researcher Charles Clos. The Clos architecture has been the subject of much
research over several years. A multi-stage Clos architecture has been
mathematically proven to be non-blocking. The resiliency of this architecture
makes it the ideal building block in the design of high availability, high
performance systems.
The Clos architecture uses data striping technology to ensure optimal utilization
of fabric interconnects. This mechanism always distributes the load equally
across all available links between the input and output interface modules. By
using fixed-size cells to transport packets across the switch fabric, the m-series
switching architecture ensures predictable performance with very low and
deterministic latency and jitter for any packet size. The presence of multiple
switching paths between the input and output interface modules also provides an
additional level of redundancy.
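
The cell-based striping can be pictured with the following Python sketch, which is purely conceptual and not the actual m-series implementation: a packet is segmented into fixed-size cells and the cells are spread evenly across the available fabric links:

def stripe_packet(packet: bytes, cell_size: int, num_links: int):
    """Segment a packet into fixed-size cells and spread them round-robin across fabric links."""
    cells = [packet[i:i + cell_size] for i in range(0, len(packet), cell_size)]
    links = [[] for _ in range(num_links)]
    for index, cell in enumerate(cells):
        links[index % num_links].append(cell)   # equal distribution across the links
    return links

# Example: a 1,500-byte packet, 64-byte cells, 6 fabric links
distribution = stripe_packet(b"\x00" * 1500, cell_size=64, num_links=6)
print([len(cells_on_link) for cells_on_link in distribution])  # cells carried per link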
High availability
Both the hardware and software architecture of the m-series are designed to
ensure very high Mean Time Between Failures (MTBF) and low Mean Time To
Repair (MTTR). Cable management and module insertion on the same side of
the chassis allows ease of serviceability when a failed module needs to be
replaced or a new module needs to be inserted.
Each interface module maintains multiple, distinct priority queues to every output
port on the system. Packets are “pulled” by the outbound interface module when
the output port is ready to send a packet. Switch fabric messaging is used to
ensure that there is tight coupling between the two stages. This closed loop
feedback between the input and output stages ensures that no information is lost
between the two stages. The use of such “virtual output queues” maximizes the
efficiency of the system by storing packets on the input module until the output
port is ready to transmit the packet. In all, there are 512k virtual output queues on
the m-series chassis.
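
The following Python sketch models the virtual output queue idea conceptually (it is an illustration, not the m-series data path): the ingress module keeps a separate queue per output port and priority, and a packet is released only when the egress side pulls for that port:

from collections import deque, defaultdict

class IngressModule:
    """Conceptual virtual output queuing: one queue per (output port, priority) pair."""

    def __init__(self):
        self.voqs = defaultdict(deque)   # (out_port, priority) -> queue of packets

    def enqueue(self, packet, out_port, priority):
        self.voqs[(out_port, priority)].append(packet)

    def pull(self, out_port):
        """Egress pulls: release the best packet waiting for this port (lower value = higher priority here)."""
        for priority in sorted({p for (port, p) in self.voqs if port == out_port}):
            queue = self.voqs[(out_port, priority)]
            if queue:
                return queue.popleft()
        return None   # nothing queued for this output port

ingress = IngressModule()
ingress.enqueue("bulk-data", out_port=7, priority=4)
ingress.enqueue("voice", out_port=7, priority=0)
print(ingress.pull(7))   # voice is released first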
The QoS subsystem on the m-series has extensive classification and packet
marking capabilities that can be configured:
Prioritization based on Layer 2 (802.1p), TOS, DSCP, or MPLS EXP bit of an
input packet
Mapping of packet/frame priority from ingress encapsulation to Egress
encapsulation
Remarking of a packet’s priority based on the result of the 2-rate, 3-color
policer
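
As a hedged sketch of 2-rate, 3-color policing (a conceptual, color-blind two-rate three-color marker in the spirit of RFC 2698, not the m-series hardware policer; the rates and burst sizes are arbitrary), two token buckets classify each packet as green, yellow, or red, and the resulting color can then drive priority remarking:

import time

class TwoRateThreeColorMarker:
    """Color-blind two-rate, three-color marker (conceptual sketch)."""

    def __init__(self, cir, cbs, pir, pbs):
        self.cir, self.cbs = cir, cbs        # committed rate (bytes/s) and burst (bytes)
        self.pir, self.pbs = pir, pbs        # peak rate (bytes/s) and burst (bytes)
        self.tc, self.tp = cbs, pbs          # current token counts
        self.last = time.monotonic()

    def color(self, packet_bytes):
        now = time.monotonic()
        elapsed = now - self.last
        self.last = now
        # Refill both buckets, capped at their burst sizes
        self.tc = min(self.cbs, self.tc + self.cir * elapsed)
        self.tp = min(self.pbs, self.tp + self.pir * elapsed)
        if packet_bytes > self.tp:
            return "red"                      # exceeds the peak rate
        self.tp -= packet_bytes
        if packet_bytes > self.tc:
            return "yellow"                   # within peak but above the committed rate
        self.tc -= packet_bytes
        return "green"                        # within the committed rate

policer = TwoRateThreeColorMarker(cir=1_000_000, cbs=10_000, pir=2_000_000, pbs=20_000)
print([policer.color(1500) for _ in range(10)])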
For security purposes, both input ACLs (Access Control Lists) and output ACLs
are supported by the system on every interface module. Up to 114,688 input ACL
entries and 131,072 output ACL entries for ACL rules can be applied to local
interfaces on every interface module.
Scalability
The m-series of routers is a highly scalable family of routers. Some examples of
its industry-leading scalability include:
Up to 4k VPLS/VLL instances and up to 256k VPLS MAC addresses
Support for 4094 VLANs and up to 1 million MAC addresses
512k IPv4 routes in hardware FIB
112k IPv6 routes in hardware FIB
2 million BGP routes
400 BGP/MPLS VPNs and up to 256k VPN routes
Investment protection
The m-series chassis uses a half slot design for interface modules. The divider
between two adjacent half slots can be removed in future to combine them into a
full slot. All chassis have 100 Gbps of full-duplex bandwidth per full slot. In
addition, with the ability to offer multiple services including dual-stack IPv4/IPv6
and MPLS services in hardware, the m-series offers excellent investment
protection.
Component             IBM Ethernet Router     IBM Ethernet Router     IBM Ethernet Router      IBM Ethernet Router
                      B04M                    B08M                    B16M                     B32M
Chassis type          Modular 4-slot chassis  Modular 8-slot chassis  Modular 16-slot chassis  Modular 32-slot chassis
H/W/D (cm)            17.68 x 44.32 x 57.15   31.01 x 44.32 x 57.15   62.15 x 44.32 x 64.77    146.58 x 44.32 x 61.21
Op. temperature (°C)  0 - 40                  0 - 40                  0 - 40                   0 - 40
Fan assemblies        1                       1                       3                        10
All the fans are hot swappable and self adjusting based on sensor readings.
Power parameters
All m-series models provide redundant and removable power supplies with AC
power options. Power supplies can be exchanged between B08M and B16M
models, but not with the B04M or B32M. None of the m-series models
provide the Power over Ethernet (PoE) option.
Number of power supply bays required for a fully loaded chassis: B04M 1, B08M 2, B16M 4, B32M 4.
All m-series models have a maximum of 1 GB RAM and 32 MB of flash memory.
The number of slots, ports, and performance metrics are shown in Table 2-4.
Component                                          B04M       B08M       B16M       B32M
Payload slots                                      4          8          16         32
Max. number of slots for switch fabric modules     3          3          4          8
Min. number of switch fabric modules required
for fully-loaded chassis at line rate              2          2          3          8
10/100/1000 copper ports per module (MRJ21)        48         48         48         48
10/100/1000 max. copper ports per system (MRJ21)   192        384        768        1536
10/100/1000 copper ports per module (RJ-45)        20         20         20         20
POS-OC192                                          8          16         32         64
Fabric switching capacity                          960 Gbps   1.92 Tbps  3.84 Tbps  7.68 Tbps
Data switching capacity                            400 Gbps   800 Gbps   1.6 Tbps   3.2 Tbps
All slots have a half-slot line module design. Slots have removable dividers to
support future full slot modules.
Interface modules
Table 2-5 shows which modules can be installed in the m-series chassis payload
slots.
Module type       Ports   Connector
10/100/1000MbE    48      MRJ21
10/100/1000MbE    20      RJ45
100/1000MbE       20      SFP
10 GbE            2       XFP
10 GbE            4       XFP
Interface types
Following are the available interface types:
10/100/1000 Mbps Ethernet port with MRJ21 connector
10/100/1000 Mbps Ethernet port with RJ45 connector
100/1000 Mbps Ethernet port with SFP connector
10 Gbps Ethernet port with XFP connector
OC-192 (STM-64) port with SFP connector
OC-12/48 (STM-4/STM-16) port with SFP connector
Transceivers
In Table 2-6, Table 2-7, and Table 2-8, we show the available transceivers to be
used in interface modules.
Cables for MRJ21 must be ordered separately. One distributor of such cables is
Tyco Electronics:
http://www.ampnetconnect.com/brocade/
Services, protocols, and standards
IBM m-series Ethernet Routers support these services, protocols, and
standards.
In addition to a rich set of Layer 2 and MPLS-based capabilities, the routers enable the
creation of scalable, resilient services based on the Metro Ethernet Forum (MEF)
specifications for these features:
Ethernet Private Line (EPL)
Ethernet Virtual Private Line (EVPL)
Ethernet LAN (E-LAN)
The provided IPv4 and IPv6 multicast protocols help to make the most efficient use of
the network bandwidth.
Multi-VRF virtual routing allows enterprises to create multiple security zones and
simplified VPNs for different applications and business units while streamlining
overall network management.
For the complete list of supported standards and RFC compliance, see the website:
http://www-03.ibm.com/systems/networking/hardware/ethernet/b-type/
Link aggregation
Following are the main characteristics of link aggregation implementation on
m-series models:
802.3ad/LACP support
256 server trunks supported
Up to 32 ports per trunk group
Cross module trunking
Ports in the group do not need to be physically consecutive
Tagged ports support in trunk group
Compatibility with Cisco EtherChannel
Ports can be dynamically added or deleted from the group, except for the
primary port
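
To illustrate how traffic can be spread over the ports of a trunk group, the following Python sketch hashes a frame's MAC address pair to pick a member link so that a given flow stays on one port; the hash fields and hash function are illustrative assumptions, not the actual IronWare load-sharing algorithm:

import zlib

def pick_lag_member(src_mac: str, dst_mac: str, num_links: int) -> int:
    """Pick a trunk member port by hashing the MAC pair (keeps a flow on one link)."""
    key = f"{src_mac}-{dst_mac}".encode()
    return zlib.crc32(key) % num_links

# Example: distribute three flows over a 4-port trunk group
flows = [("00:11:22:33:44:55", "66:77:88:99:aa:bb"),
         ("00:11:22:33:44:56", "66:77:88:99:aa:bb"),
         ("00:11:22:33:44:57", "66:77:88:99:aa:bc")]
for src, dst in flows:
    print(src, "->", dst, "uses member port", pick_lag_member(src, dst, num_links=4))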
Layer 2 switching
Following are the main characteristics of Layer 2 switching implementation on
m-series models:
Up to 1 million MAC addresses per system
Up to 288000 MAC entries per network processor
9216 byte jumbo frames
L2 MAC filtering
MAC authentication
MAC port security
4090 VLANs
Port and protocol based VLANs
VLAN tagging:
– 802.1q
– Dual Mode
– SAV/Q-in-Q
STP (Spanning Tree Protocol) per VLAN
Compatibility with CISCO PVST (Per VLAN Spanning Tree)
STP fast forwarding (fast port, fast uplink), root guard, Bridge Protocol Data
Unit (BPDU) guard
Up to 128 Spanning-Tree instances
Rapid STP (802.1w compatible)
MSTP (Multiple Spanning Tree Protocol) (802.1s)
MRP Phase I&II
Q-in-Q/SAV support with unique tag-type per port
VSRP (Virtual Switch Redundancy Protocol)
Up to 255 topology groups
Hitless OS with 802.1ag (Connectivity Fault Management) and UDLD
(Uni-directional Link Detection)
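
Because the Layer 2 feature set centers on per-VLAN MAC learning and forwarding, the following Python sketch models that behavior conceptually (an illustration only, not the switch implementation): source MAC addresses are learned per VLAN, and a frame is sent to the learned port or flooded within the VLAN when the destination is unknown:

class L2Switch:
    """Conceptual per-VLAN MAC learning and forwarding (illustrative only)."""

    def __init__(self, ports):
        self.ports = set(ports)
        self.mac_table = {}   # (vlan, mac) -> port

    def receive(self, in_port, vlan, src_mac, dst_mac):
        # Learn: remember which port the source MAC was seen on in this VLAN
        self.mac_table[(vlan, src_mac)] = in_port
        # Forward: use the learned port if known, otherwise flood within the VLAN
        out = self.mac_table.get((vlan, dst_mac))
        return [out] if out is not None else sorted(self.ports - {in_port})

switch = L2Switch(ports=[1, 2, 3, 4])
print(switch.receive(in_port=1, vlan=10, src_mac="AA", dst_mac="BB"))  # flood: [2, 3, 4]
print(switch.receive(in_port=2, vlan=10, src_mac="BB", dst_mac="AA"))  # learned: [1]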
Multicast
Following are the main characteristics of multicast implementation on m-series
models:
IGMP/IGMPv3 (Internet Group Management Protocol)
IGMP/IGMPv3 snooping
IGMP proxy
PIM (Protocol-Independent Multicast) proxy/snooping
Multicast routing PIM/DVMRP (Distance Vector Multicast Routing Protocol)
Up to 153600 multicast routes
IPv4 PIM modes - Sparse, Dense, Source-Specific
IPv6 PIM modes - Sparse, Source-Specific
Up to 4096 IPv4/IPv6 multicast cache entries
Doing this allows network designers to standardize on a single product family for
end-of-row, aggregation, and backbone switching, and is ideal for data center
and enterprise deployment. In addition, the switches, with their high-density and
compact design, are an ideal solution for High-Performance Computing (HPC)
environments and Internet Exchanges and Internet Service Providers (IXPs and
ISPs) where non-blocking, high-density Ethernet switches are needed.
The r-series Ethernet Switches are available in the following model
configurations:
IBM Ethernet/IP Router B04R (4003-R04) - 4-slot switching router, 400 Gbps
data capacity, and up to 64 10 GbE and 128 1 GbE ports per system
IBM Ethernet/IP Router B08R (4003-R08) - 8-slot switching router, 800 Gbps
data capacity, and up to 128 10 GbE and 384 1 GbE ports per system
IBM Ethernet/IP Router B16R (4003-R16) - 16-slot switching router, 1.6 Tbps
data capacity, and up to 256 10 GbE and 768 1 GbE ports per system
All r-series models can be installed only in a rack; non-rack installation is not
supported.
Operating system
All r-series systems run Brocade Multi-Service IronWare for BigIron RX R2.7.02
or higher operating system.
Exceptional density
The r-series scales to one of the industry's leading densities for its occupied
space: 256 10 Gigabit Ethernet ports or 768 Gigabit Ethernet ports in a single
chassis.
The ability to handle the failure of not only an SFM but also elements within an
SFM ensures a robust, redundant system ideal for non-stop operation. The
overall system redundancy is further bolstered by redundancy in other active
system components such as power supplies, fans, and management modules.
The passive backplane on the r-series chassis increases the reliability of the
system.
Scalability
The r-series is a highly scalable family of switching routers. Here are a few
examples of its industry-leading scalability:
Support for 4094 VLANs and up to 1 million MAC addresses
512k IPv4 routes in hardware FIB
65k IPv6 routes in hardware FIB
1 million BGP routes
Investment protection
The r-series chassis uses a half slot design for interface modules. The divider
between two adjacent half slots can be removed in future to combine them into a
full slot. All chassis have 100 Gbps of full-duplex bandwidth per full slot. In
addition, the ability to offer multiple services, including dual-stack IPv4/IPv6
in hardware, provides excellent investment protection.
Physical and thermal parameters
The physical and thermal parameters are shown in Table 2-10.
Component             IBM Ethernet    IBM Ethernet    IBM Ethernet
                      Switch B04R     Switch B08R     Switch B16R
Rack units (RUs)      4               7               14
Op. temperature (°C)  0 - 40          0 - 40          0 - 40
Fan assemblies        1               1               3
All the fans are hot swappable and self adjusting based on sensor readings.
Power parameters
All r-series models provide redundant and removable power supplies with AC
power options. Power supplies can be exchanged between B08R and B16R
models, but not with the B04R. None of the r-series models provide the
Power over Ethernet (PoE) option.
Number of power supply bays required for a fully loaded chassis: B04R 1, B08R 2, B16R 4.
The number of slots, ports, and performance metrics are shown in Table 2-12.
Component                                          B04R      B08R      B16R
Slots                                              9         13        22
Payload slots                                      4         8         16
Max. number of slots for switch fabric modules     3         3         4
Min. number of switch fabric modules required
for fully-loaded chassis at line rate              2         2         3
10/100/1000 copper ports per module (MRJ21)        48        48        48
10/100/1000 copper ports per module (RJ-45)        24        24        24
All slots have a half-slot line module design, and the slots have removable
dividers to support future full slot modules.
With such a design, r-series models provide exceptional density of usable ports.
Module type       Ports   Connector
10/100/1000MbE    48      MRJ21
10/100/1000MbE    24      RJ45
100/1000MbE       24      SFP
10 GbE            16      SFP+
10 GbE            4       XFP
Interface types
Following are the available interface types:
10/100/1000 Mbps Ethernet port with MRJ21 connector
10/100/1000 Mbps Ethernet port with RJ45 connector
100/1000 Mbps Ethernet port with SFP connector
10 Gbps Ethernet port with SFP+ connector
10 Gbps Ethernet port with XFP connector
Transceivers
In Table 2-14, Table 2-15, and Table 2-16 we show the available transceivers to be
used in interface modules.
Type                             Connector   Speed      Distance
100BASE-FX 1310 nm SFP optics    LC          100 Mbps   Up to 2 km over multi-mode fiber
Cables for the MRJ21 must be ordered separately. One of the distributors of such
cables is Tyco Electronics:
http://www.ampnetconnect.com/brocade/
The IPv4 and IPv6 multicast protocols help to make the most efficient use of the
network bandwidth.
For the complete list of supported standards and RFC compliance, see the website:
http://www-03.ibm.com/systems/networking/hardware/ethernet/b-type/
Link aggregation
Following are the main characteristics of link aggregation implementation on
r-series models:
802.3ad/LACP support
31 server trunks supported
Up to 8 ports per trunk group
Cross module trunking
Ports in the group do not need to be physically consecutive
Tagged ports support in trunk group
Compatibility with Cisco EtherChannel
Ports can be dynamically added or deleted from the group, except primary
port
Layer 2 switching
Following are the main characteristics of Layer 2 switching implementation on
r-series models:
9216 byte jumbo frames
L2 MAC filtering
MAC authentication
MAC port security
4090 VLANs
Port and protocol based VLANs
VLAN tagging:
– 802.1q
– Dual Mode
STP (Spanning Tree Protocol) per VLAN
Compatibility with CISCO PVST (Per VLAN Spanning Tree)
STP fast forwarding (fast port, fast uplink), root guard, BPDU (Bridge Protocol
Data Unit) guard
Up to 128 Spanning-Tree instances
Rapid STP (802.1w compatible)
MSTP (Multiple Spanning Tree Protocol) (802.1s)
MRP Phase I&II
VSRP (Virtual Switch Redundancy Protocol)
Up to 255 topology groups
Hitless OS with UDLD (Uni-directional Link Detection)
Multicast
Following are the main characteristics of multicast implementation on r-series
models:
IGMP/IGMPv3 (Internet Group Management Protocol)
In any deployment scenario, this switch is designed to save valuable rack space,
power, and cooling in the data center while delivering 24x7 service through its
high-availability design.
Embedded per-port sFlow capabilities to support scalable hardware-based
traffic monitoring
Operating system
The IBM x-series runs Brocade IronWare OS R04.1.00 or higher.
When organizations upgrade a server's NICs to 10 GbE, they will only need to
replace the 1 GbE SFPs with 10 GbE SFP+ transceivers or direct attached
10 GbE SFP+ copper (Twinax) transceivers. This approach protects
Ethernet-based investments and streamlines migration to 10 GbE. The switch
also includes four 10/100/1000 MbE RJ45 ports for additional server connectivity
or separate management network connectivity.
The hot-swappable power supplies and fan assembly are designed to enable
organizations to replace components without service disruption. In addition,
several high-availability and fault-detection features are designed to help in
failover of critical data flows, enhancing overall system availability and reliability.
Organizations can use sFlow-based network monitoring and trending to
proactively monitor risk areas and optimize network resources to avoid many
network issues altogether.
Airflow: front/side-to-back.
Power parameters
All x-series models provide redundant and removable power supplies with the AC
power option.
The x-series models do not support Power over Ethernet (PoE).
The power supplies are auto-sensing and auto-switching, and provide up to 300
watts of total output power. The power supplies are hot swappable and can be
removed and replaced without powering down the system.
The number of ports and performance metrics are shown in Table 2-19.
Interface types
Following are the available interface types:
10 Gbps Ethernet port with SFP+ connector
10/100/1000 Mbps Ethernet port with RJ45 connector
Quality of Service:
MAC address mapping to priority queue
ACL mapping to priority queue
ACL mapping to ToS/DSCP
Honoring DSCP and 802.1p
ACL mapping and marking of ToS/DSCP
DHCP assist
QoS queue management using weighted round robin (WRR), strict priority
(SP), and a combination of WRR and SP
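
To illustrate the combined strict priority and WRR queue management, the following Python sketch (a conceptual model with invented queue names and weights, not the switch scheduler itself) always drains the designated strict-priority queue first and then serves the remaining queues in proportion to their weights:

from collections import deque

def schedule(queues, weights, sp_queue, rounds):
    """Emit packets: strict-priority queue first, then WRR over the rest by weight."""
    order = []
    for _ in range(rounds):
        # Strict priority: drain the SP queue completely before serving anyone else
        while queues[sp_queue]:
            order.append(queues[sp_queue].popleft())
        # Weighted round robin: each remaining queue gets 'weight' dequeues per round
        for name, weight in weights.items():
            for _ in range(weight):
                if queues[name]:
                    order.append(queues[name].popleft())
    return order

queues = {"voice": deque(["v1", "v2"]),
          "video": deque(["d1", "d2", "d3"]),
          "data":  deque(["b1", "b2", "b3", "b4"])}
weights = {"video": 2, "data": 1}           # WRR weights for the non-SP queues
print(schedule(queues, weights, sp_queue="voice", rounds=4))
# ['v1', 'v2', 'd1', 'd2', 'b1', 'd3', 'b2', 'b3', 'b4']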
Traffic management:
Inbound rate limiting per port
ACL-based inbound rate limiting and traffic policies
Outbound rate limiting per port and per queue
Broadcast, multicast, and unknown unicast rate limiting
The whole list of supported standards and RFC compliance can be found at:
http://www-03.ibm.com/systems/networking/hardware/ethernet/b-type/
All six models are shown in Figure 2-6.
Operating system
All c-series systems run Brocade Multi-Service IronWare R3.8.00 or higher
operating system.
Standardized services
The c-series is compliant with both the MEF 9 and MEF 14 specifications. Using
the c-series models, a provider can offer E-LINE, E-LAN and E-TREE services,
the standardized service names for point-to-point, multipoint, and rooted
multipoint services. These services can be offered using 802.1Q VLANs,
Provider Bridges or Provider Backbone Bridges.
Scalability
The c-series supports up to 128k MAC addresses per system. Support for
100/1000 Mbps SFP ports or 10/100/1000 Mbps RJ45 ports, with wire-speed
performance even at full load, ensures that abundant capacity is available on
user facing ports to accommodate a provider’s customers who wish to upgrade to
a higher bandwidth service. Additionally, the use of Link Aggregation Groups
(LAG) allows multiple links to be aggregated and offer even higher bandwidth
services at the user network interface (UNI) to the end-user.
Service management
Recently developed specifications such as IEEE 802.1ag-2007 (Connectivity
Fault Management) and MEF 17 (Service OAM Framework and Specifications)
allow the rapid and proactive identification and isolation of faults in the network or
service, thereby maintaining service uptime and maximizing the ability to meet
customer SLAs. The c-series supports all the capabilities in IEEE 802.1ag,
including Connectivity Check Messages, Loopback Message/Response and
LinkTrace Message/Response. It allows flexible association and definition of both
Maintenance End Points (MEP) and Maintenance Intermediate Points (MIP)
within a network. Fault management functions of MEF 17 Service OAM are also
supported.
Reliability
To provide a high level of reliability in the Carrier Ethernet service, the c-series
supports Foundry’s innovative Metro Ring Protocol (MRP/MRP-II), the ring
resiliency protocol of choice on several metro networks worldwide. Standard
Layer 2 protocols such as MSTP, RSTP and STP are also supported. Foundry’s
MRP/MRP-II allows Carrier Ethernet services to be delivered over ring-based
topologies, including overlapping rings that help optimize the use of fiber in metro
rings and provide fast recovery from node/link failures in milliseconds. Foundry
MRP/MRP-II can also be used within a PB/PBB network.
Hard QoS
The c-series supports up to eight queues per port, each with a distinct priority
level. Advanced QoS capabilities such as the use of 2-rate, 3-color traffic
policers, Egress shaping, and priority remarking can also be applied to offer
deterministic “hard QoS” capability to customers of the service. The c-series can
be configured with Ingress and Egress bandwidth profiles per UNI that are in
compliance with the rigid traffic management specifications of MEF 10/MEF 14.
Multicast support
Multicast transport is a key enabler of next-generation services like IPTV. It is
also typically a major consumer of capacity in many multi-service networks. It is
therefore critical for next-generation edge switches to efficiently handle multicast
traffic. The c-series has comprehensive support for multicast switching and
routing by a variety of protocols, including PIM-SM, PIM-DM, PIM-SSM, IGMP
v2/v3, and other platform-independent multicast capabilities built in Multi-Service
IronWare.
Multicast traffic within the c-series is handled with a very high degree of efficiency
by avoiding unnecessary replications and conserving bandwidth within the system.
By performing egress interface-based replication, switch forwarding capacity and
buffer space are used optimally, thereby maximizing network performance when
running multicast traffic.
Routing capabilities
Based on Multi-Service IronWare, the operating system software that
successfully powers thousands of m-series routers deployed around the world,
the c-series offers routing capabilities that are commonly required in edge
aggregation and other applications within a provider’s domain.
The powerful feature set of the c-series makes it an ideal candidate for
applications beyond Carrier Ethernet service delivery. For example, data center
networks and edge/aggregation routing within ISP networks often require a
compact Layer 3 switch with sufficient scalability in IPv4 routes. The
comprehensive support for IPv4 routing protocols, when complemented with
VRRP, and VRRP-E makes the c-series ideally suited for such applications.
Chassis type        Fixed form factor       Fixed form factor       Fixed form factor
H/W/D (cm)          4.4 x 44.3 x 44.8       4.4 x 44.3 x 44.8       4.4 x 44.3 x 44.8
Max. power draw     170 W B48C (Copper)     205 W B48C (Copper)     255 W B50C (Copper)
                    195 W B48C (Fiber)      245 W B48C (Fiber)      295 W B50C (Fiber)
Heat emission       580 - B48C (Copper)     700 - B48C (Copper)     870 - B50C (Copper)
(BTU/hr)            666 - B48C (Fiber)      836 - B48C (Fiber)      1007 - B50C (Fiber)
All the fans are hot swappable and self adjusting based on sensor readings.
Power parameters
All c-series models provide redundant and removable power supplies with AC
power options. Power supplies can be exchanged between various c-series
models. None of the c-series models provides a Power over Ethernet (PoE) option.
Ports, memory, and performance
All c-series models provide a store-and-forward switching engine.
All c-series models have a maximum of 512 MB of RAM and 32 MB of flash memory.
The number of ports and performance metrics are shown in Table 2-23.
Interface types
Following are the available interface types:
10/100/1000 Mbps Ethernet port with RJ45 connector
100/1000 Mbps Ethernet port with SFP connector
10 Gbps Ethernet port with XFP connector
Optional features
The following optional features are available for c-series models:
Full Layer 3 Premium Activation:
Enables OSPFv2, IS-IS, IGMPv1/v2/v3, PIM-DM/-SM/-SSM, MSDP, Anycast
RP, MPLS, VPLS, Multi-VRF, Ethernet Service Instance (ESI), IEEE 802.1ag
Connectivity Fault Management (CFM), 802.1ad (Provider Bridges), and
802.1ah (Provider Backbone Bridges)
Metro Edge Premium Activation:
Enables OSPFv2, BGP-4, IS-IS, IGMPv1/v2/v3, PIM-DM/-SM/-SSM
Comprehensive IPv4 unicast routing support based on the rich feature set of
Multi-Service IronWare:
High performance, robust routing by Foundry Direct Routing (FDR) for
complete programming of Forwarding Information Base (FIB) in hardware
RIP, OSPF, IS-IS, BGP-4 support
Support for VRRP and VRRP-E
8-path Equal Cost Multipath (ECMP)
Up to 32k IPv4 unicast routes in FIB
Support for trunks (link aggregation groups) using either IEEE 802.3ad LACP or
static trunks:
Up to 12 links per trunk
Support for single link trunk
Advanced QoS:
Inbound and outbound two-rate, three-color traffic policers with accounting
8 queues per port, each with a distinct priority level
Multiple queue servicing disciplines: Strict Priority, Weighted Fair Queuing,
and hybrid
Advanced remarking capabilities based on port, VLAN, PCP, DSCP, or IPv4
flow
Egress port and priority-based shaping
The complete list of supported standards and RFC compliance can be found at:
http://www-03.ibm.com/systems/networking/hardware/ethernet/b-type/
IBM s-series Ethernet Switches have an extensive feature set, making them well
suited for real-time collaborative applications, IP telephony, IP video, e-learning
and wireless LANs to raise an organization’s productivity. With wire-speed
performance and ultra low latency, these systems are ideal for converged
network applications such as VoIP and video conferencing. These switches provide
one of the industry’s most scalable and resilient PoE designs, and their 1 GbE
PoE-capable ports support the IEEE 802.1AB LLDP and ANSI/TIA-1057 LLDP-MED
standards, enabling organizations to build advanced multi-vendor networks.
Figure 2-7 IBM s-series Ethernet Switches
All s-series models can only be installed in the rack. Non-rack installation is not
supported.
Operating system
All s-series systems run Brocade IronWare R5.0.00 or a higher version of the operating system.
Configuration alternatives
The s-series family of switches is optimized for flexibility with upgradeability for
PoE, redundant management, switch fabric and power, and 10 Gigabit Ethernet.
Available in three chassis models, the scalable s-series family helps enterprises
and service providers reduce costs and gain operational benefits of a common
operating system, a shared interface, and common power supply modules.
Similarly, the power consumption of the line modules, switch modules, and
management modules does not impact the PoE power. Power consumption for
the system and PoE are calculated, provisioned, and managed independently of
one another. As more PoE devices are added to a switch, a simple power budget
calculation determines whether another PoE power supply needs to be added to
the switch.
The system power distribution and the PoE power distribution subsystems are
each designed for M+N load-sharing operation. This dual-distribution power
design simplifies the power configuration of the system while enhancing system
reliability. The chassis can be configured for a wide range of power
environments, including 110V/220V AC power, -48V DC power and mixed AC/DC
power configurations. To scale PoE configurations, PoE power supplies are
available in two ratings of 1250W and 2500W. When configured with four 2500W
PoE supplies, the s-series supports up to 384 10/100/1000 Mbps Class 3 PoE
ports and still maintains N+1 power redundancy. This resiliency is unmatched in
the industry.
Intelligent and scalable Power over Ethernet
Power over Ethernet (PoE) is a key enabler of applications such as VoIP, IEEE
802.11 wireless LANs, and IP video. The s-series is Brocade’s third-generation
PoE-capable switch family and incorporates the latest advances in PoE
provisioning and system design, delivering scalable and intelligent PoE to the
enterprise. The PoE power distribution subsystem is independent of the system
power, eliminating system disruption in the event of PoE over-subscription or a
PoE power failure.
After being classified, the traffic is queued and scheduled for delivery. Three
configured queuing options provide the network administrator with flexible control
over how the system services the queues. Weighted Round Robin (WRR)
queuing applies user-configured weighting for servicing multiple queues,
ensuring that even low priority queues are not starved for bandwidth. With Strict
Priority (SP) queuing, queues are serviced in priority order ensuring that the
highest-priority traffic is serviced ahead of lower priority queues. Combined SP
and WRR queuing ensures that packets in the SP queue are serviced ahead of
the WRR queues. Combined queuing is often used in VIP networks where the
VIP traffic is assigned to the SP queue and data traffic to the WRR queues.
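To make the combined scheduling behavior concrete, the following minimal Python sketch drains a strict-priority queue ahead of weighted round robin queues. It is an illustration only, not switch code; the queue contents, weights, and the per-round servicing model are simplifying assumptions.

    from collections import deque

    def drain(sp_queue, wrr_queues, weights):
        """Serve the SP queue first; when it is empty, serve each WRR queue
        up to its configured weight per round."""
        served = []
        while sp_queue or any(wrr_queues):
            if sp_queue:                            # strict priority always wins
                served.append(sp_queue.popleft())
                continue
            for queue, weight in zip(wrr_queues, weights):
                for _ in range(weight):             # weight = packets served per round
                    if queue:
                        served.append(queue.popleft())
        return served

    # Example: VoIP in the SP queue, two data queues weighted 3:1
    print(drain(deque(["voip1", "voip2"]),
                [deque(["web1", "web2", "web3"]), deque(["bulk1"])],
                [3, 1]))

Because the SP queue is always checked first, even a heavily weighted WRR queue cannot delay SP traffic, which is why VIP traffic is placed in the SP queue.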
In addition, the switch management modules are available with integrated Gigabit
Ethernet or 10-Gigabit Ethernet ports. These modules provide cost-effective
system configurations supporting high-capacity connections to upstream
switches. The management modules utilize high-performance system processors
with high-capacity memory for scalable networking up to a routing capacity of 1
million BGP routes and 20 BGP peers.
The s-series switches utilize an advanced cell-based switch fabric with internal
flow-control, ensuring very low latency and jitter performance for converged
applications.
LLDP-MED addresses the unique needs that voice and video demand in a
converged network by advertising media and IP telephony specific messages
that can be exchanged between the network and the endpoint devices.
LLDP-MED provides exceptional interoperability, IP telephony troubleshooting,
and automatic deployment of policies, inventory management, advanced PoE
power negotiation, and location/emergency call service. These sophisticated
features make converged network services easier to install, manage, and
upgrade and significantly reduce operations costs.
To achieve wire-speed Layer 3 performance, the s-series switches support
Foundry Direct Routing (FDR), in which the forwarding information base (FIB) is
maintained in local memory on the line modules. The hardware forwarding tables
are dynamically populated by system management with as many as 256,000
routes.
The s-series family includes Secure Shell (SSHv2), Secure Copy, and SNMPv3
to restrict and encrypt communications to the management interface and system,
thereby ensuring highly secure network management access. For an added level
of protection, network managers can use ACLs to control which ports and
interfaces have TELNET, web, and/or SNMP access.
After the user is permitted access to the network, protecting the user’s identity
and controlling where the user connects becomes a priority. To prevent “user
identity theft” (spoofing), the s-series switches support DHCP snooping, Dynamic
ARP inspection, and IP source guard. These three features work together to
deny spoofing attempts and to defeat man-in-the-middle attacks. To control
where users connect, the s-series switches support private VLANs, quarantine
VLANs, policy-based routing, and extended ACLs, all of which can be used to
control a user’s access to the network.
Physical and thermal parameters
The physical and thermal parameters are shown in Table 2-26.
Fan tray/assemblies 1 2
All the fans are hot swappable and have adjustable speeds.
There are separate power supplies for system power (SYS) and PoE power
(PoE). Power consumption between PoE and SYS power supplies is not shared,
meaning loss of a System power supply does not impact a PoE power supply,
and vice versa.
System power supplies have internal power of 12V and PoE power supplies have
internal power of 48V.
All power supplies are auto-sensing and auto-switching. All are hot swappable
and can be removed and replaced without powering down the system.
The system (SYS) power supplies provide power to the management module, all
non-PoE interface modules, and all ports on PoE modules that do not require
PoE power or to which no power-consuming devices are attached. The installed
SYS power supplies provide power to all chassis components, sharing the
workload equally. If a SYS power supply fails or overheats, the failed power
supply’s workload is redistributed to the redundant power supply.
The PoE power parameters are shown in Table 2-28.
The PoE Power Supplies provide power to the PoE daughter card, and ultimately
to PoE power consuming devices. The installed PoE power supplies share the
workload equally. If a PoE power supply fails or overheats, the failed power
supply’s workload is redistributed to the redundant power supply. The number of
PoE power-consuming devices that one PoE power supply can support depends
on the number of watts (Class) required by each power-consuming device (PD).
The number of PoE power-consuming devices that one 1250W PoE power
supply can support depends on the number of watts required by each
power-consuming device. Each supply can provide a maximum of 1080 watts of
PoE power, and each PoE port supports a maximum of 15.4 watts of power per
PoE power-consuming device. For example, if each PoE power-consuming
device attached to the s-series consumes 15.4 watts of power, one power supply
will power up to 70 PoE ports. You can install an additional power supply for
additional PoE power.
Each 2500W PoE power supply can provide a maximum of 2160 watts of PoE
power, and each PoE port supports a maximum of 15.4 watts of power per PoE
power-consuming device. For example, if each PoE power-consuming device
attached to the s-series consumes 15.4 watts of power, one power supply will
power up to 140 PoE ports.
Note: The system powers on as many PoE ports as the installed PoE power supplies
can handle. The system calculates the maximum number of PoE ports it can
support based on the number of PoE power supplies installed. PoE ports are
enabled based on their priority settings. Keep in mind that the system will
reserve the maximum configured power per PoE-enabled port, even if the PoE
power-consuming device is drawing less power.
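The PoE budgeting described above reduces to a simple division. The following Python sketch is illustrative only; the wattage figures are the ones quoted in the text, and the function name is our own.

    import math

    def poe_ports(poe_output_watts, watts_per_port=15.4):
        """Number of full Class 3 PoE ports that one supply can power."""
        return math.floor(poe_output_watts / watts_per_port)

    print(poe_ports(1080))   # 1250 W supply delivers 1080 W of PoE power -> 70 ports
    print(poe_ports(2160))   # 2500 W supply delivers 2160 W of PoE power -> 140 ports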
In the B16S chassis, the system power supplies occupy slot numbers 1 – 4 in the
top row with the redundant supplies in slot numbers 3 and 4. The PoE power
supplies occupy slot numbers 5 – 8 in the bottom row. Figure 2-9 shows power
supply placement.
What happens when one or more system power supplies fail:
If one or more system power supplies fail and the system is left with less than
the minimum number of power supplies required for normal operation, the
power supplies will go into overload and the system will start to shut down.
Several things can happen. The output voltage of the remaining good power
supplies will likely drop as they try unsuccessfully to generate more power
than they are capable of. The system will react to a drop in voltage by
increasing the current draw. The hardware will shut down due to over-current
protection or under-voltage protection, whichever takes place first. One by
one, the interface modules will shut down until the power is within the power
budget of the remaining power supplies. There is no particular order in which
the interface modules will shut down, as this will occur in hardware and not in
software. The management CPU requires power as well, and can also shut
down during a power supply failure.
If one or more PoE power supplies fail and the system is left with less than the
minimum number of PoE power supplies, the PoE power supplies will go into
overload. Non-PoE functions will not be impacted, provided the System power
supplies are still up and running.
Several things can happen with a PoE power supply failure. The output voltage
of the remaining good power supplies will likely drop as they try unsuccessfully
to generate more power than they are capable of. The system will react to a
drop in voltage by increasing the current draw. The hardware will shut down
PoE function due to over-current protection or under-voltage protection,
whichever occurs first. The interface modules will start to shut down their PoE
ports one by one until the power draw is within the power budget of the
remaining power supplies. There is no particular order in which the PoE ports
will shut down, as this occurs in hardware and not in software.
After a power loss, if the system is left with less than the minimum number of
power supplies required for normal operation, the system will be left in an
unknown state. At this point, manual recovery is required (that is, restore
power and power cycle the chassis).
All s-series models have a maximum of 512 MB of RAM and a 667 MHz
management processor.
The number of slots, ports and performance metrics are shown in Table 2-29.
Interface slots                                            8    16
Min. number of management modules required for operations 1    1
All modules are hot-swappable and do not require power-off to be replaced.
Table 2-30 and Table 2-31 show the available port density.
10 Base X (XFP) 20 36
Management modules
The following types of management modules are available:
IPv4 management module
IPv4 management module with 2-port 10 GbE (XFP)
IPv6 management module with 2-port 10 GbE (XFP)
Interface modules
Table 2-32 shows which modules can be installed in the s-series chassis
interface slots.
Interface types
Following are the available interface types:
10/100/1000 Mbps Ethernet port with RJ45 connector
100/1000 Mbps Ethernet port with SFP connector
10 Gbps Ethernet port with XFP connector
Transceivers
Table 2-33 and Table 2-34 show the available transceivers to be used in interface
modules.
Table 2-34 Transceivers for 10 Gbps Ethernet ports
Type Connector Speed Distance
Optional features
The s-series is capable of providing Layer 3 functions. Following are the optional
features:
IPv4 Full Layer 3 Premium Activation
Enables RIPv1/v2, OSPFv2, BGP-4, IGMPv1/v2/v3, PIM-SM/-DM/-SSM,
VRRP-E
IPv6 Full IPv4 Layer 3 Premium Activation
Enables RIPv1/v2, OSPFv2, BGP-4, IGMPv1/v2/v3, PIM-SM/-DM/-SSM,
VRRP-E
IPv6 Full IPv6 Layer 3 Premium Activation
Enables RIPv1/v2, RIPng, OSPFv2, OSPFv3, BGP-4, IGMPv1/v2/v3,
PIM-SM/-DM, DVMRP, VRRP-E
Access to the management interface can be restricted and encrypted; this can be
achieved by using:
Secure Shell (SSHv2) access
Secure Copy (SCPv2)
SNMPv3
HTTPS
ACLs to define which ports and interfaces have CLI, web and/or SNMP
access
To prevent “user identity theft” (spoofing) the s-series supports these features:
DHCP snooping
Dynamic ARP inspection
IP source guard
The complete list of supported standards and RFC compliance can be found at:
http://www-03.ibm.com/systems/networking/hardware/ethernet/b-type/
Both models enable a converged solution for vital network applications such as
VoIP, wireless access, WebTV, video surveillance, building management
systems, triple play (voice + video + data) services and remote video kiosks in a
cost-effective, high-performance compact design.
Both models are shown in Figure 2-10.
All g-series models can only be installed in the rack. Non-rack installation is not
supported.
Operating system
The g-series B48G runs Brocade IronWare R4.3.01 or higher, and the B50G runs
R5.0.01 or a higher version of the operating system.
For cost-effective and rapid scaling at the network edge, the g-series is equipped
with IronStack stacking technology, which supports stacking up to eight units in a
virtual chassis. The IronStack system supports 40-Gbps switching capacity
between stacked units providing a high-capacity interconnect across the stack.
The g-series IronStack supports stacking over copper and fiber cables. This
provides for flexible stack configurations in which stacked units can be separated
by several hundred meters of fiber.
Each power supply within a g-series delivers up to 480 watts of PoE power. In a
dual power supply configuration, up to 48 10/100/1000 Mbps PoE ports of 15.4
watts per port (full Class 3) can be supported. This scalability enables the
network manager to size the installation to meet current needs and have room for
future growth.
The g-series features 1+1 power redundancy, using hot-swappable and field
replaceable power modules, which install into the rear of the unit. The power
modules are load-sharing supplies providing full 1+1 redundancy for as many as
48 Class 1 and Class 2 PoE ports and 31 Class 3 (15.4 watts) PoE ports.
Additional design features include intake and exhaust temperature sensors and
fan spin detection to aid in rapid detection of abnormal or failed operating
conditions to help minimize mean time to repair.
IronStack solution
IronStack is advanced stacking technology that supports stacked configurations
in which as many as eight g-series switches can be interconnected and maintain
the operational simplicity of a single switch. Each IronStack enabled g-series
model can support up to 40Gbps of stacking bandwidth per unit. IronStack
configurations can be built using 10-GbE CX4 copper or XFP-based fiber
connections. When XFP-based fiber connections are used, an IronStack
configuration can be extended between racks, floors, and buildings with fiber
lengths up to several hundred meters.
The B50G models are pre-configured with a two-port 10-GbE CX4 module,
expanded CPU memory, and IronStack license (IronStack PROM) and software.
An IronStack system operates as a single logical chassis (with a single IP
management address) and supports cross-member trunking, mirroring,
switching, static routing, sFlow, multicast snooping and other switch functions
across the stack. An IronStack stack has a single configuration file and supports
remote console access from any stack member. Support for active-standby
controller failover, stack link failover, and hot insertion/removal of stack members
delivers the resilience that is typical of higher end modular switches.
When configured with dual power supplies, the 48-port g-series switch supports
up to 48 10/100/1000 Class 3 (15.4 watts) PoE ports, which is one of the highest
Class 3 PoE port densities for a compact switch in the industry. These capacities
are a significant advantage for environments that require full Class 3 power for
devices such as surveillance cameras, color LCD phones, point-of-service
terminals, and other powered endpoints.
Network managers can apply a “mirror ACL” on a port and mirror a traffic stream
based on IP source/destination address, TCP/UDP source/destination ports, and
IP protocols such as ICMP, IGMP, TCP, and UDP. A MAC filter can be applied on
a port and mirror a traffic stream based on a source/destination MAC address.
VLAN-based mirroring is another option for CALEA compliance. Many
enterprises have service-specific VLANs, such as voice VLANs. With VLAN
mirroring, all traffic on an entire VLAN within a switch can be mirrored, or traffic
from specific VLANs can be sent to a remote server.
Threat detection and mitigation
Support for embedded, hardware-based sFlow traffic sampling extends Brocade
IronShield 360 security shield to the network edge. This unique and powerful
closed loop threat mitigation solution uses best-of-breed intrusion detection
systems to inspect sFlow traffic samples for possible network attacks. In
response to a detected attack, network management can apply a security policy
to the compromised port. This automated threat detection and mitigation stops
network attacks in real time, without human intervention. This advanced security
capability provides a network-wide security umbrella without the added
complexity and cost of ancillary sensors.
Enhanced Spanning Tree features such as Root Guard and BPDU Guard prevent
rogue hijacking of Spanning Tree root and maintain a contention and loop free
environment especially during dynamic network deployments. Additionally, the
g-series supports Port Loop Detection on edge ports that do not have spanning
tree enabled. This capability protects the network from broadcast storms and
other anomalies that can result from Layer 1 or Layer 2 loopbacks on Ethernet
cables or endpoints.
In addition, the g-series supports stability features such as Port Flap Dampening,
single link LACP, and Port Loop Detection. Port Flap Dampening increases the
resilience and availability of the network by limiting the number of port state
transitions on an interface. This reduces the protocol overhead and network
inefficiencies caused by frequent state transitions occurring on misbehaving
ports.
Physical and thermal parameters
The physical and thermal parameters are shown in Table 2-35.
Number of fans 2 2
The fans are not hot swappable and run at a fixed speed.
Power parameters
All g-series models provide redundant and removable power supplies with AC
power option. Power supplies can be exchanged between B48G and B50G
models.
Both power supplies provide power for the system and PoE ports.
Power supplies
The power supplies are auto-sensing and auto-switching, and provide 600 watts
of total output power, including +12 VDC @ 10 A to the system and -48 VDC @
10 A for Power over Ethernet applications. The power supplies accept 100-240
VAC input, 50-60 Hz @ 8 A to 3.2 A. All are hot swappable and can be removed
and replaced without powering down the system.
Foundry 48-volt power supplies provide power to the PoE daughter card, and
ultimately to PoE power-consuming devices. The number of PoE
power-consuming devices that one 48-volt power supply can support depends on
the number of watts required by each device. Each 48-volt power supply provides
480 watts of power for PoE, and each PoE port supports a maximum of 15.4
watts of power per PoE power-consuming device. For example, if each PoE
power-consuming device attached to the g-series consumes 12 watts of power,
one 48-volt supply will power up to 40 PoE ports. You can install a second power
supply for additional PoE power.
Note: If your g-series device has 48 ports and only one power supply, and
each PoE enabled port needs 15.4 watts, then a maximum of 31 ports can
supply power to connected devices.
The number of ports and performance metrics are shown in Table 2-38.
Interface types
Following are the available interface types:
10/100/1000 Mbps Ethernet port with RJ45 connector
100/1000 Mbps Ethernet port with SFP connector
10 Gbps Ethernet port with XFP connector
10 Gbps Ethernet port with CX4 connector
Optional features
The g-series is capable of providing Layer 3 functions. Following are the optional
features:
Edge Layer 3 Premium Activation: Enables RIPv1/v2, OSPFv2
Services, protocols, and standards
IBM g-series Ethernet Switches support various services, protocols, and
standards.
The following Layer 2 protocols are supported:
Protected Link Groups
Link Aggregation (IEEE 802.3ad, LACP)
UDLD
STP/RSTP/MSTP
Root Guard
BPDU Guard
Up to 16,000 MAC addresses (also valid for an 8-unit stack)
Up to 4096 VLANs
Up to 253 STPs
Up to 8 ports per trunk, up to 25 trunk groups
For a complete list of supported standards and RFC compliance, see the
following website:
http://www-03.ibm.com/systems/networking/hardware/ethernet/b-type/
The early computers used punch cards, the original implementation of what can
be referred to today as “Sneakernet”, although back then sneakers were not
really the fashion. Punch cards required a programmer to either punch out, or
shade in, circles on the card, where each circle represented a specific field for
the program or data set. After the programmer had finished creating the cards for
a program or data set, the result was typically a large stack of cards.
The programmer then took these cards to the computer lab where the operator
fed them into the card reader, and the computer read them and provided output
(often printed or on more punch cards). This output was then returned to the
programmer for analysis. As you can see, there is a lot of foot work involved in
moving data (in this case, cards) around, hence the term “Sneakernet,” which is
still used today to refer to passing data physically, whether the data is on memory
keys, portable hard drives, or tapes.
It was not long before computer languages evolved and computer terminals were
created to enable the programmer to enter data directly into a machine readable
file. These terminals were connected back to the computer by cables, usually
coaxial or twinaxial (a pair of coaxial cables together). Twinaxial cable allowed for
up to 7 terminals to be connected to the single length of cable. This was
effectively the first iteration of the computer network.
At that time, computers were expensive and people were not always sending or
receiving data between their terminal and computer, therefore a cheaper device
was placed in front of the computer to allow one connection on the computer to
be used by more than one terminal. This device was called a front end processor
(FEP). The FEP was able to control data for communications between the
terminal and the computer, allowing each terminal to have time to communicate
to the computer.
The data links were quite slow compared with today’s speeds and the displays
were text based, green screens. The FEP was one of the earliest network
devices. The FEP acted as the hub for communications, the terminals were
connected like spokes on a wheel, and this is where the terminology for a
hub-and-spoke network probably originated.
To this point, we have not really described any complex networking. The FEP is
really just allowing remote display of data. Transferring data between computers
still required physically moving tapes or hard disks between systems. Let us fast
forward a bit.
During the 1960s, more computers were entering the market and data had to be
shared to improve processing power. In 1969 the Advanced Research Projects
Agency Network (ARPANET) was created to enable multiple computers to
connect together between various universities and USA government agencies.
ARPANET was the beginning of what we now call the Internet. The concept
of connecting more computers together evolved with the availability of
microprocessor computers. Xerox first suggested the concept of inter-networking
with an early version of Ethernet. Over time, TCP/IP became the standard
protocol for the Internet.
Ethernet was not always the thin cable connecting into the computer that we
have today. It started out on a coaxial wire which was run from one end of the
required network to the other end, to a maximum of 500 meters; this is called a
bus topology. To connect a computer, the network administrator utilized a special
connection, called a vampire tap, which when inserted into the coaxial Ethernet
cable, mechanically pierced the shield and created two electrical connections,
one to the central wire in the coaxial cable, one to the shielding of the coaxial
cable.
This vampire tap had a connection that was then inserted into the personal
computer’s network interface card (NIC). The next generation of Ethernet used a
thinner coax cable with a maximum length of 185 meters for the entire single
network. This thinner cable was also designed to allow for easier connection at
the NIC. To add a computer, the network administrator had to cut the coaxial
cable and terminate each side with connectors, then both connectors were
attached to a T-piece that connected to the computer’s NIC.
To reduce the disruption of inserting a new computer into the Ethernet, the
design moved to a hub and spoke topology, introducing the cable and connector
we are familiar with today. The network hub was placed in a central location to
the site. Each computer, or spoke, was then connected by a cable made of
twisted pairs of wires which had RJ45 connectors at each end (RJ45 is the name
of the connector you typically use when connecting to a wired network today).
The new cable provided a maximum length of 100 meters, but now it was
measured from the network hub to the computer, instead of from one end of the
coaxial (bus) network to the other. Besides the hub and cable, not much really
changed in the logical design.
Coaxial has one pair of electrical conductors, twinaxial is two coaxial cables
bonded together, hence two pairs of electrical conductors. Today’s connections
utilize 4 twisted pairs; the twisting reduces the impact of electrical interference.
Terms such as Category 5 (Cat5) and Category 6 (Cat6) define how tightly
twisted these pairs of wire are. The tighter the twisting, the more resiliency the
cable has to electrical interference, which in turn allows the speed, or bandwidth,
of the network to increase.
3.2.1 Introduction to data communications
As we are all aware, computers work in binary with bits, which are either on or
off, typically represented as a one (1) for on and zero (0) for off. These bits are
then grouped together in a set of eight (8) bits which is called a byte. While there
are more groupings, for now we only need to understand bits and bytes.
Although the MAC address is factory set for each NIC, it is possible to change the
MAC address to a locally administered address. This is sometimes referred to as
MAC spoofing. Changing the MAC address is also used in some network
equipment for high availability purposes.
Each of these data chunks was called a packet or a frame. If a packet was lost in
transit, then only that small packet had to be resent instead of the entire set of
data. In general the term frame is used to define the OSI Layer 2 transport
protocol, in our case, Ethernet; and the term packet is used to define the OSI
Layer 3 network protocol, in our case TCP/IP. For more information about the OSI
model, see the Redbooks publication, TCP/IP Tutorial and Technical Overview,
GG24-3376.
There have been many different network types over the years that you might
have heard of, including Token Ring, ATM, and of course Ethernet. All of these
computer network technologies utilized packets to transmit data. Some media,
such as Token Ring, allowed for a variable maximum packet size depending on
various network parameters. Others, such as Ethernet and ATM, decided on a
fixed maximum packet size.
More recently, due to the increase in both speed and reliability of networks,
Ethernet has defined the jumbo frame. This frame is much larger than the normal
Ethernet frame, however, all network devices in the path must support jumbo
frames in order for them to be used.
Let us take a look at the standard 802.3 Ethernet packet. While there are two
types of Ethernet packets in use today, the 802.3 Ethernet packet is the international
standard, so it is used in our examples. The Ethernet packet has a maximum
size of 1518 bytes, but it is permitted to be smaller depending on the data being
sent. Figure 3-1 shows the standard 802.3 Ethernet frame. The header is the first
25 bytes and includes the destination address (Dest Addr) and source address
(Source Addr). These addresses are the MAC addresses of the destination and
source NICs.
Note: The destination MAC address for every packet is very close to the start
of the Ethernet packet, starting only nine (9) bytes from the start of the frame.
Earlier we saw an example of the Ethernet frame (Figure 3-1). This frame has a
field labeled “Info” with a size of “variable”. The IP header starts in this Ethernet
“Info” field.
Figure 3-2 shows the IP header. The fields of interest to us are the 32-bit source
IP address and the 32-bit destination IP address. The source IP address is the IP
address assigned to the system initiating the packet. The destination IP address
is the IP address assigned to the target system for the packet.
3.3.2 IP addresses
Of course we typically do not use IP addresses in daily life. Instead we use
names that are more easily read by humans, for example, www.ibm.com refers to
the server named “www” in the domain “ibm.com”. These names utilize the
domain name system (DNS) to translate human readable names into IP
addresses. So your TCP/IP packet destined for www.ibm.com has an IP address
of one of the servers that IBM provides for Internet data.
Note: Each TCP/IP packet contains source and destination IP addresses. The
destination IP address is located within the IP header, which is contained
within the Ethernet frame, starting forty-two (42) bytes into the frame.
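As a worked illustration of the two notes above, the following Python sketch extracts the destination MAC and destination IP addresses from a raw frame at the byte offsets used in this chapter. It assumes the 802.3 framing described here (offsets counted from the start of the frame, including the preamble) and is not production parsing code.

    def parse_destinations(frame: bytes) -> dict:
        """Pull the destination MAC and destination IP out of a raw frame,
        using the offsets quoted in this chapter."""
        dest_mac = frame[8:14]     # destination MAC starts 9 bytes into the frame
        dest_ip = frame[41:45]     # destination IP starts 42 bytes into the frame
        return {
            "dest_mac": ":".join(f"{b:02x}" for b in dest_mac),
            "dest_ip": ".".join(str(b) for b in dest_ip),
        }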
This book is not a detailed network protocol book, so we do not delve into the
details of how each device learns the correct destination MAC address. For more
information, see the product configuration guide for your device.
For our LAN, we assume that all computers (servers and workstations) are
located on the same network, as shown in Figure 3-3.
If a packet is not destined for the MAC address of the device, the device simply
passes the packet on to the next network device. This is true for the coaxial and
hub deployments of an Ethernet network.
The bus topology proved to be both inefficient and insecure; inefficient in that the
network was often congested with packets and each NIC had to determine
whether it was the intended destination or not; insecure in that any computer on
the network can potentially have its NIC forced into promiscuous mode.
Promiscuous mode instructs the NIC to process every packet regardless of the
intended destination, making it possible for software to then filter out specific data
and provide the unauthorized user with data that they must not have access to.
Ethernet networks can be extended by the use of bridges which can connect two
physical network segments together. These bridges learn the MAC addresses of
devices connected to each segment, including other bridges, and accept packets
destined for the other segments connected to the bridge.
Promiscuous mode has uses even today. Promiscuous mode is often used in
detailed network analysis, or for specific network security devices (such as IDS)
to determine whether unauthorized traffic is present on the network. However, it
is used by trained professionals and in controlled environments, both physically
and logically controlled.
The switch has some in built Ethernet intelligence. It learns the MAC addresses
of each device connected to each specific port. Therefore, if a packet is sent from
Workstation A to Server Z, when the switch receives the packet, it opens the
Ethernet Frame to determine the Destination MAC address. Then the switch
uses the table of MAC addresses it has built up and forwards the packet out the
interface that has the destination MAC address connected to it.
In this way, each network device acts like it is connected to its very own hub.
In a well-built switch there is no undue contention for network resources, and any
contention that does exist can be buffered. Each NIC still confirms that it is the
intended recipient of a frame, but it now receives only frames intended for it, either
directly addressed frames or broadcasts.
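The MAC learning behavior described above can be summarized in a few lines of Python. This is a conceptual sketch only, not how the switch hardware is implemented; the 48-port count is an arbitrary assumption.

    class LearningSwitch:
        def __init__(self, port_count=48):
            self.ports = range(1, port_count + 1)
            self.mac_table = {}                     # MAC address -> port

        def receive(self, in_port, src_mac, dst_mac):
            self.mac_table[src_mac] = in_port       # learn where the sender is
            if dst_mac in self.mac_table:           # known destination: one port only
                return [self.mac_table[dst_mac]]
            return [p for p in self.ports if p != in_port]    # unknown: flood

Once an address is learned, frames for it leave through a single port, which is why each device effectively behaves as though it has its own dedicated connection.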
3.5 Routing
So far we have discussed Ethernet networks on a single physical segment or
bridged environment. These do not scale well as each device on the LAN has to
keep a table of the MAC addresses of every other device. Also consider that
Ethernet networks still rely on broadcast traffic for certain communications. This
broadcast traffic is best kept to as small an environment as possible. Similarly, a
faulty NIC can still overwhelm some switches, which can cause faults on the
Ethernet segment that the NIC is connected to. For these reasons Ethernet segments are also
referred to as broadcast domains, or fault domains.
3.5.1 TCP/IP network
As we build on our existing example, again we do not delve into the detail of how
each device learns the correct destination MAC address. This time the network
architect has connected the servers onto one Ethernet segment. Similarly, the
workstations have been placed on another, separate, Ethernet segment. Both
segments are using Ethernet Switch technology.
Figure 3-5 IP Network with servers and workstations on separate segments connected
by a router
Remember that the destination IP address is the 4 bytes starting 42 bytes into
the Ethernet packet. The router also changes the destination MAC address to
that of server Z before sending the packet towards the server access switch. The
server access switch has already learnt which interface the destination MAC
address is connected to and transmits the packet to the correct interface. Finally,
server Z receives the packet and processes it.
To make a routing decision, the router must process the destination IP address
which is 42 bytes into the Ethernet frame. While this sounds like a time-consuming,
convoluted process, the actual time taken is typically on the order of
milliseconds.
Note: The router makes decisions based on the destination IP address, which
is located 42-bytes into the Ethernet Frame.
3.5.2 Layer 3 switching
Historically, the router has made routing decisions in software on the router’s
CPU, which was slower than the switch. More recently, routers have migrated to
making routing decisions through application specific integrated circuit (ASIC)
hardware. Such routers are often called Layer 3 switches, referring to the OSI
Layer 3 and the fact that the routing decision is made in hardware on the ASIC.
Terminology is dependent on the audience and most network routers now utilize
ASIC hardware.
Note: A Layer 3 (L3) switch is simply a router that makes routing decisions
through ASIC hardware rather than code on a microprocessor. Most purpose
built network routers today are in fact L3 switches.
The s-series and g-series are designed primarily as switches. They can be operated
as routers, although some routing functions are limited by their hardware design
and software.
Similarly, the c-series and m-series are designed as routers and have hardware
and software capable of supporting more routing functions. These devices can
also operate as switches at Layer 2.
In all of these products, routing functions are executed in hardware on ASICs,
so the routing-capable products in the IBM Ethernet range are all L3 switches.
Typical physical resiliency takes into account the physical location, to avoid
building in a geologically or politically unstable area; fences and ram-raid
prevention techniques are also usually employed. Further physical
security is used to ensure that the DC has controlled access, which might include
multiple levels of security and authentication (for example, locked doors and
physical guards) before a person can access equipment within the DC.
Cooling can include the design of a cooling system that contains multiple cooling
units. The cooling system is designed in such a way that the loss of one cooling
unit still allows the rest of the cooling system to maintain the correct temperature
for the equipment in the data center.
Capacity management of redundant telecommunications is both a design
consideration as well as a cost consideration. There are two typical modes of
operation for dual circuits:
Active / Backup
Active / Active
With Active / Backup, the two circuits have the same bandwidth, but only one is
active at any time. In case of a loss of the active circuit, data will traverse the
backup circuit without performance degradation because both circuits are
maintained at the same bandwidth. It is very important to ensure that any
increase in bandwidth on the active circuit is also reflected in the backup circuit.
For example, if 10 Mbps is required between two sites, each circuit must be
configured to 10 Mbps; if an increase in bandwidth of 2 Mbps is required, then
both circuits must be upgraded to 12 Mbps. Some carriers might provide
discounted services for a nominated backup circuit.
In the other option, Active / Active, both links transport data at the same time,
allowing both links to be configured to less than the total bandwidth required,
which might be preferable in situations where the carrier does not offer discounts
for nominated backup circuits. However, to ensure minimal disruption in the case
of a circuit outage, they must both have the same bandwidth.
It is also important to utilize a data protocol that can balance traffic over paths of
equal cost. In this case, if one link fails, the other link transports all the traffic with
some degradation to services. As a general rule of thumb, each circuit must be
maintained at 75% of the required bandwidth, which will result in a degradation of
25%. For example, if 10 Mbps is required between two sites, each circuit must be
at least 7.5 Mbps. If one circuit fails, the other circuit can carry 75% of the total
expected traffic, resulting in a 25% degradation of bandwidth.
Also note in this example that if an increase in the required bandwidth of 2 Mbps
is required (from 10 Mbps to 12 Mbps), then both circuits must be increased to
9 Mbps (12 Mbps * 75%). Keep in mind that the calculations are based on required
bandwidth, not the usable bandwidth. It might also benefit the site to operate a
policy controller which can drop traffic that is not business critical in the case of a
circuit outage.
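The 75% rule of thumb above is a one-line calculation. The following Python sketch uses the figures from the example in the text; the function name and the 0.75 default are only illustrative.

    def active_active_circuit_size(required_mbps, carried_fraction=0.75):
        """Bandwidth to provision on each of two active/active circuits so that a
        single-circuit failure still carries 75% of the required bandwidth."""
        return required_mbps * carried_fraction

    print(active_active_circuit_size(10))   # 7.5 Mbps per circuit
    print(active_active_circuit_size(12))   # 9.0 Mbps per circuit after the 2 Mbps increase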
There are many components to the data center, including application services,
storage services, data center network infrastructure, and multiple connections at
the data center edge to various environments. Figure 4-1 shows the interaction of
these components within a data center. In this example, the WAN component
can connect to other locations for the Enterprise. The LAN connectivity is used
where the building that houses the data center also houses the Enterprise users.
With the growth of IT and the cost of providing full DC services, it is not unusual
for a DC site to house more than one Enterprise. In this case the network
architect needs to determine whether to deploy multiple network devices or
utilize virtual separation. For more information, see Chapter 5, “IBM Ethernet in
the green data center” on page 133.
With a resilient data center network architecture, most network architects will
follow a hierarchical design. Although it is possible to deploy a single-tier
architecture, it is usually suited only to very small locations where fault domains
and function segregation are not required. Such small locations are not typical
of a data center.
Tiers can consist of physical or logical tiers. The traditional multi-tier network
architecture shows the connectivity between the Access, Distribution, and Core
layers, as well as the edge connectivity from the Core. In some cases it is
preferable to collapse some components of the infrastructure such as Access
and Distribution, or Distribution and Core.
Figure 4-2 shows a typical multi-tier data center infrastructure.
The IBM Ethernet products are suited to all tiers within the data center. These
are discussed in more detail in Chapter 11, “Network design for the data center”
on page 249.
4.3 High Performance Computing market segment
High Performance Computing (HPC) is found in various commercial and
research markets. It refers to the ability to connect various compute systems
together with high speed network links forming a cluster of computers. This
allows parallel processing of complex tasks. There are two primary differentiators
in HPC:
Tasks requiring low latency
Tasks requiring high bandwidth
It is also possible that a particular HPC intent might require both low latency and
high bandwidth.
On the other hand, when modeling data where all HPC nodes need to be in
constant communication, advising all other nodes what they are working on, the
more important metric is low latency. This is typical of atomic research
or weather forecasting, where computation is complex and dependent on the
results from other nodes. If the latency is not low enough, the HPC cluster might
find that a number of nodes are waiting for the results from a single node before
computation can continue.
Because the carrier’s business is solely to transport data, their network is typified
by a predominance of data ports. Their equipment will transport data from many
customers, so data separation is of great importance.
Video is less sensitive, but data loss is still noticeable when large sets of video
data are lost. Most streaming video applications will allow for some jitter and will
not display data that is too far out of sync. Other data is less sensitive to loss or
latency, except in the case of human input, where a slow response is often
reported. However, even this slow response complaint is nowhere near as
sensitive as the loss of voice data.
Chapter 5. IBM Ethernet in the green data center
Although the “silver bullet” is still eluding us, the IBM Ethernet products can
provide some solutions and overall benefits to the “green data center”.
Within the data center, a reduction in the power consumption of equipment can
mean more equipment can be powered by the same emergency infrastructure,
that is, existing UPS and discrete generators. However, it does not also have to
mean a loss of features.
IBM Ethernet products have some of the industry’s lowest power consumption
per port on each of the products, as shown in Table 5-1 for select products in the
IBM Ethernet range. For specific details on each product, see Chapter 2,
“Product introduction” on page 31.
Safety: Human safety is a primary concern. For a site with VoIP handsets or
PoE IP surveillance cameras, controls need to be in place for human safety.
If an employee is on site during normal PoE downtime, they must be safe.
This might require a process to allow the PoE to be enabled upon request, or
alternate communication measures (for example, cellular / mobile phone).
Table 5-3 shows the effect on the power per port if fewer than the maximum
number of ports are used on the m-series Ethernet Router.
We now consider some options to assist in utilizing the available ports more
effectively.
5.3.1 VLAN
The Virtual Local Area Network (VLAN) is not a new concept, and many
networks architects already utilize these wherever possible. VLANs allow for
multiple Layer 2 networks to exist on the same network device, without
interaction with each other. They are typically used for separation of logical
teams or groups, perhaps one VLAN is configured for the finance department
while another VLAN is configured for the developers. These two VLANs can exist
on the same device which allows for separation of function without installing
separate equipment for each group.
Various organizations have differing opinions on how VLANs can be used for
separation in a secured environment. Consult your organization’s security team
before deploying VLANs.
5.3.2 VSRP
The Virtual Switch Redundancy Protocol (VSRP) can also assist in maximizing
the use of equipment and connections. As discussed in the Chapter 6, “Network
availability” on page 141, VSRP removes the requirement for STP/RSTP and can
improve the Layer 2 resiliency. VSRP can also be configured so that links can be
active for one VLAN and backup for another VLAN. With this configuration, it is
possible to load balance and allow both physical links to be utilized for different
VLANs, in contrast to STP/RSTP, which can block one of the interfaces for all
traffic. While this might result in degraded throughput during a failure if both
links were being used near their theoretical maximum, it maximizes the use
of the links during normal operations.
Each VLAN can have a specified MRP Master switch. Therefore, in a ring with
two VLANs operating on it, the network can be configured with a different MRP
master switch for each VLAN. Due to the operation of MRP, each master switch
will block the port where it receives its own Ring Hello Packet (RHP). With two
different master switches, the blocked port for one VLAN will not be blocked for
the other VLAN. This allows every link to be utilized for traffic.
Combining these two methods allows all links to be utilized with minimal CPU
overhead. MRP is a proprietary protocol available across the IBM Ethernet
range of products.
We have already seen in 5.3.1, “VLAN” that the physical number of switches can
be reduced by deploying multiple VLANs on a single switch; this is fairly common
practice today. However, that is where the technology stayed for many years.
More recently, virtualization has been developed for routers as well. On the IBM
m-series Ethernet Routers, Virtual Routing and Forwarding (VRF) is available.
5.4.1 Common routing tricks
Here we consider some historical designs for an environment where multiple
clients needed to connect to a single data center, but to dedicated systems.
To achieve this capability, each client has required a separate data connection
that connects into a dedicated client router and perhaps a dedicated firewall.
Often clients have chosen to utilize private address space (RFC 1918), which
causes routing conflicts if that address space is utilized on the same network.
This caused many different routing tricks to be created and used. Depending on
the situation, routing might include:
Network Address Translation (NAT)
Policy Based Routing (PBR)
Tunnelling
PBR exists where a routing decision is made based on the known source
address of the packet. This method required each client to have a dedicated
destination within the service provider network. This dedicated destination has
an IP address that was either registered or unique within the data center. The
routing decision for a packet destined for the client’s network was made by the
PBR configuration. The PBR configuration defines the next hop address based
on the source address of the packet.
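Conceptually, PBR is a next-hop lookup keyed on the packet's source address rather than its destination. The following Python sketch shows the idea; the client prefixes and next-hop addresses are invented for the example and do not come from any product configuration.

    import ipaddress

    # Hypothetical policy: each client's source prefix maps to a dedicated next hop
    policy = {
        ipaddress.ip_network("10.1.0.0/16"): "192.0.2.1",   # client A
        ipaddress.ip_network("10.2.0.0/16"): "192.0.2.2",   # client B
    }

    def pbr_next_hop(src_ip):
        source = ipaddress.ip_address(src_ip)
        for prefix, next_hop in policy.items():
            if source in prefix:
                return next_hop
        return None    # no policy match: fall back to normal destination routing

    print(pbr_next_hop("10.2.33.7"))   # -> 192.0.2.2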
All these solutions required dedicated equipment which took up space, power
and air conditioning.
5.4.2 VRF
Virtual Routing and Forwarding (VRF) tables allow multiple instances of route
and forwarding information to be configured on the same network device. Each
VRF configuration includes a Route Distinguisher (RD) that is unique to that VRF
within the network. The VRFs must connect to the same Layer 3 network,
although the intervening Layer 2 networks can vary; this combination allows the
same private addresses to be routed on the same IBM Ethernet routers without
conflicting with each other. Doing this simplifies the creation of a trusted
virtual private network (VPN) without the complexity or overhead of other
solutions such as MPLS.
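To illustrate the concept only (this is not the router’s implementation or CLI), the following Python sketch models two VRF instances, keyed by hypothetical route distinguishers, each holding the same RFC 1918 prefix without conflict:

# Conceptual sketch only: per-VRF route tables show how overlapping private
# prefixes can coexist on one device. The RD strings and next hops are
# illustrative, not taken from any real configuration.
import ipaddress

vrfs = {
    "65000:100": {ipaddress.ip_network("10.1.0.0/16"): "192.0.2.1"},    # client A
    "65000:200": {ipaddress.ip_network("10.1.0.0/16"): "198.51.100.1"}  # client B
}

def lookup(rd, dst):
    """Longest-prefix match within the single VRF identified by its RD."""
    addr = ipaddress.ip_address(dst)
    matches = [net for net in vrfs[rd] if addr in net]
    if not matches:
        return None
    best = max(matches, key=lambda net: net.prefixlen)
    return vrfs[rd][best]

# The same destination resolves differently depending on the VRF context.
print(lookup("65000:100", "10.1.2.3"))  # 192.0.2.1
print(lookup("65000:200", "10.1.2.3"))  # 198.51.100.1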
The deployment of VRF in the Layer 3 router can reduce the number of separate
routers required within a data center. Network complexity can never be solved
completely; however, reducing the number of devices and deploying trusted
VPNs through the use of VRFs is achievable today with IBM’s m-series and
c-series Ethernet/IP devices.
Chapter 6. Network availability
If any of these components fail, the device will send an alert to the network
monitoring stations, typically by SNMP and/or syslog. This alert must be acted
upon to maintain network reliability. For example, if the fan unit on the primary
device fails, it will send an alert to the network monitoring stations as the backup
device takes over operations. If that alert is simply filed away and no action is
taken to rectify the problem, and a couple of weeks later power is lost to the
backup device (now operating as primary), there is nothing to fail over to.
Replacing the fan unit as soon as possible restores redundancy, so the system
can fail over again if a fault occurs on the unit that is now operating as primary.
Note: An integral part of a high availability design is network monitoring with
defined actions for any alert. Promptly fixing any fault that raises an alert
preserves the ability of the high availability design to keep operations running.
Similarly, with telecommunications feeds, the network architect can work with the
site operations team and carrier to ensure diverse paths into the data center. We
assume the site operations team also considers other site environmental factors
before deciding on the site: for example, floor loading capability, uninterruptible
power supply (UPS), power generators, air conditioning, humidity controllers,
static electricity protection, and fire protection, just to name a few.
Due to this lengthy delay, Rapid Spanning Tree Protocol (RSTP) was developed
as the standard 802.1w protocol. While there are differences in the operation of
RSTP, the basic function is the same as STP: ports creating a loop are logically
blocked until a fault is identified in the chosen primary path. The benefit of RSTP
is the greatly reduced failover time, which is between 50 ms and 5 seconds.
VSRP switches are all configured as backup switches, and an election process
selects the primary switch. The primary switch broadcasts VSRP packets, which
are forwarded by VSRP-aware switches. If the backup switches do not receive
the hello packet within the configured time period, the next highest priority
backup switch takes over as the primary, providing sub-second failover.
Each VLAN can run a separate VSRP instance, and MAC addresses are assigned
to each VSRP instance, thus allowing for better utilization of the available links,
as discussed in Chapter 5, “IBM Ethernet in the green data center” on page 133.
When a VSRP failover occurs, a VSRP-aware switch will see the VSRP hello
packets and MAC address originating from the backup VSRP switch. As a result,
the VSRP-aware switch can redirect packets to the backup switch without the
need to relearn the entire Layer 2 network.
For more detailed information about MRP consult the Brocade resource center:
http://www.brocade.com/data-center-best-practices/resource-center/index
.page
MRP sends a Ring Health Packet (RHP) out one interface and expects to see it
return on the other interface to the ring. An election process is held to select the
ring master; this ring master initiates the RHP on the lowest numbered interface
and blocks the interface the RHP is received on. If the RHP is not received
within the configured time-out period, the blocked port is unblocked. Similarly,
when the fault is rectified, the RHP is received again on the second interface
and that interface is blocked once again. The network administrator needs to
define the time-out value for the RHP; this time-out value must be greater than
the combined latency of the ring.
MRP is not restricted to a single ring; it can be configured in multiple rings. Each
ring is assigned a ring number and elects a master. Devices connecting to
multiple rings will forward the RHP out all interfaces with a ring ID equal to or
lower than that of the RHP. The RHP is then blocked on devices that have
received a matching RHP from another interface. This allows multiple rings to
provide availability for each other without flooding rings with traffic not destined
for that ring.
A server that is dual homed to the same network must run some form of Network
Interface Backup (NIB) configuration. While more information about NIB can be
found in the references, the main feature to note is that NIB allows the server to
be connected to two different Layer 2 switches. If one switch fails, NIB
automatically moves the traffic over to the other switch.
For user workstations, dual homing is not typically required or deployed. Instead,
many businesses are deploying wireless networks as well as wired networks. If
one connection happens to fail (such as the wired connection), the other one
(such as the wireless network) is usually available. In the event of a catastrophic
failure at a user access switch, the users can move to another physical location
while the fault on the switch is repaired.
Routers maintain a table of networks the router can forward traffic to. The route
table is populated in a number of ways; these can all be categorized as either
static or dynamic routes.
Again, the access tier is the most vulnerable to availability issues; however,
protocols exist to greatly reduce this vulnerability.
6.3.1 Static routing
Static routes are defined by the network administrator and require manual
intervention for any changes. A static route consists of a destination network and
a next hop address. The next hop address is the IP address of the router that is
closer to the destination network. The next hop address must be within a network
that is physically or logically connected to the router.
While some implementations of static routes allow for multiple paths to the same
destination network, many do not. Even those routers that allow for multiple
paths to the same network might not behave in a fully predictable manner if
one path is no longer available.
For a simple end-point device the network can provide redundancy of the default
gateway by utilizing Virtual Router Redundancy Protocol (VRRP) configurations.
The IBM Ethernet routers allow for up to 512 VRRP or VRRPE instances to be
configured on a single router.
VRRP
For a pair of routers deployed at the distribution tier, redundancy can be
configured by using VRRP; more than two routers can be used if required. Each
router in the virtual router group must be configured with VRRP to provide
redundancy. VRRP configuration requires each VRRP member to be configured
with the IP address of the master as the virtual IP address, which allows the
virtual router to share one IP address. Each interface within the virtual router still
has a unique IP address, but only the virtual IP address is used to transmit traffic
through the virtual router.
The VRRP routers exchange hello packets over the same Layer 2 network that
the computers use. The owner of the IP address (the VRRP master) defines a
virtual MAC address for the virtual router. If the backup VRRP routers do not
receive the VRRP hello packet within a predetermined time, the next highest
priority router takes ownership of both the IP address and the virtual MAC
address. This allows all computers on the network to maintain data
communications through the same IP address, which is configured as their
default gateway.
Within the IBM Ethernet products, VRRP has been extended to allow the virtual
router configuration to monitor another link. If the monitored link fails on one
router, it notifies the virtual router over VRRP and one of the backup routers
assumes ownership of the IP address and virtual MAC address. For example, a
distribution router might be configured to monitor the link to the core router; if
this link fails, the backup router takes over as the primary until the fault can be
fixed.
VRRPE
There are a number of differences between VRRPE and VRRP, and benefits to
using VRRPE. Perhaps the biggest benefit of VRRPE is that the IP address of
the virtual router is not a physical interface address on any of the member
routers. Instead, both the IP address and the MAC address are virtual. The
master is elected based on the highest priority of all routers in the virtual group.
There is one item to be aware of when VRRPE is set to monitor another link.
When that link fails, the priority of the device is reduced by the value in the track
priority setting. This behavior is very different from VRRP, where the priority is
dropped to twenty (20) if the tracked link fails. Therefore, if the track priority is not
set large enough to reduce the priority below that of the next backup device, the
next backup device will not take ownership of the virtual IP or virtual MAC.
Note: When using VRRPE with track ports, the track port priority is subtracted
from the VRRPE device’s interface priority. If the interface priority of the
backup device is not greater than the master’s priority after the track port
priority has been subtracted, the backup device will not take over as the master.
Consider an example where the VRRPE interface priority of the intended master
is set to 200 and the track port priority is set to 20. If the tracked port fails, the
new interface priority is 180 (200 - 20). If the intended backup interface priority is
not greater than 180, it will not take over as master in this example.
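The same arithmetic can be checked with a few lines of Python; the function and parameter names are illustrative and do not correspond to configuration keywords:

# Sketch of the VRRPE track-port arithmetic from the example above.
def backup_takes_over(master_priority, track_priority, backup_priority):
    """Return True if the backup becomes master after the tracked port fails."""
    effective = master_priority - track_priority   # for example 200 - 20 = 180
    return backup_priority > effective

print(backup_takes_over(200, 20, 190))  # True: 190 > 180, failover occurs
print(backup_takes_over(200, 20, 150))  # False: 150 <= 180, no failover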
Protected link groups can be created across modules and can include links of
different speeds. For example, the active link can be a 10 GbE link and the
standby link can be a 1 GbE link, or a group of multiple 1 GbE links (see 6.5,
“Link Aggregation Group”). Doing this allows a reliable high speed link to be
backed up by a lower speed (lower cost) link. Although a lower speed link will
impact performance, it is a cost effective link backup method that allows data
communications to continue, at a slower speed, while the failed link is repaired.
The LAG can also be configured to provide a few high availability benefits:
Multiple connections between devices can be grouped together to balance
the load. This provides faster throughput as well as some high availability
benefits. With two links between devices, if one link fails, the other link in the
LAG continues transporting data without spanning tree protocols needing to
update connectivity status.
Ports within a LAG set can be distributed across different modules. This
provides module based HA: if one module fails, the LAG is still active, allowing
degraded throughput while the failed module is replaced.
A minimum number of active ports can be configured. This is especially useful
if there are multiple paths between devices. For example, if two LAGs are
configured between two devices, with the primary path designed with six (6) 1
GbE ports and the backup path designed with four (4) 1 GbE ports, the six (6)
port LAG can be configured to require at least four (4) active ports; if fewer
than four (4) ports are active, that LAG set shuts down and communications
can use the other LAG, as shown in the sketch after this list. Consider this in
conjunction with VSRP where the same MAC is also assigned; this allows for
seamless cut over to a backup path.
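The minimum-active-ports rule can be summarized with the following conceptual sketch in Python; it models only the simple counting rule described above, not the device behavior in detail:

# Conceptual model of the "minimum number of active ports" rule for a LAG.
def lag_is_usable(active_ports, min_active):
    """A LAG stays up only while at least min_active member ports are up."""
    return active_ports >= min_active

primary_lag_ports = 6
min_active = 4  # the six-port primary LAG is configured to require four ports

# With two failed members the primary LAG still carries traffic (4 >= 4);
# a third failure shuts it down and traffic moves to the backup LAG.
for failed in range(0, 4):
    up = primary_lag_ports - failed
    path = "primary LAG" if lag_is_usable(up, min_active) else "backup LAG"
    print(f"{failed} failed members, {up} up -> traffic uses the {path}")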
The IEEE specification defines five classes of power, as shown in Table 6-1. All
power is measured as the maximum at the Power Sourcing Equipment (PSE),
which is the IBM Ethernet switch in this book.
Class    Usage       Maximum power (W) at the PSE
0        Default     15.4
1        Optional    4
2        Optional    7
3        Optional    15.4
For the purpose of the calculations in this section, we use the worst case
scenario of class-3 (15.4 W) for all our PoE devices. However, typically not all
devices are class-3 or even require a constant 15.4 W supply.
Each of the g-series and s-series can be configured with more available PoE
power than available ports. This allows the majority of PoE devices to continue
operating even though a power supply has failed. If all ports require full
class-3 PoE supply, each device will by default assign power from the lowest port
number to the highest port number, disabling power to ports when the available
PoE power has been exhausted.
There are two options to design availability into the PoE environment:
Place your more important PoE devices on the lowest available ports
Set PoE priority
Although it is entirely possible to define a site standard forcing the use of the first
ports to be connected according to priority, staff movement might change seat
allocation over time. However, devices such as PoE powered access points or
surveillance cameras do not move ports. Consider reserving the first five (5) or
ten (10) ports, depending on site constraints, for business critical infrastructure.
If insufficient power is available, PoE priority assigns power to the highest
priority devices first, disabling PoE supply to the remaining ports when the
supply is exhausted. In the case of contention for power among ports of equal
PoE priority, the lowest numbered ports are assigned power first.
6.6.1 g-series
The g-series power supply shares power between system functions and PoE
functions. This shared power supply has been designed to allow for 480 W of
PoE supply; each individual power supply can therefore provide power to 31 PoE
ports (480 / 15.4 = 31). By using redundant power supplies, more class-3 PoE
power is available than there are ports.
6.6.2 s-series
The IBM s-series has specific PoE power supplies; there are two power options
within each of the mains voltage models, depending on your PoE requirements.
The network architect can decide between the 1250 W or 2500 W power supply.
The 1250 W PoE power supply can provide power to 81 class-3 PoE devices.
The 2500 W model can provide power to 162 class-3 PoE devices.
The B08S can support up to 324 class-3 PoE devices if two (2) PoE power
supplies each of 2500 W were installed, whereas the current maximum number
of copper ports on this model is 192. Even with the failure of one PoE power
supply, 162 class-3 PoE ports can still be provided full class-3 power.
Similarly, the B16S can support up to 648 class-3 PoE devices with four (4)
2500 W PoE power supplies. The current maximum number of copper ports for
this model is 384. Even with the failure of one PoE power supply, there is still
enough power for more than 384 class-3 PoE ports.
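The port counts quoted in this section follow directly from dividing each PoE budget by the worst-case class-3 draw of 15.4 W, as this small Python sketch reproduces:

# Worst-case PoE budgeting: how many class-3 (15.4 W) ports a budget can power.
CLASS3_WATTS = 15.4

def class3_ports(total_poe_watts):
    return int(total_poe_watts // CLASS3_WATTS)

print(class3_ports(480))        # g-series shared supply: 31 ports
print(class3_ports(1250))       # s-series 1250 W PoE supply: 81 ports
print(class3_ports(2500))       # s-series 2500 W PoE supply: 162 ports
print(class3_ports(2 * 2500))   # B08S with two 2500 W supplies: 324 ports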
6.7 Hitless upgrades
The IBM Ethernet chassis based products (s-series and m-series) are capable of
having code upgraded without interrupting the operation of the system. This is
known as a hitless upgrade, because the device does not take a “hit” to
operations. This feature improves the high availability of the network as it is not
required to be off-line at any stage during an upgrade.
Both platforms require two management modules and access to the console
ports for hitless upgrades. There might also be release-specific upgrades or
other constraints that are documented in the upgrade release notes.
6.7.1 s-series
The s-series supports hitless upgrade for Layer 2 functions only. If the s-series is
operating Layer 3 functions, these functions will be interrupted.
6.7.2 m-series
The m-series supports hitless upgrade for both Layer 2 and Layer 3 functions.
Specific configuration is required to provide graceful restart for OSPF and/or
BGP.
QoS guarantees are very important when network capacity is not sufficient,
especially for real-time streaming multimedia applications such as voice over IP
(VOIP), IP based TV, online gaming, and cellular data communication, because
those types of applications require fixed bandwidth and are delay sensitive. In
cases where there is no network congestion or when the network is oversized,
QoS mechanisms are not required.
Note: Do not confuse QoS with a high level of performance or guaranteed
service quality. QoS only means that some traffic will be prioritized over other
traffic. If there are not enough resources available, even utilizing QoS cannot
produce high performance levels. QoS ensures that if there is capacity
available, it is assigned in a consistent manner to prioritize traffic when
required, and with this, the level of performance can be maintained.
In the past, QoS was not widely used because the networking devices in the
infrastructure had limited computing power for handling packets.
7.2 Why QoS is used
In over-subscribed networks, many things can happen to a packet while traveling
from the source to the destination:
Dropped packets:
In some cases routers fail to deliver packets if they arrive when the buffers are
already full. Depending on the situation, some, none, or all of the packets can
be dropped. In this case the receiving side can ask for packet retransmission
which can cause overall delays. In such a situation it is almost impossible to
predict what will happen in advance.
Delays:
It can take a long time for a packet to reach its destination, because it can get
stuck in long queues or, for example, take a non-optimal route to avoid
congestion. Applications that are sensitive to delays (for example, VOIP)
can become unusable in such cases.
Out-of-order delivery:
When a collection of packets travels across the network, different packets can
take different routes. This can result in different delays and packets arriving in
a different order than they were sent in. Such a problem requires special
additional protocols responsible for rearranging out-of-order packets into a
correctly ordered (isochronous) state once they reach the destination. This is
especially important for video and VOIP streams where the quality can be
dramatically affected.
Jitter:
When traveling from the source to the destination, packets reach the
destination with different delays. A packet’s delay is affected by its position in
the queues of the networking equipment along the path between the source
and destination, and this position can vary unpredictably. Such a variation in
delay is known as jitter and can seriously affect the quality of streaming
applications such as streaming audio and video.
Error:
It can also happen that packets are misdirected, combined together, or even
corrupted while traveling the network. The receiving end has to detect this
and, as in the case when packets are dropped, ask the sender to retransmit
the packet.
QoS is the most beneficial for what are known as inelastic services:
VOIP (Voice over IP)
IPTV (IP based TV)
Streaming multimedia
Video teleconferencing
Dedicated link emulation
Safety critical applications (for example, remote medical procedures requiring
a guaranteed level of availability, sometimes also called hard QoS)
Online gaming, especially real time simulation in a multi-player environment
Multiprotocol Label Switching (MPLS)
Resource Reservation Protocol - Traffic Engineering (RSVP-TE)
Frame relay
X.25
Asynchronous Transfer Mode (ATM)
TOS (Type of Service) field in the IP header (now superseded by DiffServ)
IP Differentiated services (DiffServ)
IP Integrated services (IntServ)
Resource reSerVation Protocol (RSVP)
7.3.3 Classification
To provide priority handling of particular types of traffic, this traffic first needs to
be identified. Classification is the process of selecting packets which will be
handled by the QoS process. The classification process assigns a priority to
packets as they enter the networking device (that is, a switch or router). The
priority can be determined based on information that is already contained within
the packet or assigned to the packet on arrival. After a packet or traffic flow is
classified, it is mapped to a corresponding queue for further processing.
Queue management
The size of the queues is not infinite, so the queues can fill and overflow. When a
queue is full, any additional packets cannot get into it and they are dropped. This
is called tail drop. The issue with tail drop is that the network device cannot
control which packets are dropped (even high priority packets can be dropped).
To prevent such issues, there are two options:
Provide some kind of criteria for dropping packets that have lower priority
before dropping higher priority packets.
Avoid the situation where queues fill up, so there is always space for high
priority packets.
Both of these functions are, for example, provided by Weighted Random Early
Detection (WRED), as sketched below.
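As a rough illustration of the early-drop idea, the following Python sketch computes a RED/WRED-style drop probability that rises with the average queue depth between a minimum and a maximum threshold; the threshold and probability values are illustrative, not switch defaults:

# Simplified WRED-style drop decision: drop probability grows linearly between
# a minimum and maximum average queue depth. Threshold values are illustrative.
import random

def wred_drop(avg_depth, min_th=20, max_th=80, max_p=0.1):
    """Return True if the arriving packet should be dropped."""
    if avg_depth < min_th:
        return False                      # queue shallow: never drop
    if avg_depth >= max_th:
        return True                       # queue (almost) full: always drop
    p = max_p * (avg_depth - min_th) / (max_th - min_th)
    return random.random() < p            # probabilistic early drop

# Lower-priority traffic can be given a lower max_th or higher max_p so it is
# dropped earlier, leaving room for high priority packets.
print(wred_drop(10), wred_drop(50), wred_drop(90))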
Congestion management
Bursty traffic can sometimes exceed the available speed of the link. In such a
case, the network device can put all the traffic in one queue and use a first in, first
out (FIFO) method for the packets, or it can put packets into different queues and
service some queues more often than the others.
Link efficiency
Low speed links can be a problem for smaller packets. On a slow link, the
serialization delay of a big packet can be quite long. For example, if an important
small packet (such as a VoIP packet) gets behind such a big packet, its delay
budget can be exceeded even before the packet has left the network device
(router). In such a situation, link fragmentation and interleaving allows large
packets to be segmented into smaller fragments, interleaving important small
packets between them. It is important to use both options, interleaving and
fragmentation; there is no reason to fragment big packets if you do not later
interleave other packets between those fragments.
Too much header overhead over payload can also influence efficiency. To
improve that situation, compression can be utilized.
Traffic shaping and policing
Shaping is used to limit the bandwidth of a particular traffic flow and is mainly
used to prevent overflow situations. Shaping can be used on links to remote
sites that have lower capacity than the link to the main site. For example, in a
hub-spoke model, you can have a 1 Gbps link from a central site and 128 Kbps
links from remote sites. In such a case, traffic from the central site can overflow
the links to the remote sites. Shaping the traffic is a perfect way to pace traffic to
the available link capacity. In the case of traffic shaping, traffic above the
configured rate is buffered for later transmission.
Policing is very similar to shaping. It differs in one very important way:
traffic that exceeds the configured rate is not buffered; it is normally
discarded.
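The operational difference can be summarized in a few lines of Python: a shaper buffers traffic above the configured rate for later transmission, while a policer discards it. The numbers below are illustrative:

# Illustrative contrast between shaping (buffer the excess) and policing
# (discard the excess) for one measurement interval.
def police(offered_bits, rate_bits):
    sent = min(offered_bits, rate_bits)
    return {"sent": sent, "dropped": offered_bits - sent, "buffered": 0}

def shape(offered_bits, rate_bits):
    sent = min(offered_bits, rate_bits)
    return {"sent": sent, "dropped": 0, "buffered": offered_bits - sent}

offered = 200_000   # bits offered in the interval
rate = 128_000      # configured rate, e.g. a 128 Kbps remote-site link
print("policing:", police(offered, rate))  # excess is discarded
print("shaping: ", shape(offered, rate))   # excess waits for a later interval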
Basic end-to-end QoS can be provided across the network in three ways:
Best-effort service: Such a service is also known as a service without QoS.
This service provides connectivity without any guarantees. It is best
characterized by FIFO queues, which do not differentiate between flows.
Differentiated service (soft QoS): Some of the traffic is treated with priority
over the rest, that is, faster handling, more bandwidth, and a lower loss rate;
this is a statistical preference, not a hard guarantee. Such QoS is provided by
classifying traffic and applying the QoS tools mentioned above.
Guaranteed service (hard QoS): With this service, network resources are
explicitly reserved for specific traffic. For example, guaranteed service can
be achieved with RSVP.
Setting up the QoS in the networking environment is not a one-time action. QoS
has to evolve alongside changes that happen in the networking infrastructure
and it must be adjusted accordingly. QoS needs to become an integral part of
network design.
IBM b-type networking products use two software platforms: FastIron (s-series
and g-series) and NetIron (m-series and c-series). QoS and rate limiting
implementation depends on the software used.
7.4.1 FastIron QoS implementation
In FastIron, the QoS feature is used to prioritize the use of bandwidth in a switch.
When QoS is enabled, traffic is classified when it arrives at the switch and is
processed based on the configured priorities. Based on that, traffic can be:
Dropped
Prioritized
Given guaranteed delivery
Subject to limited delivery
When a packet enters the switch, it is classified. After a packet or traffic flow is
classified, it is mapped to a forwarding priority queue.
FastIron based devices classify packets into eight traffic classes with values
from 0 to 7. Packets with higher priority get precedence for forwarding.
Classification
Processing of classified traffic is based on the trust level in effect on the
interface. The trust level is defined based on the configuration setup and on
whether the traffic is switched or routed. The trust level can be one of the
following possibilities:
Ingress port default priority
Static MAC address
Layer 2 Class of Service (CoS) value: This is the 802.1p priority value in the
tagged Ethernet frame. It can be a value from 0 – 7. The 802.1p priority is
also called the Class of Service.
Layer 3 Differentiated Services Code Point (DSCP): This is the value in the six
most significant bits of the IP packet header’s 8-bit DSCP field. It can be a
value from 0 – 63. These values are described in RFCs 2474 and 2475. The
DSCP value is sometimes called the DiffServ value. The device automatically
maps a packet's DSCP value to a hardware forwarding queue.
ACL keyword: An ACL can also prioritize traffic and mark it before sending it
along to the next hop.
Because there are several criteria, there are multiple possibilities as to how the
traffic can be classified inside a stream of network traffic. The priority of the
packet is resolved based on criteria precedence.
As defined in Figure 7-1, the trust criteria are evaluated in the following order:
1. ACL defining the priority; in this case, the ACL marks the packet before
sending it along to the next hop.
2. 802.1p Priority Code Point (PCP) when the packet is tagged according to
802.1Q definition.
3. Static MAC address entry
4. Default port priority
Table 7-1   Default QoS mappings
DSCP value   802.1p (CoS) value   Internal forwarding priority   Forwarding queue
0 - 7        0                    0                              0 (qosp0)
8 - 15       1                    1                              1 (qosp1)
16 - 23      2                    2                              2 (qosp2)
24 - 31      3                    3                              3 (qosp3)
32 - 39      4                    4                              4 (qosp4)
40 - 47      5                    5                              5 (qosp5)
48 - 55      6                    6                              6 (qosp6)
56 - 63      7                    7                              7 (qosp7)
The mapping between the DSCP value and forwarding queue cannot be
changed. However, the mapping between DSCP values and the other properties
can be changed as follows:
DSCP to Internal Forwarding Priority Mapping: Mapping between the DSCP
value and the Internal Forwarding priority value can be changed from the
default values shown in Table 7-1. This mapping is used for CoS marking and
determining the internal priority when the trust level is DSCP.
Internal Forwarding Priority to Forwarding Queue: The internal forwarding
priority can be reassigned to a different hardware forwarding queue.
Internal forwarding priority   Forwarding queue
1                              qosp1
2                              qosp2
3                              qosp3
4                              qosp4
5                              qosp5
6                              qosp6
If those priorities are not set, all traffic is by default placed in the “best-effort
queue”, which is the queue with priority 0 (qosp0).
If a packet qualifies for an adjusted QoS priority based on more than one
criterion, the system always gives the packet the highest priority for which it
qualifies.
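Because each block of eight DSCP values maps to one internal forwarding priority and its matching queue, the default behavior described above can be modeled with a simple integer division; this Python sketch is a model only, not device code:

# Default FastIron-style mapping: DSCP 0-63 -> internal priority 0-7 -> queue.
def dscp_to_internal_priority(dscp):
    """Each block of eight DSCP values maps to one internal priority."""
    return dscp // 8

def internal_priority_to_queue(priority):
    return f"qosp{priority}"

for dscp in (0, 26, 46, 56):
    pri = dscp_to_internal_priority(dscp)
    print(f"DSCP {dscp:2d} -> internal priority {pri} -> {internal_priority_to_queue(pri)}")
# DSCP 26 (AF31) lands in qosp3, DSCP 46 (EF) in qosp5, DSCP 56 in qosp7.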
QoS marking
QoS marking is the process of changing the packet’s QoS information for the
next hop.
In the marking process, the 802.1p (Layer 2) and DSCP (Layer 3) marking
information can be changed, and this is achieved by using ACLs. It is possible to
mark the Layer 2 802.1p (CoS) value, the Layer 3 DSCP value, or both. Marking
is not enabled by default.
Marking can be used when traffic comes from a device that does not support
QoS marking and we want to enable the use of QoS for that traffic.
DSCP based QoS
FastIron devices support basic DSCP based QoS, also called Type of Service
(ToS) based QoS.
FastIron also supports marking of the DSCP value. FastIron devices can read
Layer 3 QoS information in an IP packet and select a forwarding queue for the
packet based on that information. The device interprets the value in the six most
significant bits of the IP packet header’s 8-bit ToS field as a Differentiated
Services Code Point (DSCP) value, and maps that value to an internal
forwarding priority.
The internal forwarding priorities are mapped to one of the eight forwarding
queues (qosp0 – qosp7) on the FastIron device. During a forwarding cycle, the
device gives more preference to the higher numbered queues, so that more
packets are forwarded from these queues. So, for example, queue qosp7
receives the highest preference; while queue qosp0, the best-effort queue,
receives the lowest preference. Note the following considerations:
DSCP based QoS is not automatically enabled, but can be, by using ACLs.
On g-series switches, DSCP is activated on a per port basis.
On s-series switches, DSCP is activated with the use of ACLs.
QoS mappings
To achieve more granular QoS management, it is possible to change the
following QoS mappings:
DSCP to internal forwarding priority
Internal forwarding priority to hardware forwarding queue
The IronStack function reserves one QoS profile to provide higher priority for
stack topology and control traffic. Internal priority 7 is reserved for this purpose
and cannot be reconfigured for any other purpose.
Note: By default, the b-type devices perform the 802.1p to CoS mapping. If
you want to change the priority mapping to DSCP to CoS mapping, this can
be achieved with an ACL.
On s-series switches, marking and prioritization can be done inside one ACL
rule. On the g-series switches, only one of the following options can be used
inside one ACL rule:
802.1p-priority-marking
dscp-marking
internal-priority-marking
Forwarding queue   Internal priority
qosp0              0
qosp1              1
qosp2              2
qosp3              3
qosp4              4
qosp5              5
qosp6              6
qosp7              7
Both markings can be applied in one ACL. Internal priority marking is optional
and if not specified separately it will default to the value 1, which means traffic will
be mapped to the qosp1 forwarding queue.
Scheduling
Scheduling is the process of mapping a packet to an internal forwarding queue
based on its QoS information, and servicing the queues according to a queuing
mechanism.
Note: In stacking mode, qosp7 queue is reserved as Strict Priority under
weighted queuing. Attempts to change the qosp7 setting will be ignored.
WRR is the default queuing method and uses a default set of queue weights.
The number of packets serviced during each visit to a queue depends on the
percentages you configure for the queues. The software automatically
converts the percentages you specify into weights for the queues.
Note: Queue cycles on the s-series and g-series switches are based on
bytes. These devices service a given number of bytes (based on weight) in
each queue cycle.
The default minimum bandwidth allocation for WRR is shown in Table 7-4.
Queue    Default minimum bandwidth   Minimum bandwidth with jumbo frames enabled
qosp6    7%                          8%
qosp5    3%                          8%
qosp4    3%                          8%
qosp3    3%                          8%
qosp2    3%                          8%
qosp1    3%                          8%
qosp0    3%                          8%
Strict Priority (SP): This method ensures service for high priority traffic. The
software assigns the maximum weights to each queue, to cause the queuing
mechanism to serve as many packets in one queue as possible before
moving to a lower queue. This method biases the queuing mechanism to
favor the higher queues over the lower queues.
For example, strict queuing processes as many packets as possible in qosp3
before processing any packets in qosp2, then processes as many packets as
possible in qosp2 before processing any packets in qosp1, and so on.
Queue    Default bandwidth
qosp5    25%
qosp4    15%
qosp3    15%
qosp2    15%
qosp1    15%
Queue 7 supports only SP, whereas queue 6 supports both SP and WRR, and
queues 5 - 0 support only WRR as the queuing mechanism.
The queuing method is defined globally on the device. Queues can be renamed,
and the weights of the queues can be modified to meet specific requirements.
When the weights of the queues are modified, the total must equal 100%. The
minimum bandwidth percentage is 3% for each priority; when jumbo frames are
enabled, the minimum bandwidth requirement is 8%. If these minimum values
are not met, QoS might not be accurate.
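To make the byte-based weighted servicing concrete, the following Python sketch drains several queues in proportion to configured percentage weights; the queue names are reused from the text, but the weights, packet sizes, and cycle budget are illustrative:

# Byte-based weighted round robin sketch: each cycle, every queue may send up
# to (weight% of the cycle budget) bytes. Weights and packets are illustrative.
from collections import deque

weights = {"qosp2": 60, "qosp1": 30, "qosp0": 10}        # must total 100%
queues = {name: deque([100] * 50) for name in weights}   # 50 x 100-byte packets
CYCLE_BYTES = 3000                                       # service budget per cycle

def run_cycle():
    sent = {}
    for name, weight in weights.items():
        budget = CYCLE_BYTES * weight // 100
        sent[name] = 0
        while queues[name] and queues[name][0] <= budget:
            budget -= queues[name][0]
            sent[name] += queues[name].popleft()
    return sent

print(run_cycle())  # {'qosp2': 1800, 'qosp1': 900, 'qosp0': 300}, matching the weights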
7.4.2 FastIron fixed rate limiting and rate shaping
In this section, we discuss FastIron rate limiting and rate shaping.
The maximum rate is specified in kilobits per second (Kbps). The fixed rate
limiting policy applies to one-second intervals and allows the port to send or
receive the amount of traffic specified in the policy. All additional bytes are
dropped. Unused bandwidth is not carried over from one interval to another.
Table 7-8 shows where inbound and outbound rate limiting is supported or not
supported.
Port type   Fixed rate limiting support
GbE         Inbound/outbound
10-GbE      Inbound/outbound
The maximum rate is specified in bits per second (bps). The fixed rate limiting
policy applies to one-second intervals and allows the port to receive the amount
of traffic specified in the policy. All additional bytes are dropped. Unused
bandwidth is not carried over from one interval to another.
s-series rate shaping
Outbound rate shaping is a port-level feature that is used to shape the rate and to
control the bandwidth of outbound traffic. The rate shaping feature smooths out
excess and bursty traffic to the configured maximum before it is sent out on a
port. Packets are stored in available buffers and then forwarded at a rate which is
not greater than the configured limit. This approach provides better control over
the inbound traffic on the neighboring devices.
Note: It is best not to use fixed rate limiting on ports that receive route control
traffic or Spanning Tree Protocol (STP) control traffic. Dropped packets due to
the fixed rate limiting can disrupt routing or STP.
An s-series switch has one global rate shaper for a port and one rate shaper for
each port priority queue. Rate shaping is done on a single-token basis, where
each token is defined to be 1 byte.
After the one-second interval is complete, the port clears the counter and
re-enables traffic.
Figure 7-2 shows an example of how fixed rate limiting works. In this example, a
fixed rate limiting policy is applied to a port to limit the inbound traffic to 500000
bits (62500 bytes) a second. During the first two one-second intervals, the port
receives less than 500000 bits in each interval. However, the port receives more
than 500000 bits during the third and fourth one-second intervals, and
consequently drops the excess traffic.
Bytes are counted by polling statistics counters for the port every 100
milliseconds, which gives 10 readings per second. With such a polling interval,
the fixed rate limiting policy has an accuracy of within 10% of the port’s line rate.
As a result, it is possible that in some cases the policy allows more traffic than
the specified limit, but the extra traffic is never more than 10% of the port’s line
rate.
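The example can be checked numerically: a 500000 bit per second policy corresponds to 62500 bytes per one-second interval, and anything received beyond that in an interval is dropped. The following Python sketch models the same per-interval counting (illustrative only):

# Conceptual model of fixed rate limiting: per one-second interval, bytes above
# the configured limit are dropped; the counter resets every interval.
LIMIT_BPS = 500_000                  # configured rate in bits per second
LIMIT_BYTES = LIMIT_BPS // 8         # 62,500 bytes per one-second interval

intervals = [40_000, 55_000, 90_000, 70_000]   # bytes offered per second

for second, offered in enumerate(intervals, start=1):
    forwarded = min(offered, LIMIT_BYTES)
    dropped = offered - forwarded
    print(f"second {second}: offered {offered}, forwarded {forwarded}, dropped {dropped}")
# Seconds 3 and 4 exceed 62,500 bytes, so the excess is dropped, as in Figure 7-2.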
7.4.3 FastIron traffic policies
ACL based rate limiting policy
The g-series and s-series switches support IP ACL based rate limiting of inbound
traffic. For s-series switches this is available for Layer 2 and Layer 3.
The ACL based rate limiting is achieved with traffic policies which are applied on
ACLs. The same traffic policies can be applied to multiple ACLs. Traffic policies
become effective on the ports to which ACLs are bound.
Traffic policies consist of the policy name and policy definition as follows:
Traffic policy name: Identifies traffic policy and can be in the form of a string
with up to 8 alphanumeric characters.
Traffic policy definition (TPD): Can be any of the following policies:
– Rate limiting policy
– ACL counting policy
– Combined rate limiting and ACL counting policy
After the TPD is defined and referenced in an ACL entry, and the ACL is then
applied to a VE in the Layer 3 router code, the rate limit policy is cumulative for
all of the ports in the port region. If the VE/VLAN contains ports that are in
different port regions, the rate limit policy is applied per port region.
When a traffic policy for rate limiting is configured, the device automatically
enables rate limit counting, similar to the two-rate three-color marker (trTCM)
mechanism described in RFC 2698 for adaptive rate limiting, and the single-rate
three-color marker (srTCM) mechanism described in RFC 2697 for fixed rate
limiting. This counts the number of bytes and trTCM or srTCM conformance level
per packet to which rate limiting traffic policies are applied.
ACL based rate limiting can be defined on the following interface types:
Physical Ethernet interfaces
Virtual interfaces
Trunk ports
Specific VLAN members on a port
Subset of the ports on a virtual interface
Table 7-10 shows configurable parameters for ACL based adaptive rate limiting.
Committed Information Rate (CIR): The guaranteed kilobit rate of inbound traffic
that is allowed on a port.
Committed Burst Size (CBS): The number of bytes per second allowed in a burst
before some packets will exceed the committed information rate. Larger bursts
are more likely to exceed the rate limit. The CBS must be a value greater than
zero (0). It is best that this value be equal to or greater than the size of the
largest possible IP packet in a stream.
Peak Information Rate (PIR): The peak maximum kilobit rate for inbound traffic
on a port. The PIR must be equal to or greater than the CIR.
Peak Burst Size (PBS): The number of bytes per second allowed in a burst
before all packets will exceed the peak information rate. The PBS must be a
value greater than zero (0). It is best that this value be equal to or greater than
the size of the largest possible IP packet in the stream.
If a port receives more than the configured bit or byte rate in a one-second
interval, the port will either drop or forward subsequent data in hardware,
depending on the action specified.
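The two-rate three-color marker referenced here is defined in RFC 2698. The following Python sketch is a simplified, color-blind version of that algorithm using the CIR/CBS and PIR/PBS parameters from Table 7-10; the values chosen are illustrative, not recommended settings:

# Simplified color-blind two-rate three-color marker (after RFC 2698).
# CIR/PIR are in bytes per second here for simplicity; CBS/PBS are bucket
# sizes in bytes. Values are illustrative only.
class TrTCM:
    def __init__(self, cir, cbs, pir, pbs):
        self.cir, self.cbs, self.pir, self.pbs = cir, cbs, pir, pbs
        self.tc, self.tp = cbs, pbs          # token buckets start full

    def refill(self, seconds):
        self.tc = min(self.cbs, self.tc + self.cir * seconds)
        self.tp = min(self.pbs, self.tp + self.pir * seconds)

    def mark(self, size):
        if self.tp < size:
            return "red"                      # exceeds the peak rate
        self.tp -= size
        if self.tc < size:
            return "yellow"                   # within peak but above committed rate
        self.tc -= size
        return "green"                        # conforms to the committed rate

meter = TrTCM(cir=10_000, cbs=3_000, pir=20_000, pbs=6_000)
print([meter.mark(1_500) for _ in range(6)])  # buckets drain from green to red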
ACL counting enables the switch to count the number of packets and the number
of bytes per packet to which ACL filters are applied.
Rate limit counting counts the number of bytes and conformance level per packet
to which rate limiting traffic policies are applied. The switch uses the counting
method similar to the two-rate three-color marker (trTCM) mechanism described
in RFC 2698 for adaptive rate limiting, and the single-rate three-color marker
(srTCM) mechanism described in RFC 2697 for fixed rate limiting. Rate limit
counting is automatically enabled when a traffic policy is enforced (active).
7.4.4 NetIron m-series QoS implementation
In this section we cover QoS implementation on m-series routers. The NetIron
m-series QoS processing can be divided into two major areas:
Ingress traffic processing through an m-series router
Egress traffic processing exiting an m-series router
The following steps influence how an ingress packet is processed:
1. Derive priority and drop precedence from the packet’s PCP value.
2. Derive priority and drop precedence from the packet’s EXP value.
3. Derive priority and drop precedence from the packet’s DSCP value.
4. Merge or force the priorities defined in steps 1 through 3.
5. Merge or force the priority and drop precedence value based on the value
configured for the physical port.
6. Merge or force the priority value based on the value configured for the VLAN.
7. Merge or force the priority value based on an ACL look-up. This is used for
setting a specific priority for an L2, L3, or L4 traffic flow.
2. In the second step, the router determines if the priority value must be forced
or merged. The following actions are possible in this step:
– If a packet’s EtherType matches 8100, or the port’s EtherType, derive a
priority value and drop precedence by decoding the PCP value
– If PCP forcing is configured on the port, the priority and drop precedence
values are set to the value read from the PCP bits
– If EXP forcing is configured on the port, the priority and drop precedence
values are set to the value read from the MPLS EXP bits
– If DSCP forcing is configured on the port, the priority and drop precedence
values are set to the value read from the DSCP bits
– If there is no forcing configured on the port the following rules apply:
• For IPv4/v6 packets - priority and drop precedence values are obtained
as a merge of the decoded PCP and decoded DSCP values.
• For MPLS packets - priority and drop precedence values are obtained
as a merge of the decoded PCP and decoded EXP values.
The priority of a packet can be “forced” based on the following criteria:
Forced to a priority configured for a specific ingress port
Forced to a priority configured for a specific VLAN
Forced to a priority that is obtained from the DSCP priority bits
Forced to a priority that is obtained from the EXP priority bits
Forced to a priority that is obtained from the PCP priority bits
Forced to a priority that is based on an ACL match
Similarly, the drop precedence of a packet can be “forced” based on the
following criteria:
Forced to a drop precedence configured for a specific ingress port
Forced to a drop precedence configured for a specific VLAN
Forced to a drop precedence that is obtained from the DSCP priority bits
Forced to a drop precedence that is obtained from the EXP priority bits
Forced to a drop precedence that is obtained from the PCP priority bits
Forced to a drop precedence that is based on an ACL match
Traffic processing is shown in Figure 7-4.
Egress encode policy map: The QoS value that a packet carries in its header
when it exits an m-series router on an egress interface is determined by a
specified mapping. Unless alternate maps are configured, this value is placed in
the packet’s header by using one of the default maps. Alternatively, the following
encode mappings can be defined:
PCP encode map: This map defines how to map the internal priority and drop
precedence value of a packet into the PCP code point.
DSCP encode map: This map defines how to map the internal priority and drop
precedence value of a packet into the DSCP code point.
EXP encode map: This map defines how to map the internal priority and drop
precedence value of a packet into the EXP code point.
Configuring QoS
To successfully implement QoS in m-series routers, you need to perform the
following procedures on ingress and egress QoS processing.
The WRED algorithm is applied to the traffic on all individual internal queues
(0-7) based upon parameters configured for its assigned queue type. When
traffic arrives at a queue, it is passed or dropped as determined by the WRED
algorithm. Packets in an individual queue are further differentiated by one of four
drop precedence values which are determined by the value of bits 3:2 of the
TOS/DSCP bits in the IPv4 or IPv6 packet header as shown in Figure 7-5.
Scheduling traffic for forwarding
If the traffic being processed by an m-series router is within the capacity of the
router, all traffic is forwarded as received.
When the point is reached where the router is bandwidth constrained, it becomes
subject to drop priority or traffic scheduling if so configured.
The m-series routers classify packets into one of eight internal priorities. Traffic
scheduling allows you to selectively forward traffic according to the forwarding
queue that it is mapped to, by using one of the following schemes:
Strict priority-based scheduling: This scheme guarantees that higher-priority
traffic is always serviced before lower priority traffic. The disadvantage of
strict priority-based scheduling is that lower-priority traffic can be starved of
any access.
WFQ weight-based traffic scheduling: With WFQ destination-based
scheduling enabled, some weight based bandwidth is allocated to all queues.
With this scheme, the configured weight distribution is guaranteed across all
traffic leaving an egress port and an input port is guaranteed allocation in
relationship to the configured weight distribution.
Mixed strict priority and weight-based scheduling: This scheme provides a
mixture of strict priority for the three highest priority queues and WFQ for the
remaining priority queues.
Note: Because excess traffic is buffered, rate shaping must be used with
caution. In general, it is not advisable to rate shape delay-sensitive traffic.
The m-series routers support egress rate shaping. Egress rate shaping is
supported per port or for each priority queue on a specified port.
Note: The egress rate shaping burst size for a port-based shaper is 10000
bytes.
Note: The egress rate shaping burst size for a port and priority-based shaper
is 3072 bytes.
7.4.5 NetIron m-series traffic policies
The m-series router can be configured to use one of the following modes of
traffic policing policies:
Port-based: Limits the rate on an individual physical port to a specified rate.
Only one inbound and one outbound port-based traffic policing policy can be
applied to a port. These policies can be applied to inbound and outbound
traffic.
Port-and-priority-based: Limits the rate on an individual hardware forwarding
queue on an individual physical port. Only one port-and-priority-based traffic
policing policy can be specified per priority queue for a port. These policies
can be applied to inbound and outbound traffic.
VLAN-based: Untagged packets as well as tagged packets can be
rate-limited. Only one rate can be specified for each VLAN. Up to 990
VLAN-based policies can be configured for a port under normal conditions or
3960 policies if priority based traffic policing is disabled. These policies can
be applied to inbound and outbound traffic.
VLAN group based: Limits the traffic for a group of VLANs. Members of a
VLAN group share the specified bandwidth defined in the traffic policing policy
that has been applied to that group. Up to 990 VLAN group-based policies
can be configured for a port under normal conditions or 3960 policies if
priority-based traffic policing is disabled. These policies can only be applied to
inbound traffic.
Port-and-ACL-based: Limits the rate of IP traffic on an individual physical port
that matches the permit conditions in IP Access Control Lists (ACLs). Layer 2
ACL-based traffic policing is supported. Standard or extended IP ACLs can
be used. Standard IP ACLs match traffic based on source IP address
information. Extended ACLs match traffic based on source and destination IP
address and IP protocol information. Extended ACLs for TCP and UDP also
match on source and destination TCP or UDP port and protocol information.
These policies can be applied to inbound and outbound traffic. Up to 990
Port-and-ACL based policies can be configured for a port under normal
conditions or 3960 policies if priority-based traffic policing is disabled.
Rate limiting for copied-CPU-bound traffic: The rate of copied-CPU-bound
packets from applications such as sFlow, ACL logging, RPF logging, and
source MAC address learning (with known destination address) can be
limited. Copied-CPU-bound packets are handled and queued separately from
packets destined to the CPU, such as protocol packets. Using this feature,
they can be assigned to one of eight priority queues, each of which has a rate
limit assigned to it. The queue and rate are assigned by port and apply to all
of the ports that are supported by the same packet processor.
For example, on a module with 20 x 1 Gbps ports, ports 1 - 20 are served by the
same packet processor (PPCR1).
When traffic exceeds the bandwidth that has been reserved for it by the CIR rate
defined in its policy, it becomes subject to the CBS rate. The CBS rate provides a
rate higher than the CIR rate to traffic that exceeded its CIR rate. The bandwidth
in the CBS rate is accumulated during periods of time when traffic that has been
defined by a policy does not use the full CIR rate available to it. Traffic is allowed
to pass through the port for a short period of time at the CBS rate.
When inbound or outbound traffic exceeds the bandwidth available for the
defined CIR and CBS rates, it is either dropped, or made subject to the
conditions set in the EIR bucket.
Like the CIR, the EIR provides an initial bandwidth allocation to accommodate
inbound and outbound traffic. If the bandwidth provided by the EIR is insufficient
to accommodate the excess traffic, the defined EBS rate provides for burst traffic.
Like the CBS, the bandwidth available for burst traffic from the EBS is subject to
the amount of bandwidth that is accumulated during periods of time when traffic
that has been allocated by the EIR policy is not used.
In addition to providing additional bandwidth for traffic that exceeds that available
for the CIR bucket, traffic rate limited by the EIR bucket can have its excess
priority and excess DSCP values changed. Using this option, priority parameters
are set following the EBS value that change the priority of traffic that is being rate
limited using the EIR bucket.
Configuration considerations
The following considerations apply:
Only one type of traffic policing policy can be applied on a physical port. For
example, port and-ACL-based and port-based traffic policing policies cannot
be applied on the same port.
When a VLAN-based traffic policing policy is applied to a port, all the ports
controlled by the same packet processor are rate limited for that VLAN. It is
not possible to apply a VLAN-based traffic policing policy on another port of
the same packet processor for the same VLAN ID.
The Multi-Service IronWare software supports VLAN-based traffic policing
that can limit tagged and untagged packets that match the VLAN ID specified
in the policy. Untagged packets are not subject to traffic policing.
The maximum burst in a traffic policing policy cannot be less than the average
rate and cannot be more than the port’s line rate.
Control packets are not subject to traffic policing.
Source MAC address with Virtual Leased Line (VLL) endpoints are not
subject to traffic policing.
7.4.6 NetIron c-series QoS implementation
Traffic types
The c-series device uses the following traffic types:
Data: The data packets can be either Network-to-Network traffic or traffic from
the CPU. Network-to-Network traffic is considered data traffic. QoS
parameters can be assigned and modified for data traffic.
Control: Packets to and from the CPU are considered control traffic. The QoS
parameters for this traffic are preassigned and not configurable.
Each of the ingress pipeline engines contain several Initial QoS Markers that
assign the packet’s initial QoS attribute.
The ingress pipeline engine also contains a QoS Remarker that can modify the
initial QoS attributes.
Even though the c-series device supports four drop precedence values (0, 1, 2,
and 3), internally 1 and 2 are assigned the same drop precedence level. The four
levels are kept for CLI compatibility with other products of the NetIron family. The
three internal levels of drop precedence are 0, {1,2}, and 3. In terms of commonly
used color based terminology: 0 represents green (lowest drop precedence),
1 and 2 represent yellow (higher drop precedence), and 3 represents red
(highest drop precedence).
TC (Traffic Class): This is the priority level assigned to the packet. When the
TxQ enqueues the packet, it uses this field to select the appropriate priority
queue.
DC (Drop Precedence): The TxQ uses this field for congestion resolution.
Packets with higher drop precedence are more likely to be discarded in the
event of congestion.
The following steps influence how an ingress packet is processed:
1. Derive priority and drop precedence from the packet’s PCP value.
2. Derive priority and drop precedence from the packet’s DSCP value.
3. Force the priority and drop precedence value based on the value configured
for the physical port.
4. Force the priority value based on an ACL look-up. This is used for setting a
specific priority for an L2, L3, or L4 traffic flow.
The derived values for PCP and DSCP are mapped to a default map.
In the second step, the router determines if the priority value must be forced.
The following actions are possible in this step:
If a packet’s EtherType matches 8100, or the port’s EtherType, derive a
priority value and drop precedence by decoding the PCP value
If PCP forcing is configured on the port, the priority and drop precedence
values are set to the value read from the PCP bits
If DSCP forcing is configured on the port, the priority and drop precedence
values are set to the value read from the DSCP bits
If there is no forcing configured on the port the following rules apply:
– For tagged and untagged IPv4 packets - priority and drop precedence
values are obtained from decoded DSCP values.
– For tagged non-IPv4 packets - priority and drop precedence values are
obtained from decoded PCP values.
– For untagged non-IPv4 packets - priority and drop precedence values are
obtained from the priority and drop precedence assigned on the port. If no
priority and drop precedence is assigned on the port, the default values of
priority 0 and drop precedence 0 are used.
The priority of a packet can be “forced” based on the following criteria:
Forced to a priority configured for a specific ingress port.
Forced to a priority that is obtained from the DSCP priority bits.
Forced to a priority that is obtained from the PCP priority bits.
Forced to a priority that is based on an ACL match.
Similarly, the drop precedence of a packet can be “forced” based on the
following criteria:
Forced to a drop precedence configured for a specific ingress port.
Forced to a drop precedence that is obtained from the DSCP priority bits.
Forced to a drop precedence that is obtained from the PCP priority bits.
Forced to a drop precedence that is based on an ACL match.
The QoS value that a packet carries in its header when it exits a c-series device
on an egress interface is determined by a specified mapping. Unless alternate
maps are configured, this value is placed in the packet’s header by using one of
the default maps. The egress encode policy can be either enabled or disabled.
Configuring QoS
To successfully implement QoS in a c-series device, the following procedures
need to be performed on ingress and egress QoS processing.
Egress QoS procedures
The following egress procedures are available:
Egress encode policy on or off: To be used to map the internal priority to a
packet header when it exits the device.
Support for QoS configuration on LAG ports: To be used when using
enhanced QoS on ports within a LAG.
After the point is reached where the router is bandwidth constrained, it becomes
subject to drop priority or traffic scheduling if so configured.
The c-series devices classify packets into one of eight internal priorities. Traffic
scheduling allows you to selectively forward traffic according to the forwarding
queue that it is mapped to, by using one of the following schemes:
Strict priority-based scheduling: This scheme guarantees that higher-priority
traffic is always serviced before lower priority traffic. The disadvantage of
strict priority-based scheduling is that lower-priority traffic can be starved of
any access.
WFQ weight-based traffic scheduling: With WFQ destination-based
scheduling enabled, weight-based bandwidth is allocated to all queues.
With this scheme, the configured weight distribution is guaranteed across all
traffic leaving an egress port, and an input port is guaranteed an allocation in
relationship to the configured weight distribution (see the worked example
after this list).
Mixed strict priority and weight-based scheduling: This scheme provides a
mixture of strict priority for the three highest priority queues and WFQ for the
remaining priority queues.
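As a worked illustration of the WFQ scheme (the numbers are examples only, not
product limits), consider an egress port with four active queues configured with
weights 10, 20, 30, and 40. Under congestion, the queue with weight 40 is
guaranteed 40 / (10 + 20 + 30 + 40) = 40% of the egress bandwidth, the queue
with weight 30 receives 30%, and so on, so the configured weight distribution is
preserved across the traffic leaving the port.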
Note: Because excess traffic is buffered, rate shaping must be used with
caution. In general, it is not advisable to rate shape delay-sensitive traffic.
Note: The egress rate shaping burst size for a port and priority-based shaper
is 4096 bytes.
The c-series router can be configured to use one of the following modes of traffic
policing policies:
Port-based: Limits the rate on an individual physical port to a specified rate.
Only one inbound and one outbound port-based traffic policing policy can be
applied to a port. These policies can be applied to inbound and outbound
traffic.
Port-and-ACL-based: Limits the rate of IP traffic on an individual physical port
that matches the permit conditions in IP Access Control Lists (ACLs). Layer 2
ACL-based traffic policing is also supported. Standard or extended IP ACLs can
be used. Standard IP ACLs match traffic based on source IP address
information. Extended ACLs match traffic based on source and destination IP
address and IP protocol information. Extended ACLs for TCP and UDP also
match on source and destination TCP or UDP port and protocol information.
These policies can be applied to inbound and outbound traffic.
Up to 990 Port-and-ACL based policies can be configured for a port under
normal conditions or 3960 policies if priority-based traffic policing is disabled.
Traffic is initially traffic policed by a Committed Information Rate (CIR) bucket.
Traffic that is not accommodated in the CIR bucket is then subject to the Excess
Information Rate (EIR) bucket.
When traffic exceeds the bandwidth that has been reserved for it by the CIR rate
defined in its policy, it becomes subject to the CBS rate. The CBS rate provides a
rate higher than the CIR rate to traffic that exceeded its CIR rate. The bandwidth
in the CBS rate is accumulated during periods of time when traffic that has been
defined by a policy does not use the full CIR rate available to it. Traffic is allowed
to pass through the port for a short period of time at the CBS rate.
When inbound or outbound traffic exceeds the bandwidth available for the
defined CIR and CBS rates, it is either dropped, or made subject to the
conditions set in its EIR bucket.
In addition to providing additional bandwidth for traffic that exceeds that available
for the CIR bucket, traffic rate limited by the EIR bucket can have its excess
priority and excess DSCP values changed. With this option, priority parameters
that follow the EBS value are set; these change the priority of traffic that is being
rate limited by the EIR bucket.
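As a worked illustration of these buckets (the rates are examples only, not
product limits), consider a policy with a CIR of 100 Mbps and an EIR of 50 Mbps.
Sustained traffic up to 100 Mbps is forwarded by the CIR bucket. If the flow has
been idle and has accumulated CBS credit, short bursts above 100 Mbps are also
forwarded at the CBS rate. Sustained traffic between 100 Mbps and roughly
150 Mbps is handled by the EIR bucket and can have its excess priority and
excess DSCP values remarked. Traffic beyond what the CIR, CBS, and EIR
buckets accommodate is dropped.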
Chapter 8. Voice over IP
VoIP is a value-add application that brings additional resource and
service requirements to the existing networking infrastructure. Certain key
elements must be provided by the networking infrastructure for a successful VoIP
implementation.
IBM s-series and g-series switches, with their support for PoE, enhanced QoS,
enhanced VoIP features such as dual-mode ports and voice VLANs, and
enhanced security and storage features, are well positioned for successful VoIP
implementations.
8.2 Architecture overview
In this section we describe the architecture for the VoIP infrastructure using IBM
b-type networking switches.
There are two main ways that the traffic flows in VoIP environments:
VoIP traffic inside the enterprise: In this scenario, all parties in the VoIP call
are using devices that are connected to the enterprise networking
infrastructure.
VoIP traffic for external calls: In this case, one side of the VoIP call is outside
the enterprise networking infrastructure. The outside path can be either over
a PSTN gateway to a classical phone exchange, or over a VoIP gateway to
another VoIP gateway.
We can see all the components of the typical network infrastructure with VoIP
infrastructure elements:
VoIP phones: The IP phones are used to initiate or receive the calls.
The typical call flow between two IP phones within the enterprise will have the
following steps, also shown in Figure 8-1 on page 205:
1. Phone A requests a call setup to the call manager. The call setup message is
sent from phone A to the call manager. The call manager identifies the IP
address of phone B.
2. The call manager initiates a call to phone B. In this step the call manager
sends a call setup message to phone B and to phone A, confirming the call.
3. During the call, phone A talks to phone B.
When the call is established, the traffic flow is local, depending on where the
phones are physically connected. There are two possible scenarios:
Both phones connected to the same physical access layer switch: In this case
VoIP traffic during the call is processed inside the access switch.
Phones are connected to different physical access layer switches: In this case
VoIP traffic during the call is traveling across the aggregation/core layer.
Note: Communication to the call manager is only active during call setup.
The VoIP network infrastructure must use QoS to provide low jitter and latency
for the VoIP traffic. It is important to ensure that the right QoS is used to
prioritize both call setup and actual call traffic throughout the entire network
infrastructure.
IBM s-series and g-series switches provide all required capabilities for providing
QoS to serve VoIP calls. QoS functions are also available on m-series routers,
which can be used in the aggregation/core layer of the infrastructure.
8.2.2 VoIP call flow for external calls
Figure 8-2 shows the call flow with an external VoIP call.
The infrastructure for the external call is the same as described in 8.2.1, “VoIP
call flow inside enterprise” on page 205.
The typical call flow between the IP phone in the enterprise and the external
entity has the following steps, as shown in Figure 8-2:
1. Phone A requests a call setup to the call manager. The call setup message is
sent from phone A to the call manager. The call manager identifies that this is
an external call.
2. The call manager initiates a call setup message over the gateway to the
external (PSTN) network. After this, the call manager sends a call setup
message to phone A confirming the call.
3. Phone A can now make a call to the external (PSTN) network. During the call,
VoIP traffic is forwarded to the external (PSTN) network.
Just as for VoIP calls inside the enterprise, the network infrastructure must
provide adequate resources. In cases of external calls, the VoIP traffic is flowing
from the endpoint of the VoIP phone to the gateway that connects to the external
(PSTN) network. VoIP traffic must be prioritized on the whole path.
IBM FastIron based switches (s-series and g-series) provide a rich set of features
in their QoS set which can help to manage the QoS correctly and thus enforce
strict QoS policies.
There are several possible scenarios regarding how to handle VoIP traffic:
VoIP devices are already marking the QoS priority (802.1p) and network
equipment is honoring this marking.
VoIP devices are marking the QoS priority, but network managers want to
change the priority on the path through their environment.
VoIP devices are not marking the QoS priority and network managers need to
provide priority for the traffic.
Based on these conditions, in the following sections we describe the options that
are available on IBM FastIron based switches.
Default behavior
In this case, the IBM FastIron based switches honor the 802.1p priority set by
the VoIP device. There is no need for additional QoS setup if the VoIP device
priority setting is sufficient to provide adequate quality for the VoIP traffic.
When a packet enters and exits the switch by a tagged port, its 802.1p value will
be maintained. For example, if the packet had an 802.1p value of 5, it will exit the
switch with the same value (if there was no QoS configuration applied on this
packet).
When the packet is untagged, it has a default priority of zero and it will be queued
to the queue with the lowest priority. In FastIron switch terminology, this queue is
called qosp0.
Even if this packet exits on a tagged port, the 802.1p value will be set to
zero, unless QoS configuration was applied to modify the 802.1p priority.
The FastIron switches have eight internal hardware queues (each queue has a
value that ranges from 0-7), and each 802.1p value maps the packet to one of the
queues so that the switch can forward the traffic according to the queue priority.
This mapping is shown in Table 8-1.
Table 8-1   802.1p value to internal forwarding queue mapping
802.1p value    Internal queue
7               qosp7
6               qosp6
5               qosp5
4               qosp4
3               qosp3
2               qosp2
1               qosp1
0               qosp0
Port priority
Port priority can be applied on a per-port basis on all s-series and g-series
switches. With this setting, all traffic coming into a port that has port priority set
is subject to the internal priority mapping.
The internal priority mapping maps the traffic to the internal queues, which are
the same as described in Table 8-1.
The port priority setting will never affect the DSCP value of the packet. It is only
used for internal prioritization for egress queuing and to set the 802.1p value
when the packet comes in untagged and leaves by a tagged interface.
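As a minimal sketch, port priority is applied at the interface level; the interface
number and priority value here are examples only, so verify the exact syntax in
the product configuration guide:
interface ethernet 1
 priority 5
With this setting, traffic entering port e1 is mapped to internal queue qosp5
unless another QoS feature overrides it.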
To remark a VoIP phone 802.1p priority, FastIron based switches offer two ways
to identify the device:
VoIP phone IP address
VoIP phone DSCP value
Based on these parameters, the packet 802.1p priority can be remarked to the
desired value. On top of this, the internal priority (which will be used in the switch)
can be modified.
The 802.1p packet priority will also be used when the packet leaves the switch.
The default DSCP to internal forwarding queue mapping is as follows:
DSCP value    Internal queue
0 - 7         qosp0
8 - 15        qosp1
16 - 23       qosp2
24 - 31       qosp3
32 - 39       qosp4
40 - 47       qosp5
48 - 55       qosp6
56 - 63       qosp7
DSCP remarking using ACLs
If required, the DSCP values can be remarked similarly to the 802.1p priority
values. In the remarking process, VoIP devices can be identified in two possible
ways:
VoIP phone IP address
VoIP phone DSCP value
VoIP traffic statistics can be gathered by using a traffic policy, or by using
extended ACLs matching the VoIP phone IP address or traffic.
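The following lines are an illustrative sketch only of remarking VoIP traffic that is
identified by the phone IP address; the addresses, ACL number, and marking
keywords are assumptions, and the QoS marking options available in ACLs vary
by product and release, so verify them in the relevant configuration guide.
access-list 101 permit ip host 10.1.20.15 any dscp-marking 46 802.1p-priority-marking 5
access-list 101 permit ip any any
interface ethernet 1
 ip access-group 101 in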
Power reduction
The following options are available for power reduction; a configuration sketch
follows the list:
It is possible to configure the exact value of delivered power in milliwatts.
For devices which are 802.3af compliant the power can be configured by
802.3af power class:
– For class 1 devices, 4 watts will be available
– For class 2 devices, 7 watts will be available
– For class 3 devices, 15.4 watts will be available
IBM s-series switches can be configured not to support legacy PoE devices.
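A minimal configuration sketch of these options on a PoE port follows; the
interface numbers, class, and milliwatt value are examples only, so confirm the
exact syntax in the product configuration guide.
interface ethernet 2
 inline power power-by-class 2
interface ethernet 3
 inline power power-limit 10000
In this sketch, port e2 is limited to the class 2 allocation (7 watts) and port e3 is
limited to 10000 milliwatts (10 watts).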
Power priority
It is possible to configure a power port priority. With this, you are able to specify
that some ports get a higher priority when not enough PoE power is available.
Such a situation can happen because of PoE power supply failures or power line
failures. With such a setup, the most critical VoIP devices can get higher priority
over less critical VoIP devices.
After the request is specified, the switch will allocate the required power if it has
enough power available. If the switch does not have enough power resources to
support the request, it will not allocate any power to the port.
If there is explicitly defined power delivery, by milliwatts or power class, this will
take precedence over the device power request. For example, if the port is
defined to support class 3 PoE devices (15.4W) and the device requests 10W,
then the port still delivers 15.4W of power.
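A minimal sketch of assigning a higher power priority to a critical port follows;
the interface and priority value are examples, and the meaning of the priority
range differs by product, so verify it in the configuration guide:
interface ethernet 4
 inline power priority 1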
Dual mode ports
Usually the VoIP phone provides a physical connection for a computer device.
With such a setup, only one switch port is required to connect the VoIP device
and the computer. An example is shown in Figure 8-3.
In this setup, traffic from the VoIP device is tagged and the traffic from the
computer device is untagged. In some cases it will be required that VoIP device
traffic is part of its own VLAN, and computer traffic is part of its own VLAN. In our
example in Figure 8-3, VoIP traffic is part of VLAN 20 and computer traffic is part
of VLAN 10.
With the dual mode port feature FastIron switches can support tagged and
untagged frames in different VLANs. In our example in Figure 8-3 port e1 is
configured in dual mode. In this setup the e1 port can transmit and receive
tagged VoIP traffic (which is part of VLAN 20) and at the same time it can classify
untagged traffic received on the port from the computer device to VLAN 10.
Dual mode can be configured on a per-interface basis. If the VLAN ID parameter
is omitted, any untagged packet received on the port is classified to VLAN 1;
otherwise, the packet is classified to the VLAN specified by the VLAN ID.
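The following is a minimal configuration sketch for the example in Figure 8-3,
assuming port e1, VLAN 20 for voice, and VLAN 10 for data; the VLAN
membership commands and the dual-mode syntax can differ slightly between
products and releases, so verify them in the product configuration guide.
vlan 20 name VoIP
 tagged ethernet 1
vlan 10 name Data
 tagged ethernet 1
interface ethernet 1
 dual-mode 10
With dual-mode 10 configured, tagged VoIP frames remain in VLAN 20, while
untagged frames received from the computer are classified into VLAN 10.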
When a port is configured to use a voice VLAN and it receives a query from a
VoIP device, it responds with the configured voice VLAN ID. After this, the VoIP
device reconfigures itself to use this VLAN ID for sending VoIP traffic.
Note: If the VoIP device does not generate a query to the switch, or does not
support autoconfiguration of the VoIP VLAN ID, the voice VLAN feature cannot
be used with it.
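A minimal sketch of enabling the voice VLAN feature on a port, assuming VLAN
20 is the voice VLAN (illustrative syntax; confirm it in the product configuration
guide):
interface ethernet 1
 voice-vlan 20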
Figure 8-4 shows the steps when the VoIP device is moved to another port:
1. The VoIP device is moved to another port on the same device.
2. The VoIP device sends a query message to the switch.
3. The switch responds with the voice VLAN ID.
4. The VoIP device automatically reconfigures itself to use the provided VLAN ID.
8.3.5 Security features for VoIP traffic
Several security features are available in VoIP implementations using IBM
FastIron based switches:
BPDU guard
Port flap dampening
MAC filter override for 802.1x enable ports
CALEA compliance with ACL based mirroring
Tracking VoIP data to multiple servers
BPDU guard
The Bridge Protocol Data Unit (BPDU) guard is used to prevent loops that are
created by devices on a Layer 2 network. Some VoIP devices are known to
put themselves into loopback mode, as shown in Figure 8-5.
In this scenario, the VoIP device is not 802.1x compliant and it will fail
authentication. With MAC filter override, if a device’s MAC address (the VoIP
device in this case) matches a “permit” clause, the device is not subjected to
802.1x authentication and it is permitted to send the traffic. Similarly, if a device
matches a “deny” clause it is blocked before being subjected to 802.1x
authentication.
A setup for tracking of VoIP data to multiple servers is shown in Figure 8-7.
Chapter 9. Security
In this chapter we delve into the security options available in the IBM b-type
Ethernet product range, including the following options:
Console access
Port security
Spanning-tree security
VLAN
ACLs (Layer 2 and Layer 3 or Layer 4)
Anti-spoofing protection
Route authentication
Remote access security
Network device security can be considered to encompass the five (5) lowest OSI
layers:
Layer 1, physically secure the equipment.
Layer 2, secure link protocols such as MAC & spanning tree
Layer 3, secure logical access, as well as restricting IP connectivity
Layer 4, restrict the traffic available in segments
Layer 5, secure the available services on network equipment
Not all security options described below are available on all products. Be sure to
check your product configuration guide for availability of the security options that
you require.
Note: How to provide physical security is beyond the intended scope of this
book.
Enable password
Although anyone with physical access to the devices can connect and see the
data in user mode, all sensitive information is displayed in encrypted format. To
restrict configuration changes, an enable password must be set. Without the
enable password, no configuration changes are possible. This level of
authorization allows configuration of interfaces.
Super-user password
To allow a console user to configure all parameters on the device the super-user
password is used.
Console time-out
By default, the console port allows a user to remain logged in and active
indefinitely. This can result in a console being left connected in enable mode
unintentionally. It is preferable to set a reasonable time-out value, for example,
2 minutes. This ensures that the console is not left in enable mode by mistake.
The relevant product configuration guides provide more information about this
setting.
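A minimal configuration sketch of these console protections follows. The
passwords and the time-out value are examples only, and the exact command
names can vary by product and release, so verify them in the relevant
configuration guide.
enable super-user-password S3cureSU
enable port-config-password S3curePC
console timeout 2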
Port security allows the network administrator to restrict the number of MAC
addresses permitted on each port. The approved, or secure, MAC addresses
can either be manually configured or learnt dynamically. In a static environment it
is even possible to have the learnt MAC addresses saved into the device startup
configuration file. It is important to ensure that your port security settings match
your business needs. With the deployment of VoIP it is not unusual to have two
(2) MAC addresses per interface, and some computers also run virtualization
software that can apply a unique MAC address for every virtual instance.
Desk flexibility will also create an environment where the MAC address changes
when a user connects their laptop to the port.
However, even in those cases, the workstation will need to be changed at some
point during technology refresh cycles. The environment that requires static MAC
addresses over lengthy periods of time is assumed to be an exception in today’s
dynamic workplace. Therefore, IBM advises the majority of environments that
choose this security option to always configure a suitable age-out timer for these
secure MAC addresses.
The MAC port security setting also has a default action to drop traffic from MAC
addresses not in the secure MAC list. Another configurable option is to shut down
the interface for a defined time period; this halts all data communications to
that port until the time period has expired. Regardless of which option is best for
your network, IBM advises the use of a suitable time-out value.
Consider the case of the mobile workforce and drop-in desks. Assume a fairly
dynamic workforce where it is possible for a desk to be used by more than four
(4) people in a work day (8 hours). In this case, setting port security to 5 dynamic
MAC addresses, with a time out of 120 minutes (2 hours), and a violation action
to shut down the port for 5 minutes can allow the port to be used for the expected
daily operations of the business. Any violation or change to the business will shut
the port down for only five (5) minutes.
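A minimal sketch of the scenario just described, assuming port e1/10, five
dynamically learnt MAC addresses, a 120-minute age-out, and a 5-minute
violation shutdown, might look as follows. The command names and nesting are
illustrative and differ between products and releases, so verify them in the
product configuration guide.
interface ethernet 1/10
 port security
  enable
  maximum 5
  age 120
  violation shutdown 5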
Note: If using Port Security, ensure that the following precautions are in place:
Suitable maximum MAC addresses per port are configured.
Time out MAC addresses if using dynamic learning.
Set a time period for the violation action.
In the case where the MAC address has not been registered, the Radius server
will reply with an Access-Reject message indicating the authentication has failed.
If the MAC address is not authenticated the default action is to drop all packets in
hardware. Alternatively, the network device can be configured to place the port
into a different, untrusted VLAN. For example, if a visitor to the site connects their
workstation to the port, they can be provided access to a guest VLAN that can
then provide limited access to the network.
The authentication feature can also be used to configure different devices into
different VLANs. For example the VoIP handset can be authenticated and
configured into the VoIP VLAN, while the workstation can be configured into the
corporate VLAN or the Guest VLAN.
The port remains in blocked state until the superior BPDU packets are no longer
seen and the time out value has expired. The port will then be placed back into
forwarding mode.
Important: STP Root Guard must only be deployed at the edge of the
network, it must never be deployed in the core.
BPDU guard restricts an end station from inserting BPDU packets into the
spanning tree topology. An end station attempting to participate in spanning tree
is considered either an attack, or an indication that a Layer 2 device has been
inserted into the wrong port. For this reason, BPDU guard blocks the port if a
BPDU packet is received and does not automatically re-enable the port. Instead,
the network administrator must enable the port from the command line after
ensuring that the initiator of the rogue BPDU has been identified and fixed.
Note: The network administrator must manually unblock a port where BPDU
Guard has blocked it. This is to allow the network administrator to identify the
cause of the unexpected BPDU packet and rectify it.
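A minimal sketch of enabling BPDU guard on an edge port, assuming port e1/2
(illustrative syntax; confirm the exact command in the product configuration
guide):
interface ethernet 1/2
 stp-bpdu-guard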
The IBM Ethernet switch LAN configuration allows VLANs to be defined with
tagged or untagged ports. Tagged ports allow 802.1Q tagged packets to be
received. Untagged ports will drop any packet with an 802.1Q tag.
Dual mode ports: With the growing deployment of VoIP, it is common for an
access port to have a VoIP handset connected as well as a workstation
connected to the VoIP handset. Industry standard practice is to configure the
VoIP handset into one VLAN and the workstation in another VLAN. Furthermore,
the VoIP handset will typically tag the traffic with the VoIP VLAN ID using 802.1Q.
In these cases, the IBM Ethernet products allow each port to be configured as a
dual mode port.
9.3.7 VSRP
The Virtual Switch Redundancy Protocol (VSRP) can be configured to provide a
highly available Layer 2 switch environment as described in Chapter 6, “Network
availability” on page 141.
Each VSRP instance can be configured to use authentication. There are two
types of authentication available:
None The default; does not use authentication
Simple Utilizes a simple text string as a password
Whichever authentication method you use, all switches within that virtual group
must be configured with the same VRID and authentication method.
Layer 2 ACLs filter incoming traffic based on any of the following Layer 2 fields in
the MAC header:
Source MAC address
Destination MAC address
VLAN ID
Ethernet type.
The IBM Ethernet products can be configured with Dynamic ARP Inspection
(DAI). After DAI is enabled on a VLAN, the IBM Ethernet product will intercept
and examine all ARP packets within that VLAN. The ARP packet will be
inspected by the CPU and discarded if the information is found to contain invalid
IP to MAC address bindings.
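A minimal sketch of enabling DAI on VLAN 20 and marking the uplink as trusted,
assuming port e1/24 is the uplink (illustrative syntax; the exact commands can
vary by product and release, so verify them in the configuration guide):
ip arp inspection vlan 20
interface ethernet 1/24
 arp inspection trust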
To overcome this MiM attack, the network administrator can configure DHCP
snooping. This feature works by blocking any DHCP server packets from user
(end-point) ports and only permitting DHCP server packets from a trusted port.
DHCP snooping is configured per VLAN and will accept a range of VLANS.
There are two steps to configuring DHCP Snooping:
1. Enable DHCP snooping per VLAN:
ip dhcp-snooping vlan 123
2. Set the trusted port:
interface ethernet 1/1
dhcp-snooping-trust
When first enabled, IP Source Guard only permits DHCP traffic to flow through
the untrusted ports. It then learns the IP addresses from the ARP table. IP traffic
is only passed through the network device after the IP address is learnt and the
source address of the packet matches the learnt source address.
VRRP authentication is an interface level command. After the VRRP instance
has been defined, the network administrator can then configure the VRRP
interface with the authentication type and password. There are two (2)
authentication modes available today:
None This is the default and does not utilize any authentication.
Simple The port can be configured with a simple text password; this
password is used for all communications from that port.
Just as the VRRP group of routers will all need the same VRID configured, the
interfaces assigned to the VRRP on each router need the same authenticating
method and password. Failure to set the same authentication method and
password will result in VRRP not activating resulting in a single point of failure.
When using a simple authentication, the default is for the password to be stored
and displayed in an encrypted format. The network administrator can optionally
configure to display and save the password in plain text. When authenticating the
simple password is decrypted and passed in clear text.
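A minimal sketch of configuring simple authentication for VRRP on an interface,
assuming virtual interface ve 10 and an example password (illustrative syntax;
verify it in the product configuration guide):
interface ve 10
 ip vrrp auth-type simple-text-auth VrrpPass1
The same authentication type and password must then be configured on the
corresponding interface of every router in the VRRP group.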
The MD5 hash key is stored as the hash and up to 255 different keys can be
defined.
By default, the IBM Ethernet routers assume that the BGP password is being
entered in clear text and it is stored and displayed in encrypted format. It is
possible to enter a password that is already encrypted. Regardless of password
entry method the password is decrypted before use.
9.5.1 IP ACLs
To restrict packets based on a common rule, the use of Access Control Lists
(ACLs) is suggested. These do not replace other security devices such as a
firewall or Intrusion Prevention System (IPS).
As with most implementations of an ACL based filter there are two types of ACLs
available, Standard and Extended.
Standard ACLs
The Standard ACLs are numbered from one (1) to ninety-nine (99). These are
used for IP control and as such are really a Layer 3 security function. However
we will cover them here with the other IP ACLs.
Standard ACLs can provide coarse security controls by controlling data flows
from the defined source IP address. The IP address can be a host or a network.
Standard ACLs are typically used to deny a single IP host or network, or in Policy
Based Routing.
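As a minimal sketch, the following standard ACL denies a single host, permits all
other sources, and is applied inbound on an interface (the addresses, ACL
number, and interface are examples only):
access-list 10 deny host 192.168.50.25
access-list 10 permit any
interface ethernet 1/1
 ip access-group 10 in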
Extended ACLs
The Extended ACL can be defined numerically from 100 to 199. Extended ACLs
allow more detailed control of IP traffic by allowing the definition of any field in the
TCP/IP packet.
If the IP protocol is TCP or UDP, the Extended ACL can also define restrictions
based on the following values:
Source Port(s)
Destination Port(s)
TCP Flags
If the IP protocol is ICMP, the Extended ACL can also define restrictions based
on the following values:
ICMP Type
ICMP Code
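A minimal sketch of an extended ACL that permits only web traffic to a server
and denies everything else (the addresses, ports, ACL number, and interface are
examples only):
access-list 105 permit tcp any host 10.1.1.10 eq 80
access-list 105 permit tcp any host 10.1.1.10 eq 443
access-list 105 deny ip any any
interface ethernet 1/2
 ip access-group 105 in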
Named ACL
Both types of ACLs can also be defined by a name. If this method is chosen, the
configuration commands are slightly different, because the administrator must
define whether the ACL is standard or extended.
The IBM Ethernet Routers can have up to 100 named ACLs defined as standard
ACLs, and up to 500 named ACLs defined as extended ACLs.
9.5.3 Protocol flooding
The IBM Ethernet products can pass traffic at wire speed due to their
architectural design. However, not all network devices can process traffic at wire
speed, nor is it always preferable to allow traffic to flood the network. Whether the
traffic is broadcast, multicast, or unicast, traffic flooding can be detrimental to
devices and systems.
The IBM Ethernet switches can be configured to restrict the number of flood
packets, or bytes, per second. These protocol flooding limits can be enabled at
the port level.
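A minimal sketch of limiting flooded traffic on a port (the limit value and interface
are examples only, and the unit and permitted values differ by product, so verify
them in the configuration guide):
interface ethernet 1/3
 broadcast limit 65536
 multicast limit 65536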
The IBM Ethernet products allow for common mitigations, and IBM strongly
advises all network deployments to enable simple security measures to reduce
the risk of a network device being used for malicious purposes.
ACLs can, and must, be configured to restrict access to the Telnet, SSH, HTTP,
SSL, and SNMP functions.
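As an illustrative sketch only, management access might be restricted to a single
management subnet as follows. The subnet, ACL number, and the commands
that bind an ACL to each management service are assumptions, so confirm the
exact syntax in the relevant configuration guide.
access-list 20 permit 10.10.100.0 0.0.0.255
telnet access-group 20
web access-group 20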
Authentication
Authentication is the process of confirming the access attempt has credentials
that are recognized. This is usually thought of as the username and password
that you enter to access a system.
Authorization
After being authenticated, network devices will then also check the authorization
of the user. Authorization defines the instructions the user has been permitted to
use once authenticated.
Accounting
Tracking what a user does and when they do it can provide valuable data both in
forensics and in problem determination. Accounting logs can be used to identify
when a change was made that might have caused a fault. Similarly, it can be
used after an event to discover who accessed a system and what they did.
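A minimal sketch of pointing authentication and accounting at a RADIUS server,
with a local fallback for login authentication (the server address and key are
examples only; the available method lists differ by product, so verify them in the
configuration guide):
radius-server host 10.10.5.20
radius-server key RadiusSecret1
aaa authentication login default radius local
aaa accounting exec default start-stop radius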
SSH provides all the functionality of Telnet, but provides encryption of the entire
communications stream. Furthermore, it can also be configured to only permit
connectivity from known systems that have the system’s public key uploaded to
the Network Device.
9.6.4 HTTP / SSL
In case your deployment also requires the use of the web interface, be aware that
HTTP has similar vulnerabilities to Telnet. The HTTP protocol transmits all data
in clear text. Rather than HTTP, it is preferable to enable the SSL protocol to
ensure that system administration data is encrypted wherever possible. The
relevant product system configurations guides provide more detail on enabling
this feature.
9.6.5 SNMP
Simple Network Management Protocol (SNMP) versions 1 and 2 utilize
community strings to provide simple authentication, in order to restrict which
systems can access SNMP. However, devices always set a default community
string, which is well known. In addition to the use of ACLs discussed in 9.6.1,
“System security with ACLs” on page 231, the network administrator must
change the SNMP community string to a non-trivial string.
The SNMP community string is transmitted in clear text; however, the IBM
Ethernet devices display the SNMP community string in an encrypted format
when a network administrator has read-only access to the device.
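A minimal sketch of replacing a well-known default community string with a
non-trivial read-only string, tied to a management ACL as described in 9.6.1 (the
strings and ACL number are examples only; verify the syntax in the configuration
guide):
no snmp-server community public ro
snmp-server community Xq7raN3t ro 20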
10.1.3 IBM g-series stacking terminology
Here we describe some of the terminology associated with the IBM g-series:
Active Controller This is the unit that manages the stack and configures
all the units as a single system.
Future Active Controller If a unit is configured with the highest stack priority
(see 10.2.1, “Stack priority” on page 238), that unit will
become the Active Controller after the next reload. To
prevent disruption to the stack, the Active Controller
does not change due to a configuration change.
Standby Controller The unit in the stack with the second highest priority
after the Active Controller. This unit will become the
Active Controller in the event of a failure with the
configured Active Controller.
Stack Member A unit within the stack that is neither the Active
Controller nor the Standby Controller
Stack Unit A unit functioning within the stack, including the Active
Controller and Standby Controller.
Upstream Stack Unit The Upstream Stack Unit is connected to the first
stacking port on the Active Controller. The first port is
the left hand port as you face the stacking ports as
shown in Figure 10-1
Downstream Stack Unit The Downstream Stack Unit is connected to the
second stacking port on the Active Controller. The
second port is the right hand port as you face the
stacking ports, as shown in Figure 10-1.
Note: Because either of the 10 GbE ports can be used in any order for
stacking ports, your cable configuration might differ from those shown in the
figures.
The B50G ships with a single 0.5 meter CX4 cable for stacking. To deploy a ring
topology another CX4 of sufficient length (usually either 1 or 3 metres depending
on the number of Stack Units) must be ordered.
Each stack also has a Standby Controller; this unit takes over as the Active
Controller in the event of a failure of the initial Active Controller. Through
configuration, a Standby Preference can be set, again from 1 through 8. If the
Standby Preference is not set, the Standby Controller is the unit with the lowest
MAC address.
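A minimal sketch of setting the stack priority on the unit that is intended to be the
Active Controller (the unit number and priority value are examples only; verify the
syntax in the product configuration guide):
stack unit 1
 priority 128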
10.2.2 Linear Stack
A Linear Stack is created by connecting each switch only to the switch above or
below itself, as shown in Figure 10-2. The top and bottom units in the stack only
use a single stack connection. The other connection can be used as a data port.
Redundancy in a Linear Stack: In the Linear Stack, there are three main
possible failure options:
If the Active Controller fails, the Secondary Controller takes over as the Active
Controller. All Stack Units that can communicate with the new Active
Controller continue operation with minimal interruption.
If a Stack Member fails between the Active Controller and the Secondary
Controller, the Secondary Controller becomes the Active Controller for the
Stack Units that it can communicate with, while the original Active Controller
remains in control of the Stack Units that it can still communicate with.
If a Stack Member fails in such a way that the Active Controller and the
Secondary Controller are still able to communicate, any Stack Unit that can
communicate with the Active Controller continues operations without
interruption, whereas all other Stack Units become non-functioning and
require manual intervention to regain network and stack connectivity.
Redundancy in a Ring Stack: In a Ring Stack, there are also three possible
failure options; each failure scenario results in the stack operating in Linear
Stack mode until the failed Stack Unit is repaired.
If the Active Controller fails, the Secondary Controller takes over as the Active
Controller. Due to the ring, all Stack Units can still communicate with the new
Active Controller and continue operation with minimal interruption. A new
Secondary Controller is elected. If the old Active Controller recovers, it does
not take over the Active Controller function until the next stack reset.
If the Secondary Controller fails, the stack elects a new Secondary Controller.
All Stack Units can still communicate with the Active Controller and continue
operation without interruption.
If a Stack Member fails, all other Stack Units can still communicate with the
Active Controller and continue operation without interruption.
10.2.4 Stacking ports
The B50G does not have dedicated stacking ports. Therefore, either of the
10 GbE ports can be utilized as stacking ports or data ports. The first unit in the
stack will have both 10 GbE ports automatically configured to stacking ports,
however this can be changed manually. In the Linear Stack topology, doing this
allows two 10 GbE ports in the stack to be used as data ports.
The Active Controller can reset other Stack Members as required. However, it will
not reset itself. If you require the Active Controller to reset, perhaps to force
another Stack Member to become the Active Controller, you must manually reset
the Active Controller.
If the Active Controller fails, the Standby Controller waits thirty (30) seconds
before taking over as the Active Controller. It will then reset the entire stack,
including itself, before resuming operation as the Active Controller.
Note: If the Active Controller loses connectivity with the stack, it might take up
to twenty (20) seconds for the connection to age out, then the Standby
Controller will wait another thirty (30) seconds before taking over control of the
stack. This can result in a delay up to fifty (50) seconds before the Standby
Controller takes over control of the stack.
The Active Controller copies its startup configuration to the rest of the Stack
Members, including the Standby Controller.
The configuration of the Standby Controller is copied from the Active Controller
on each reboot.
Secure Stack also allows Stack IDs to be reallocated to units, as well as allowing
the addition, removal, or replacement of a Stack Unit.
The Stack Unit on which the stack secure-setup command is issued becomes the
Active Controller with a priority of 128. All other units joining the stack are
assigned a stack priority below 128. In the case where the secure setup
discovers another Stack Unit with a Stack Priority of 128 or higher, it reconfigures
the Stack Priority on that unit to 118.
For example, a connection from an Access Layer Linear Stack group can use the
end units’ 10 GbE CX4 ports for uplinks to the Distribution Layer. Although these
Distribution units might also be B50G products, they are not intended to be part
of the current stack; configuring stack disable on the distribution layer units will
prevent them from joining the stack.
This secure setup polls upstream and downstream units using a proprietary,
authenticated, discovery protocol to identify units connected to this Active
Controller. The stack administrator is then presented with a proposed
configuration, including the Stack Units discovered, and asked to confirm the
topology and then the proposed configuration and Secure Stack Membership.
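A minimal sketch of this flow (illustrative syntax; verify the exact commands and
interactive prompts in the product configuration guide):
1. On each distribution-layer B50G that must not join the stack:
stack disable
2. On the unit that is intended to become the Active Controller of the
access-layer stack:
stack enable
stack secure-setup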
Best practice: Cable up the stack ports before applying power to a new or
replacement Stack Unit.
If the unit was powered up when the cables were removed it will maintain its
stack configuration and will need to have the stack information cleared before
being redeployed as a separate unit or into another stack.
Note: Unless the removed Stack Unit is intended to be installed in the same
position in the same stack, it is advisable to clear the stack configuration.
10.5.3 Removing the Active Controller from the stack
If the Active Controller is removed from the stack, the Standby Controller will wait
30 seconds before taking over as the Active Controller within the remaining
stack. The removed unit will be able to function as a stack of one unit even
without clearing the stack configuration. While it is not required to clear the stack
configuration in this instance, it is advisable to clear the stack configuration to
ensure a deterministic operation of the unit. After being cleared, the stack
enable command can be reentered to allow the unit to create a new stack in the
future.
To avoid nonsequential Stack IDs the simplest method is to replace one unit at a
time. Also, Secure Setup can be used to reassign stack IDs in a sequential order.
Note: In a Linear Stack topology, adding or removing a Stack Unit in any
position other than the top or bottom of the stack will cause some Stack Units
to lose connectivity with the Active Controller, causing a reboot of these Stack
Units. In a Ring Stack topology, adding or removing a Stack Unit does not
interrupt any of the other Stack Units.
Before merging stacks, it is important to ensure none of the stacking ports have
been configured as data ports. Recall in a Linear Stack it is possible to
reconfigure the unused end stacking ports as data ports. Secure Stack does not
work across stack boundaries, so it is not possible to use the Secure Stack
process to merge stacks.
When stacks are merged, the existing Active Controllers undergo a reelection
process. The winner of this process retains both its configuration and the Stack
IDs of its previous Stack Members. The new Stack Members only retain their
Stack IDs if their existing Stack ID does not conflict, otherwise a new Stack ID is
assigned to the new Stack Member. This might result in the Stack IDs no longer
being sequential. However, now that all the units are in a single Stack, the Secure
Stack process can be utilized to reassign Stack IDs and ensure the stack
members are the expected units.
Following are some other examples of when merging stacks might be required:
If a Linear Stack had a failure in such a way that both parts of the stack
remained active, upon repairing the failure the two stacks can be merged
again. In this case, the new stack has the same configuration as the original
stack because all the Stack Units had unique Stack IDs.
If two Linear Stacks are connected, it is important to ensure that the end
units are configured so that both of their ports are stacking ports.
10.7 Best practices in a stack
In the topics that follow we describe some best practices as they relate to stacks.
Because the data center houses servers that host the corporate data and
applications, the data center network (DCN) design must continue operations in
the event of a single device failure. In some instances there might be more
difficult requirements such as continued operation in the event of multiple device
failure, and in such cases the architecture can change, but the device selection
criteria detailed in the topics that follow will still remain valid.
Type
Design decision
Problem statement and questions
What hierarchical structure will best support your DCN? Does it consider future
directions in dynamic infrastructure or cloud computing environments?
Assumptions
We make the following assumptions:
Single tier network architectures are not suited for the DCN style network.
Tiers can be either physical or logical.
Alternatives
Here we discuss the various alternatives:
Three tier design This design is the traditional design and consists of
core, distribution, and access layers in the network.
The design provides scalability for fast growing
networks; each tier assigns specific roles to each
device in the network infrastructure.
Collapsed tier design This design is also called a two tier model and
requires less equipment and fewer connections. The
main drawback is that it is not as scalable.
The current DCN network offerings have exceptional
port density compared to network devices of
previous generations. Combined with server
virtualization, the collapsed tier design is becoming
more prevalent, and it is the one we examine in this
chapter.
Differentiated distribution This design is a three tier model that further
delineates areas of responsibility within the
distribution layer. The distribution layer devices can
be separated into specific functions for user access,
data center, and WAN.
Considerations
Observe the following considerations:
The data center cable design might constrain the hierarchical design.
The data center rack layout might constrain the hierarchical design.
Virtualization of network components (see Chapter 5, “IBM Ethernet in the
green data center” on page 133) might challenge traditional hierarchical
designs.
The deployment of 10 GbE uplinks from the access layer might impact the
overall architecture.
Figure 11-1 shows the Data Center Network architecture. Further differentiation
is enabled on the connectivity tier with WAN, LAN, and multiple ISP connectivity
modules, which are discussed in this section also.
Type
Design decision
Assumptions
We make the following assumptions:
Servers can be stand-alone or rack based servers.
Media can be copper or fiber.
Speeds can range from 10 Mbps to 10 Gbps, either over time or in the current
environment.
Alternatives
Here we discuss the various alternatives:
Top of Rack (ToR) This option connects the servers within a rack to the
access switch in the same rack, usually at the top of the
rack. This allows standard cable lengths to be
pre-purchased for the data center for server connectivity.
The server access switches can be connected to the
distribution tier by high-speed uplinks, reducing the
number of connections exiting the rack.
End of Row (EoR) This option connects the servers within a rack to a large
access switch at the end of the row of racks, typically a
high port density modular chassis. The chassis can either
be cabled to patch panels at the top of each server rack or
directly to each NIC in a server. While patch panels can
take additional space in the server rack, the benefits of
doing so include better cable management as well as a
decrease in devices needing to be managed.
Middle of Row (MoR) This option is similar to EoR, except that, being in the
middle of the row, the maximum cable distance between
the server and access switch is less. This is more
desirable when doing direct cable runs from a high port
count chassis directly to the servers.
Decision
A Top of Rack (ToR) switch at the access layer is more traditional and gives the
flexibility to use fixed-length network cables to connect to servers within the rack.
The capability to have minimal high speed uplinks (10 GbE) to a distribution tier
also helps with cable management. A pre-tested rack solution containing servers,
storage, and a ToR switch might also be easier to deploy.
However, End of Row (EoR) and Middle of Row (MoR) solutions that utilize
higher port density, modular devices typically have higher levels of availability
and performance. Most modular chassis have N+M redundancy for
management, power supplies, and fans. A backplane provides connectivity to all
ports within the chassis, providing 1:1 (1 GbE) full mesh connectivity and either
1:1 or 4:1 (10 GbE) full mesh connectivity depending on options selected.
Connecting up to 7 times the number of devices to a single chassis minimizes the
number of devices that need to be managed. A patch panel can also be used at
the top of each server rack to help with cabling.
With virtualized servers becoming more and more prevalent and increasingly
saturating network connections, EoR and MoR solutions might merit serious
consideration, though many network designers might be more familiar with a ToR
solution.
11.3.2 Server access switch selection
Here we discuss server access switch selection using Top of Rack (ToR).
Type
Technology decision
Assumptions
We make the following assumptions:
Uplinks to the distribution tier are required to be scalable to a greater speed
than the links to their servers.
There is typically some over-subscription when connecting the access layer to
the distribution layer.
The architect is able to determine prior to final product selection, the type of
media required to connect the servers to the server access layer.
Alternatives
Table 11-1 compares the alternatives and features of the DCN ToR products.
The primary difference between the two c-series switches is the available media.
The c-series (C) models have primarily copper, 10/100/1000 MbE RJ45 ports for
the servers. The c-series (F) models have primarily 100/1000 MbE (hybrid fiber)
SFP ports.
Some models have combination ports which are shared ports between the first
four 10/100/1000 MbE RJ45 ports and 100/1000 MbE SFP ports found on the
devices. These allow network designers the flexibility to use optical transceivers
for long distance connectivity.
Number of 10 GbE (SFP+) ports    24      0       0       0
Size                             1 RU    1 RU    1 RU    1.5 RU
Considerations
Observe the following considerations:
Redundant links or dual homing systems can be achieved by installing a
second ToR unit.
To increase port utilization, cables can be run from an adjacent rack; this still
follows the known cable length rule.
In an environment where traffic coming from the distribution to the access
layer is expected to be bursty, network buffers on the server access device
become more important. This is because it might be possible for the 10 GbE
links to the distribution layer to burst near maximum capacity. In this scenario
the traffic needs to be buffered on the access switch for delivery to the slower
server connections. Without sufficient network buffers some data might be
dropped due to traffic congestion. Increasing uplink bandwidth can mitigate
some of these issues.
Decision
The IBM Ethernet B50C products have more available network buffer capacity to
allow bursty traffic from a 10 GbE uplink to be delivered to the slower
10/100/1000 Mbps RJ45, or the 100/1000 Mbps SFP server connections, which
is useful because the over-subscription ratio can be as high as 48:20 (2.4:1).
The B50C also requires less rack space (1 RU each) and less power, making it a
good choice for a ToR server access switch. Additional software features can
also help build advanced data centers, including support for Multi-VRF to
virtualize routing tables and MPLS/VPLS capabilities to directly extend remote
Layer 2 networks to the server rack.
For virtualized server deployment, a high capacity network link might be required
to support a greater number of applications running on a single physical server.
In this scenario, the x-series will enable high-bandwidth, 10 GbE server
connectivity. Up to eight links can be aggregated using LACP to provide an 80
Gbps trunk from the switch to the distribution layer.
Type
In a traditional design, the distribution tier connects only to other switches and
provides inter-connectivity between access tier switches and uplinks to the core
or WAN Edge of the network to enable connectivity out of the data center. This
can be seen in Figure 11-1 on page 252, where the distribution tier is also the
core tier. This can be done when devices used meet the requirements of the
core, such as Layer 3 routing and any advanced services needed such as MPLS,
VPLS, and VRF are satisfied by the devices chosen. The main decision on
whether to separate the distribution and core layers is whether additional device
scalability is required.
In a collapsed design, the distribution tier can connect directly to servers, making
it an access/distribution tier. Typically higher density, modular chassis are
deployed. This architectural decision can simplify management by reducing the
number of devices required while increasing the availability and resiliency of the
network.
Assumptions
We make the following assumptions:
The network architect has the required skills and experience to select the best
option for the agreed business requirements.
The DCN deployment is for a commercial business and requires commercial
grade equipment and manageability.
Alternatives
Here we discuss the various alternatives:
Collapse the access and distribution tiers.
Maintain physical tiers with lower cost devices.
Considerations
Observe the following considerations:
Collapsed tiers are able to deliver the business requirements in a
standardized way for small environments.
Lower cost devices probably will not have the commercial manageability and
stability that a commercial business requires.
Layer 3 routing might be required.
Decision
When the network architect is faced with this issue, the best solution is to
investigate collapsing tiers and maintain business grade equipment in the
commercial DCN. However, network architects might be more comfortable with
more traditional designs implementing a separate access layer with lower-cost
ToR switches connecting to the Distribution layer.
Assumptions
We make the following assumptions:
The network architect has captured the customer’s business requirements
The network architect has captured the customer’s logical network
preferences.
The network architect has confirmed the application architect’s requirements.
Decision
There is no wrong answer here; the decision of running Layer 2 or Layer 3 at the
distribution tier is dependent on the rest of the network environment, the intended
communications between VLANs, and the security (ACL) requirements of the
DCN.
Type
Technology decision
Problem statement and questions
How can the core device provide maximum uptime (minimal disruptions to
service)? Will the core device be capable of peering with many BGP peers? How
can the core push MPLS deeper into the DCN? How can the core device connect
to my carrier’s POS? What products will support my IPv6 requirements, even if I
don’t run IPv6 today?
Assumptions
We make the following assumptions:
The core device will run BGP with many peers.
MPLS will be deployed deeper into the network, rather than just the
connectivity tier.
The core can connect directly to a Packet Over Sonet (POS) circuit.
The core will run L3 routing protocols.
Alternatives
The m-series devices must be placed at the DCN core tier. Table 11-2 on
page 263 shows a short comparison of the two products highlighting some of the
considerations the core tier might have.
Considerations
Observe the following considerations:
The m-series Ethernet products have supported hitless upgrades for a
number of releases; this provides a mature hitless upgrade solution.
If your site is housing multiple functions, or clients, that have a need for
separate routing and forwarding tables, VRF will provide this separation.
If there is a requirement to support IPv6, or perhaps deploy it in the near
future, the m-series supports dual stack (running IPv4 as a separate stack to
IPv6) out of the box.
Decision
The m-series Ethernet Routers provide proven support for Layer 2 and Layer 3
hitless upgrades, removing the need for downtime during upgrades. In addition,
the m-series provides greater support for IP routing protocols, with sufficient
memory to maintain 1 million BGP routes in the BGP RIB as well as 512,000
routes in the IPv4 FIB. As IPv6 becomes a requirement in more data centers,
the m-series provides dual stack support for both IPv6 and IPv4; this avoids
future downtime to upgrade blades to IPv6 capabilities if they are required.
Internet connectivity through one or more ISPs brings up many unique issues,
especially in the security area. Because the IBM Ethernet products are not
security devices (for example, firewalls or IDS/IPS), the network designer must
consider how to provide security for his or her DCN requirements. This section
covers only the Ethernet connectivity considerations.
The WAN and MAN can be considered equal, because in these cases the client
owns the connecting infrastructure at the other end of the carrier circuit. There
might be other DCNs, Enterprise networks or both at the remote side of these
connections.
Type
Technology decision
Assumptions
We make the following assumptions:
Full Internet security will be provided by appropriate security devices such as
firewalls, IDS/IPS.
The DCN has a requirement to provide Internet connectivity.
The DCN can have more than one (1) ISP providing connectivity.
Alternatives
On the connectivity edge, the IBM Ethernet m-series products are ideal
candidates. Table 11-2 shows some of the design considerations for connectivity
to the Internet.
Table 11-2 Comparison of c-series and m-series for connectivity edge devices
Feature                                B04M                        B08M / B16M
BGP route capability                   Supported out of the box    Supported out of the box
ACL support                            Support for ingress and     Support for ingress and
                                       egress ACLs                 egress ACLs
Support for OC12 or OC48 interfaces    Yes; refer to Chapter 2,    Yes; refer to Chapter 2,
                                       “Product introduction” on   “Product introduction” on
                                       page 31 for more details    page 31 for more details
Support for VRF                        Supported out of the box    Supported out of the box
Considerations
Observe the following considerations:
Determine how many BGP routes your DCN needs to see from the Internet.
ACLs are important to ensure that BOGON lists and other basic security
functions can be addressed as close as possible to the source. Remember
that full security, such as firewalls and IDS/IPS devices, must be provided by
your DCN security infrastructure.
Determine how many BGP peers you will have in your WAN or MAN.
Decision
The IBM m-series Ethernet product provides scalable interface connectivity and
allows the use of ACLs to provide security or route filtering, as well as the option
to utilize VRFs for route separation. The m-series supports IPv6 to future-proof
your investment, as well as large BGP route table capabilities.
As previously mentioned in this book, we treat the data center as the main
hardened facility housing the systems and services that require high
availability and performance characteristics. The enterprise accesses the
applications housed in the data center through links such as the WAN and LAN,
as shown in Figure 12-1.
Focusing on the enterprise components and considering the variability within
the enterprise, we find there are a number of different options. Most of them
can be addressed in similar ways or by adding minor components, and they are
discussed in the following sections.
Figure 12-2 depicts an example of an Enterprise Campus with multiple buildings.
The type of work being done at the business might impact network design
decisions. For example, a site with mainly programmers might require a more
high-performance network to support code sharing and development, whereas a
call center might need lots of ports to support a large number of people, but less
bandwidth because most of the traffic might be just Voice over IP telephony.
Some tiers can be shared with the DCN; the core can be shared in a
differentiated distribution model, or for a smaller site with specific server access
needs, the network architect might determine that sharing the distribution device
is acceptable.
Type
Design decision
Assumptions
We make the following assumptions:
Single-tier network architectures are not suited to a corporate
enterprise-style network.
Tiers can be either physical or logical.
Alternatives
Here we discuss the various alternatives:
Three tier design          As discussed in Chapter 4, "Market segments
                           addressed by the IBM Ethernet products" on
                           page 125, this traditional design consists of core,
                           distribution, and access layers. It provides
                           scalability for fast-growing networks, with each
                           tier assigning specific roles to the devices in the
                           network infrastructure.
Collapsed backbone         Also called a two-tier model, this design can be
                           used in a smaller enterprise site. It requires less
                           equipment and fewer connections. The main drawback
                           is that this model is not as scalable as a
                           three-tier design.
Differentiated distribution This is a three-tier design that further delineates
                           areas of responsibility within the distribution
                           layer. The distribution layer devices can be
                           separated into specific functions for user access,
                           perhaps by department; for example, the finance
                           department can be physically separated from the
                           call center staff.
Considerations
Observe the following considerations:
The enterprise site cable design might constrain the hierarchical design.
The enterprise site layout might constrain the hierarchical design.
Virtualization of network components (see Chapter 5, “IBM Ethernet in the
green data center” on page 133) might challenge traditional hierarchical
designs.
The deployment of 10 GbE uplinks from the user access layer might impact
the overall architecture.
The distance between floors, or buildings, might impact the overall
architecture.
Requirements for Power over Ethernet have to be taken into account.
Decision
Regardless of how they are created, most site designs follow a tiered model for
reliability and scalability reasons. While it is possible to collapse some
tiers into a single physical device, considerations for future expansion must
be taken into account. Most importantly, understand what your client's needs
are: what is their business planning for this site, and which design decisions
made today will affect the client's ability to meet their business goals?
Type
Management decision
Assumptions
We make the following assumptions:
The network is being managed and monitored on a per device basis.
The number of users at the site is expected to grow over time.
Alternatives
Here we discuss the various alternatives:
Discrete devices can be managed individually and might show benefits at
certain tiers.
Stacked devices include the IBM B50G product from the g-series, which can
be connected in a stack that allows up to eight (8) units to be managed as a
single device through a single management IP address. This gives businesses
the flexibility to pay as they grow and need more switches, while simplifying
deployment.
Chassis devices provide scalability through the addition of modules, require
fewer power outlets, allow hot swapping of modules, power supplies, and fan
units, and provide the ability for hitless upgrades, all of which allow greater
uptime.
Considerations
Observe the following considerations:
Site availability is a major factor in deciding between stackable and
chassis-based products. Chassis products have the ability for hitless upgrades.
Stackable solutions provide management and monitoring through a single IP
address for the entire stack. This can simplify management and provide
scalability for installations that can accept outages for upgrades and expect
the user base to grow.
It is important to decide whether stacking is an appropriate option before
purchasing a g-series switch. For example, the B48G cannot be upgraded to
the stackable model, whereas the B50G is ready to be stacked or operate
independently.
Decision
Where greater uptime is required, consider the chassis-based products and
understand the hitless upgrade capabilities of each of those devices. A single
B16S chassis can support up to 384 devices while delivering Class 3 PoE power.
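A quick power-budget check clarifies what delivering Class 3 PoE on every one of those ports implies. The per-port figures are the IEEE 802.3af Class 3 values (about 15.4 W sourced by the switch, up to 12.95 W available to the device); verify them, and the chassis PoE power-supply ratings, against the product documentation.

```python
# Back-of-the-envelope PoE budget for 384 ports all delivering Class 3 power.
# Per-port wattages are the IEEE 802.3af Class 3 figures; confirm against the
# chassis PoE power-supply ratings before relying on them.
ports = 384
pse_watts_per_port = 15.4    # sourced at the switch (PSE) per 802.3af Class 3
pd_watts_per_port = 12.95    # maximum available to the powered device

print(f"Total PSE budget:      {ports * pse_watts_per_port / 1000:.1f} kW")
print(f"Delivered to devices:  {ports * pd_watts_per_port / 1000:.1f} kW")
```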
Type
Technology decision
Alternatives
At the access tier, there are two alternatives and each of these has scalability
options. Table 12-1 shows a comparison of the products at a series level.
Table 12-1 Comparison of g-series and s-series for the Enterprise access tier

  Feature                       g-series (B50G)             s-series
  System Power Supplies         1+1 per stack unit          1+1 for the 8-slot chassis
                                16 in a stack of 8 units    2+2 for the 16-slot chassis
  Separate PoE Power Supplies   System power supplies       2 for the 8-slot chassis
                                supply PoE power            4 for the 16-slot chassis
Considerations
Observe the following considerations:
Is PoE required or planned? Many VoIP handsets can utilize PoE, saving the
need to deploy separate power supplies on individual desks. Some wireless
access points can also utilize PoE, allowing APs to be deployed in ideal
wireless locations without concern for power outlets.
Is the location small but capable of growing? A small location might initially
only house up to 40 users, but have the ability to expand to adjoining office
space as the need arises. In this case the B50G can provide stacking
capabilities allowing low cost expansion.
Decision
The s-series provides scalability in a flexible, low cost chassis format. It
allows for separate hot-swappable PoE power supplies for growth and redundancy.
Modules allow growth in 24-port increments for the 10/100/1000 Mbps RJ45 or the
100/1000 SFP options. If your site requires the connections to the distribution
tier to be 10 GbE capable, these modules are available, each with two ports. If
you need Layer 3 functions at the user access layer, the s-series has options
for either IPv4 only or IPv4 and IPv6 support. This is the choice for the
enterprise site that has expansion plans or requires more than 48 user ports
plus 10 GbE ports to connect to the distribution layer.
The g-series is a good choice for the smaller site, or for sites where greater
over-subscription between the user ports and the uplink to the distribution
tier is acceptable. The B50G allows more units to be added to the stack as a
site grows, allowing the stack to be operated as a single unit from a single
management IP. The B48G allows for the addition of a 10 GbE module if
high-speed uplinks to the distribution layer are required and the site does not
have the capability to expand, or the convenience of operating a stack is not
required.
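The over-subscription trade-off mentioned above is easy to quantify. The port counts in this sketch are illustrative assumptions (48 gigabit user ports per unit, 10 GbE uplinks); substitute the figures from your own design.

```python
# Access-to-distribution over-subscription for a few illustrative uplink
# choices; port counts are assumptions, not product limits.
def oversubscription(user_ports: int, user_gbps: float,
                     uplinks: int, uplink_gbps: float) -> float:
    return (user_ports * user_gbps) / (uplinks * uplink_gbps)

print(f"{oversubscription(48, 1, 1, 10):.1f}:1")    # 48 x 1 GbE over one 10 GbE uplink
print(f"{oversubscription(48, 1, 2, 10):.1f}:1")    # 48 x 1 GbE over two 10 GbE uplinks
print(f"{oversubscription(384, 1, 4, 10):.1f}:1")   # 8-unit stack over four 10 GbE uplinks
```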
Type
Design decision
Assumptions
We make the following assumptions:
Connectivity is available between floors and is terminated in a central location
for each building.
Connectivity is available between buildings and is terminated at a central
location on the campus.
Appropriate media is used between floors or buildings for the distances
involved.
Each floor, or building, has a secure closet with sufficient power and cooling.
Alternatives
Here we discuss the various alternatives:
Position the distribution devices in a central location for the campus so that all
floors, or buildings, can connect to the central distribution. This most likely
requires fiber covering the entire distance from the floor to the central
building, a rather inefficient use of fiber connectivity.
Position distribution devices in a central part of the building so that all
floors, or local buildings, can connect to the building distribution device.
This requires cabling between the floors and the central location for the
building, plus further cabling between the building and the location of the
campus core devices.
Collapsed access and distribution tier for a single building, where the
distribution switch is large enough to connect to all end devices and does
routing to the MAN, WAN, or Internet.
Considerations
Observe the following considerations:
Efficient use of expensive cabling, such as connections between buildings.
These connections are typically laid underground, and digging up the trench to
lay more connections adds cost because of the care required to ensure no
existing connections are severed by the backhoe.
The cost difference between Layer 2 and Layer 3 devices has allowed for
collapsing the access and distribution tiers into a single device. This does
restrict the scalability but might be a viable option for buildings that cannot
scale further.
Routing between VLANs provides benefits within the building. For example,
if printers are on a separate VLAN from the workstations, the distribution tier
can route between workstations and printers without the traffic needing to
leave the building.
Peer-to-peer connectivity is also enhanced by deploying the distribution tier
within a building. Consider the following example: each department's devices
can be put on a separate VLAN for security reasons; for example, marketing is
on VLAN 100 while finance is on VLAN 200. If finance needs to access some
documents on marketing's shared drive, the traffic must be routed between the
VLANs. Having a Layer 3 capable distribution switch within the same building
prevents that traffic from traversing back to the core (see the sketch after
this list).
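A minimal sketch of why that flow needs a routed hop: the two VLANs sit in different IP subnets, so without a Layer 3 device in the building the traffic must go to the core to be routed. The subnets below are hypothetical examples.

```python
import ipaddress

# Marketing (VLAN 100) and finance (VLAN 200) in different, hypothetical subnets.
vlan_subnets = {
    100: ipaddress.ip_network("10.10.100.0/24"),   # marketing
    200: ipaddress.ip_network("10.10.200.0/24"),   # finance
}

marketing_share = ipaddress.ip_address("10.10.100.25")
finance_client = ipaddress.ip_address("10.10.200.40")

same_subnet = any(marketing_share in net and finance_client in net
                  for net in vlan_subnets.values())
# True here: a Layer 3 hop is required, and placing it at the building
# distribution switch keeps the flow inside the building.
print("Routed hop required:", not same_subnet)
```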
Decision
Distribution tier devices must be deployed to allow scalability of the
enterprise. These devices must be located at a central location for the
building or site, depending on the layout. Deploying the distribution tier at
the building allows routing between VLANs within the building.
Type
Technology decision
Assumptions
We make the following assumptions:
The enterprise campus is large enough to require a separate device for the
distribution tier.
Alternatives
Here we discuss the various alternatives:
Layer 2: this might be beneficial if the access tier is running Layer 3 and
has therefore already made the initial routing decision.
Layer 3 IPv4: this is the more traditional design, where the access tier
provides Layer 2 functions and the distribution tier provides routing
functions.
Layer 3 IPv6: this might be a requirement in some government networks or
other corporations that have decided to support IPv6 throughout their
network.
Considerations
Observe the following considerations:
In a small site it might be acceptable to collapse the access and distribution
tiers into one device.
If IPv6 support is required, all modules in the s-series must be purchased with
IPv6 capability.
How many end switches will need to be aggregated at this point?
Decision
Unless advanced routing services such as MPLS are required, the decision will
most likely be between the IBM s-series and IBM r-series devices. Both are
fully capable, resilient, modular chassis that support Layer 3 routing. One key
criterion might be how scalable the device needs to be. Table 12-2 is a
snapshot of the speeds and feeds of these devices:
Table 12-2 Speeds and feeds

  Feature                 r-series                   s-series
  System Power Supplies   2+1 for 4-slot chassis     1+1 for the 8-slot chassis
                          3+1 for 8-slot chassis     2+2 for the 16-slot chassis
                          5+3 for 16-slot chassis
Type
Technology decision
Assumptions
We make the following assumptions:
The core device will run BGP.
The core will run Layer 3 routing protocols.
Alternatives
Both the m-series and r-series devices can be placed at the enterprise core tier,
and in smaller environments the s-series as well. Table 12-3 shows a short
comparison of the products highlighting some of the considerations the core tier
might have.
Table 12-3 Comparison of the r-series, s-series, and m-series for the core tier

  Feature         r-series                 s-series                 m-series
  Route Support   400k IPv4 routes (FIB)   256K IPv4 routes (FIB)   512K IPv4 routes (FIB)
                  1M BGP routes (RIB)      1M BGP routes (RIB)      2M BGP routes (RIB)
                  (Both require the full Layer 3 feature set)
  ACL Support     Ingress ACLs only        Ingress ACLs only        Ingress and egress
                                                                    ACLs available
Considerations
Observe the following considerations:
The m-series and r-series devices have supported hitless upgrades for a
number of releases, which provides a mature hitless upgrade solution.
If your enterprise site houses multiple functions, or business units, that
need separate routing and forwarding tables, VRF, available on the m-series,
provides this separation.
If there is a requirement to support IPv6, or perhaps deploy it in the near
future, the m-series and r-series support dual stack IP out of the box. The
s-series has different modules for IPv4 only and IPv4 plus IPv6 support;
therefore, all modules that need to run Layer 3 IPv6 must be upgraded to the
dual IPv4 and IPv6 model.
Decision
The m-series and r-series provide proven support for L2 and L3 hitless
upgrades, removing the need for downtime during upgrades. The m-series provides
greater support for IP routing protocols, with sufficient memory to maintain up
to 2 million BGP routes in the BGP RIB and as many as 512,000 IPv4 routes in
the FIB, as well as support for advanced services such as MPLS and VRF. As IPv6
becomes a requirement in the enterprise, the m-series is ready to provide dual
stack support for both IPv6 and IPv4, which avoids future downtime to upgrade
modules to IPv6 capabilities if IPv6 is required.
For larger Cores, the m-series must be strongly considered. Smaller Enterprise
sites can utilize an r-series or s-series device as appropriate.
While IBM acknowledges the growth in the use of Infiniband for high-speed,
low-latency supercomputing environments, the use of gigabit Ethernet has been
increasing since June 2002, as shown in Figure 13-1.
Note: For the Top 500 Super Computer Project, see the following website:
http://www.top500.org/
Figure 13-1 Connectivity media used by the top 500 supercomputers
Although HPC 2.0 allows for communications separation by function (that is,
storage, HPC communications and network / user access), it also increases the
complexity and cost to deploy the various fabrics, often leading to the creation of
specialized teams within the support organizations.
Furthermore, each system in the cluster must have the appropriate specialized
communications modules to be connected to each fabric, and this might increase
the complexity for system management teams. The combination of these added
support requirements is thought to restrict deployment of HPCs to organizations
that have the financial ability to support various technologies within their data
center.
With HPC 2.0, the architect can consider a combination of IBM Ethernet products
as well as the IBM storage products. The storage products are not covered in this
book; however, many Redbooks publications are available for those products.
Figure 13-2 HPC 2.0 showing separate storage and cluster fabric
13.1.2 HPC 3.0
To reduce the number of specialized teams required to support HPC 2.0, and with
the availability of 10 GbE, many organizations are now investigating HPC 3.0 as
a viable alternative for their business needs. HPC 3.0, shown in Figure 13-3,
utilizes a single fabric for all communications needs (that is, storage,
compute communications, and network communications).
The major benefit expected from HPC 3.0 is that more enterprises will be able
to utilize HPC clusters because the cost of support is reduced. The network
support team can support the cluster fabric and storage fabric with little
knowledge beyond what they already need for their Ethernet networks today.
With HPC 3.0, the architect can choose from the IBM Ethernet product range to
provide the network fabric. Storage devices are not covered in this book;
however, note that Fibre Channel over Ethernet (FCoE) or iSCSI devices can
connect to the IBM Ethernet products, as described in the following sections.
Figure 13-3 HPC 3.0 utilizes a single fabric for compute, storage, and network communications
Type
Design decision
Assumptions
We make the following assumptions:
The HPC is physically within the limits of Ethernet.
Expansion to another HPC site utilizes high-speed, low-latency fiber for
site-to-site connectivity (for example, a carrier's MAN or DWDM).
Alternatives
Here we discuss the various alternatives:
Flat HPC design, with a single device connecting all the systems together.
Hierarchical design, with tiered connectivity allowing scalability and easier
future growth.
Considerations
Observe the following considerations:
Existing DCN design might dictate the HPC design.
Other data center controls might constrain design of the HPC environment.
Other existing infrastructure might dictate the HPC design.
Decision
Just as the DCN maintains a hierarchical design for scalability, a single HPC
with a large number of systems connected to one network switch is not scalable.
Instead, consider a simple two-tier, hierarchical, redundant design. However, a
flat topology might suit the HPC installation if the site is small and does not
have room to expand into a major HPC center.
Type
Technology decision
Assumptions
We make the following assumptions:
For HPC 2.0, we assume that the storage fabric exists, or is being created.
The intended application has been designed with HPC in mind.
Alternatives
Here we discuss the various alternatives:
High speed connectivity with Ethernet: this is considered in Table 13-1
because it represents over 50% of the HPC environments as of June 2009,
according to the Top 500 project.
High speed and low latency with other technology, such as Infiniband: this is
not considered here because many other references cover this topic.
Table 13-1 Maximum 1 GbE ports (r-series and m-series)

  Feature               r-series                     m-series
  Maximum 1 GbE ports   768 on the 16-slot chassis   1536 on the 32-slot chassis
                        384 on the 8-slot chassis    768 on the 16-slot chassis
                        192 on the 4-slot chassis    384 on the 8-slot chassis
                                                     192 on the 4-slot chassis
Considerations
Observe the following considerations:
Although other connectivity technology might be required for the client, the
architect also has to consider the impact of multiple types of connectivity
media within a data center. In certain cases, this is acceptable and the
business case will support this decision.
Jumbo frames or custom packet sizes might be required to support HPC
control traffic. The IBM DCN devices support jumbo frames up to 9,216 bytes
(see the worked example after this list).
High speed data transfer is the primary requirement of many HPC
applications such as geophysical data analysis. In these cases latency
introduced by Ethernet protocols is not a major consideration.
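The worked example below shows why jumbo frames matter for bulk HPC data movement: it compares the frame count and header overhead of a 1 GB transfer at a standard 1,500-byte MTU and at a 9,000-byte jumbo MTU. It assumes plain Ethernet II framing with IPv4 and TCP and no options or VLAN tags, so treat the exact percentages as approximations.

```python
import math

# Frames and header overhead for a 1 GB transfer at two MTU sizes, assuming
# 18 bytes of Ethernet header + FCS and 40 bytes of IPv4 + TCP headers per
# frame (no options, no VLAN tag) -- an approximation, not an exact model.
ETH_OVERHEAD = 18
IP_TCP_HEADERS = 40

def frames_and_overhead(transfer_bytes: int, mtu: int):
    payload_per_frame = mtu - IP_TCP_HEADERS
    frames = math.ceil(transfer_bytes / payload_per_frame)
    overhead_bytes = frames * (ETH_OVERHEAD + IP_TCP_HEADERS)
    return frames, overhead_bytes

for mtu in (1500, 9000):
    frames, overhead = frames_and_overhead(10**9, mtu)
    print(f"MTU {mtu}: {frames:,} frames, {overhead / 10**9:.1%} header overhead")
```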
Decision
For HPC with GbE connectivity, the m-series allows the greatest capacity, with
1,536 1 GbE ports subscribed at 1:1 on the 32-slot chassis.
For HPC with 10 GbE connectivity requirements, the r-series allows the greatest
capacity at 768 10 GbE ports, but at 4:1 oversubscription. If 1:1 subscription
is required, the m-series, with up to 128 ports, can be used.
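The following sketch turns those port counts into aggregate line-rate figures, which makes the 1:1 versus 4:1 comparison concrete; the arithmetic uses only the counts quoted above (one-direction line rates).

```python
# Aggregate offered load versus usable line-rate capacity for the chassis
# options quoted above (single-direction figures, for comparison only).
options = [
    ("m-series, 1 GbE, 1:1",  1536,  1, 1),
    ("r-series, 10 GbE, 4:1",  768, 10, 4),
    ("m-series, 10 GbE, 1:1",  128, 10, 1),
]
for name, ports, gbps, oversub in options:
    offered = ports * gbps
    usable = offered / oversub
    print(f"{name}: {offered:,} Gbps offered, ~{usable:,.0f} Gbps at line rate")
```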
Type
Design decision
Assumptions
We make the following assumptions:
The application has been designed with HPC clusters in mind.
The HPC environment fits the “good enough” segment where both high speed
and low latency are required.
HPC is a requirement for the client application.
Alternatives
Here we discuss the various alternatives:
Tiered HPC cluster fabric; utilizing various tiers to provide the scalability
required.
Retain the flat HPC cluster fabric; purchase an IBM Ethernet product with
more slots to accommodate the current scalability requirements.
Considerations
Observe the following considerations:
Although other connectivity technology might be required for the client, the
architect also has to consider the impact of multiple media within a data
center. In certain cases, this is acceptable and the business case will support
this decision.
High speed data transfer is the primary requirement of many HPC
applications such as geophysical data analysis. In these cases, latency
introduced by Ethernet protocols is not a major consideration.
Decision
It is a best practice for the network architect to always design with
scalability in mind. In the case of HPC 2.0, this might allow an architect to
start with an IBM r-series to create a single-tier model for the initial
deployment. Then, as the client embraces HPC, the design can scale up with the
addition of an m-series, especially given the large 1:1 10 GbE capacity of that
device and its ability to create link aggregation groups of 32 ports for up to
320 Gbps of bandwidth between two m-series devices.
Type
Design decision
Assumptions
We make the following assumption:
The HPC is physically within the limits of Ethernet.
Alternatives
Here we discuss the various alternatives:
Flat HPC design, with a single device connecting all the systems together.
While the HPC might look somewhat flat, the DCN is unlikely to be a single
tier.
Hierarchical design, with tiered connectivity allowing scalability and easier
future growth. This alternative retains the DCN architecture as well.
Considerations
Observe the following considerations:
The design of the current DCN might influence the design of the HPC.
The design of an existing HPC might influence the design of the new HPC.
Decision
An HPC 3.0 design must consider a hierarchical design for scalability and for
interoperability with the existing DCN architecture. It must also consider
connectivity to storage as well as to the network, all through the same fabric.
Current industry best practice has shown that a tiered approach allows for
greater flexibility and scalability, and this is no different in the case of an
HPC 3.0 design.
Type
Design decision
Assumptions
We make the following assumptions:
The data center already utilizes a separate storage fabric. We assume that
the corporate direction is to continue utilizing this storage infrastructure.
Network delivery staff are already familiar with Ethernet technology and have
worked with various Ethernet connectivity options before (RJ45, Cat6, SFP,
XFP, Fiber).
Alternatives
Here we discuss the various alternatives:
HPC 2.0 with 10/100/1000 Mbps RJ45 interfaces for compute node connectivity:
initially flat, but scalable as the HPC grows, maintaining a separate HPC
fabric.
HPC 2.0 with 10/100/1000 Mbps RJ45 interfaces for compute node connectivity,
scalable to 500 compute nodes over time.
Considerations
Observe the following considerations:
These two alternatives do not need to be independent of each other. In fact,
alternative one can be used to create a basic building block for the
scalability required in alternative two.
Distances between the equipment need to be confirmed before the architect
defines a base building block, or at least made into a selectable component
of a building block.
Decision
In this case we will decide to start with the first alternative and define a building
block that can be used to scale to the second alternative.
Alternative 1
For this design, the architect can utilize a modular design where compute nodes
are connected to an IBM r-series B08R with 10/100/1000 Mbps RJ45 modules
installed. This building block assumes each server has three connections:
1 x Storage fabric connection
2 x Network connections (1 x server access connection, 1 x compute cluster
fabric connection)
The compute module might look something like Figure 13-4.
Figure 13-4 Compute module with integration into existing DCN server access and storage infrastructure
Table 13-2 shows the initial compute cluster fabric hardware purchase list.
This allows for up to 288 compute nodes to be connected to the single r-series
switch. The complete design also needs to account for sufficient cabling to
connect the compute nodes to the switch, which is not included in this example.
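The 288-node figure follows from simple slot arithmetic, sketched below under the assumption of 48-port 1 GbE modules (consistent with the per-chassis maximums quoted earlier) and two slots held back for the 10 GbE uplink modules that Alternative 2 adds.

```python
# Where the 288-node capacity of a single B08R comes from, assuming 48-port
# 1 GbE modules and two slots reserved for future 10 GbE uplink modules.
slots = 8
ports_per_module = 48
uplink_slots_reserved = 2

compute_node_ports = (slots - uplink_slots_reserved) * ports_per_module
print(compute_node_ports)   # 288
```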
Alternative 2
To allow this HPC to scale up to 500 systems and beyond, the architect has to
decide on a suitable time to split the infrastructure. For this example, we
assume that the purchase of the next set of compute nodes is the point chosen
to expand the HPC 2.0 compute cluster fabric. For the next twenty systems to
connect, the network architecture looks similar to Figure 13-5.
Figure 13-5 HPC 2.0 with 500 compute nodes connecting to a scalable HPC cluster fabric
In this case, the architect deploys another B08R with the same configuration,
but orders two 4-port 10 GbE modules to populate the two remaining slots on
each of the devices. An 8-port LAG can be configured to allow up to 80 Gbps of
traffic to pass between the two devices.
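A short calculation puts the 80 Gbps LAG in context: it is the aggregate of eight 10 GbE members, and dividing the offered load of 288 gigabit-attached nodes by it gives the worst-case inter-chassis over-subscription. The assumption that every node sends at line rate to the other chassis is deliberately pessimistic.

```python
# Capacity of the 8-port 10 GbE LAG and the worst-case over-subscription if
# all 288 gigabit-attached nodes on one chassis sent line-rate traffic to
# nodes on the other chassis (a deliberately pessimistic assumption).
lag_members, member_gbps = 8, 10
nodes_per_chassis, node_gbps = 288, 1

lag_gbps = lag_members * member_gbps
print(f"LAG capacity: {lag_gbps} Gbps")
print(f"Worst-case inter-chassis over-subscription: "
      f"{nodes_per_chassis * node_gbps / lag_gbps:.1f}:1")
```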
The IBM Ethernet products provide solutions for the network designs of today
and the designs of the future. All of the IBM Ethernet products have 10 GbE
capabilities today, as well as support for IPv6, both depending on the options
installed.
The publications listed in this section are considered particularly suitable for a
more detailed discussion of the topics covered in this book.
Online resources
These websites are also relevant as further information sources:
IBM System Storage hardware, software, and solutions:
http://www.storage.ibm.com
IBM System Networking:
http://www-03.ibm.com/systems/networking/
IBM b-type Ethernet switches and routers:
http://www-03.ibm.com/systems/networking/hardware/ethernet/b-type/index.html
Brocade Resource Center:
http://www.brocade.com/data-center-best-practices/resource-center/index.page
Brocade and IBM Ethernet Resources:
http://www.brocade.com/microsites/ibm_ethernet/resources.html
IBM System Storage, Storage Area Networks:
http://www.storage.ibm.com/snetwork/index.html