IBM b-type
Data Center Networking
Design and Best Practices Introduction
Jon Tate
Andrew Bernoth
Ivo Gomilsek
Peter Mescher
Steven Tong
ibm.com/redbooks
International Technical Support Organization
June 2010
SG24-7786-00
Note: Before using this information and the product it supports, read the information in
“Notices” on page ix.
This edition applies to the supported products in the IBM b-type portfolio in September 2009.
Note: This book is based on a pre-GA version of a product and might not apply when the
product becomes generally available. Consult the product documentation or follow-on versions
of this book for more current information.
Notices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix
Trademarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . x
Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xi
The team who wrote this book . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xii
Now you can become a published author, too! . . . . . . . . . . . . . . . . . . . . . . . . xiii
Comments welcome. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiv
Stay connected to IBM Redbooks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiv
5.3 Power utilization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
5.3.1 VLAN. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
5.3.2 VSRP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
5.3.3 MRP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
5.4 Device reduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
5.4.1 Common routing tricks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
5.4.2 VRF . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
7.4.3 FastIron traffic policies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177
7.4.4 NetIron m-series QoS implementation . . . . . . . . . . . . . . . . . . . . . . 181
7.4.5 NetIron m-series traffic policies . . . . . . . . . . . . . . . . . . . . . . . . . . . . 190
7.4.6 NetIron c-series QoS implementation . . . . . . . . . . . . . . . . . . . . . . . 194
7.4.7 NetIron c-series traffic policies . . . . . . . . . . . . . . . . . . . . . . . . . . . . 200
9.6.1 System security with ACLs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231
9.6.2 Remote access . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231
9.6.3 Telnet / SSH . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 232
9.6.4 HTTP / SSL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233
9.6.5 SNMP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233
11.5 Other tiers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 260
11.5.1 DCN core tier . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 260
11.5.2 DCN connectivity tier. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 262
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 299
Notices
This information was developed for products and services offered in the U.S.A.
IBM may not offer the products, services, or features discussed in this document in other countries. Consult
your local IBM representative for information on the products and services currently available in your area.
Any reference to an IBM product, program, or service is not intended to state or imply that only that IBM
product, program, or service may be used. Any functionally equivalent product, program, or service that
does not infringe any IBM intellectual property right may be used instead. However, it is the user's
responsibility to evaluate and verify the operation of any non-IBM product, program, or service.
IBM may have patents or pending patent applications covering subject matter described in this document.
The furnishing of this document does not give you any license to these patents. You can send license
inquiries, in writing, to:
IBM Director of Licensing, IBM Corporation, North Castle Drive, Armonk, NY 10504-1785 U.S.A.
The following paragraph does not apply to the United Kingdom or any other country where such
provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION
PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR
IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT,
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer
of express or implied warranties in certain transactions, therefore, this statement may not apply to you.
This information could include technical inaccuracies or typographical errors. Changes are periodically made
to the information herein; these changes will be incorporated in new editions of the publication. IBM may
make improvements and/or changes in the product(s) and/or the program(s) described in this publication at
any time without notice.
Any references in this information to non-IBM websites are provided for convenience only and do not in any
manner serve as an endorsement of those websites. The materials at those websites are not part of the
materials for this IBM product and use of those websites is at your own risk.
IBM may use or distribute any of the information you supply in any way it believes appropriate without
incurring any obligation to you.
Information concerning non-IBM products was obtained from the suppliers of those products, their published
announcements or other publicly available sources. IBM has not tested those products and cannot confirm
the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on
the capabilities of non-IBM products should be addressed to the suppliers of those products.
This information contains examples of data and reports used in daily business operations. To illustrate them
as completely as possible, the examples include the names of individuals, companies, brands, and products.
All of these names are fictitious and any similarity to the names and addresses used by an actual business
enterprise is entirely coincidental.
COPYRIGHT LICENSE:
This information contains sample application programs in source language, which illustrate programming
techniques on various operating platforms. You may copy, modify, and distribute these sample programs in
any form without payment to IBM, for the purposes of developing, using, marketing or distributing application
programs conforming to the application programming interface for the operating platform for which the
sample programs are written. These examples have not been thoroughly tested under all conditions. IBM,
therefore, cannot guarantee or imply reliability, serviceability, or function of these programs.
The following terms are trademarks of the International Business Machines Corporation in the United States,
other countries, or both:
InfiniBand, and the InfiniBand design marks are trademarks and/or service marks of the InfiniBand Trade
Association.
Microsoft, and the Windows logo are trademarks of Microsoft Corporation in the United States, other
countries, or both.
Linux is a trademark of Linus Torvalds in the United States, other countries, or both.
Other company, product, or service names may be trademarks or service marks of others.
Preface
IBM and Brocade have entered into an agreement to provide expanded network
technology choices with the new IBM b-type Ethernet Switches and Routers,
delivering an integrated end-to-end resiliency and security framework.
Combined with the vast data center design experience of IBM and the
networking expertise of Brocade, this portfolio represents the ideal convergence
of strength and intelligence. For organizations striving to transform and virtualize
their IT infrastructure, such a combination can help reduce costs, manage
risks, and prepare for the future.
In this book, we introduce the products and the highlights of the IBM b-type
portfolio from a viewpoint of design and suggested practices.
This book is meant to be used in conjunction with: IBM b-type Data Center
Networking: Product Introduction and Initial Setup, SG24-7785
Be sure to let us know of any additions you want to see in this book, because we
always welcome fresh ideas.
Jon Tate is a Project Manager for IBM System Storage™ Networking and
Virtualization Solutions at the International Technical Support Organization, San
Jose Center. Before joining the ITSO in 1999, he worked in the IBM Technical
Support Center, providing Level 2 support for IBM storage products. Jon has 25
years of experience in storage software and management, services, and support,
and is both an IBM Certified IT Specialist and an IBM SAN Certified Specialist.
He is also the UK Chairman of the Storage Networking Industry Association.
Andrew Bernoth is the IBM Network Services Lead Architect for the Asia Pacific
region based out of Melbourne, Australia. Prior to this, Andrew worked on global
architecture and security standards for the IBM services extranet environment.
He has 20 years of experience in computing, over 15 of which have been
focused on networking and security. Andrew holds GSEC and CISSP security
certifications and is an IBM Certified IT Architect. His work on a security
checking program for communication between networks was awarded a patent in
2008.
Peter Mescher is a Product Engineer on the SAN Central team within the IBM
Systems and Technology Group in Research Triangle Park, North Carolina.
He has seven years of experience in SAN Problem Determination and SAN
Architecture. Before joining SAN Central, he performed Level 2 support for
network routing products. He is a co-author of the SNIA Level 3 FC Specialist
Exam. This is his sixth Redbooks publication.
Thanks to the following people for their contributions to this project:
Brian Steffler
Marcus Thordal
Kamron Hejazi
Mike Saulter
Jim Baldyga
Brocade
Holger Mueller
Pete Danforth
Casimer DeCusatis
Doris Konieczny
Aneel Lakhani
Mark Lewis
Tom Parker
Steve Simon
IBM
Emma Jacobs
International Technical Support Organization, San Jose Center
Find out more about the residency program, browse the residency index, and
apply online at:
ibm.com/redbooks/residencies.html
Comments welcome
Your comments are important to us!
The network is becoming the new backplane. In this chapter we describe the
history of the data center network and examine the multiple forces that are
driving a transformation of the data center network.
The development of standardized industry hardware and different technology
innovations, such as client-server technology or e-business on demand, has led
us to the environment that we find ourselves in today, a distributed infrastructure
that is difficult to re-provision. Most services and applications are running on
dedicated environments, which means that multiple instances of raised floor,
wiring, LAN equipment, and servers are deployed in the data centers. The
positive effect of this has been an explosion of applications and access,
particularly through the Internet. However, all of this has caused a fragmentation
of the enterprise and the computing capacity associated with it. We now have
islands of computing resources and puddles of information, which become very
costly and inefficient to manage.
We have now started to enter into the next phase driven by technology
innovations such as virtualization, Web 2.0, Software as a Service (SaaS),
Service Oriented Architecture (SOA), Cloud Computing, converged networks
and so on. This is leading us into a new enterprise model that will again be
characterized by re-centralization and a high degree of sharing, but will be built
on an extremely flexible infrastructure: one that can be quickly re-provisioned to
respond to changing business needs and market demands.
On one side, we have a set of daily operational challenges around cost, service
delivery, business resilience, security, and green IT initiatives that bring many IT
data centers to a breaking point. On the other side, there are business and
technology innovations that can drive competitive advantage. Never before has
the Enterprise Data Center faced such a “perfect storm” of forces that drives the
need for true data center transformation.
Core Network:
This component is where the whole switching infrastructure resides and
connects all data center networks within and across data centers.
Network Services:
This component provides WAN acceleration, intrusion prevention, firewall
services, and other network services.
Applications and Data Services:
This component connects to the Core Network and hosts all of the servers,
databases, and storage.
Other services supporting these four basic blocks are deployment tools and
management services. These encompass the entire data center network. They
must be considered in the deployment of any basic component.
In the following sections, we explore the different data center network tiers in
greater detail.
Depending on the requirements, there are pros and cons for either a tier-2 or
tier-3 data center design.
1.2.3 Network Services tier
Network Services are closely aligned to the network protocols that support data
center applications. They are generally divided into two categories:
Security services, such as firewalls and intrusion prevention
Application front-end services, such as server load balancing and content
distribution.
Load balancing services are important parts of data center architectures.
These services can be divided into two categories:
– Local server load balancers distribute content requests (from Layer 4 to
Layer 7) sourced by remote clients across several systems within a single
data center.
– Global site selectors optimize multi-site deployments that involve globally
distributed data centers. These selectors are the cornerstone of multi-site
disaster recovery plans.
The Network Services tier must extend to any of the server networks hosted in
the data center, and apply a network-specific policy and set of configurations to
appropriately interact with the traffic in that particular network section. For
example, a security service, such as SYN checking or sequence number
checking, might be required only for servers available to the outside
world. Therefore, the architecture must support the application of these features
only to those systems or networks. Most importantly, key characteristics are
enabled by direct logical attachment to the data center’s network core.
Virtual IP:
A load balancer is put in place on the application front end with a unique IP
address that handles requests to specific servers.
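
As an illustration of this virtual IP concept, the following minimal Python sketch (the addresses and names are invented for the example, and this is not how a particular product implements it) distributes successive client requests across a pool of real servers in round-robin fashion:

from itertools import cycle

class VirtualIP:
    """Minimal round-robin dispatcher behind a single virtual IP (illustrative only)."""

    def __init__(self, vip, real_servers):
        self.vip = vip                      # address clients connect to
        self._pool = cycle(real_servers)    # rotate through the real servers

    def pick_server(self):
        """Return the real server that handles the next request."""
        return next(self._pool)

balancer = VirtualIP("192.0.2.10", ["10.0.0.11", "10.0.0.12", "10.0.0.13"])
for request in range(5):
    print(f"request {request} -> {balancer.pick_server()}")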
A Storage Area Network (SAN) connects servers and storage devices across a
packet-switched network. SANs allow arbitrary block- level access from servers
to storage devices, and storage devices to each other. Multiple servers can
therefore share storage for clustering and HA applications. In addition, the
storage devices themselves can implement data protection services (such as
synchronous data replication, asynchronous data replication or data snapshots)
by directly moving data to another storage device. SANs also provide a set of
configuration, directory, discovery and notification services to attached devices.
iSCSI SANs
An iSCSI SAN can be based upon any network supporting the IP protocols.
In practice, this means iSCSI SANs are built from Ethernet switches. In principle,
because iSCSI is based upon TCP/IP, it can run on any switching infrastructure.
However, in practice, depending upon the features of the Ethernet switches, the
performance characteristics of TCP/IP in the face of dropped frames can limit
iSCSI deployments to low-performance SANs. In addition, most iSCSI
deployments presently only use 1 Gigabit Ethernet with software drivers, and the
resulting performance does not compare favorably to FC at 2 Gbps, 4 Gbps or
8 Gbps with an offload HBA. However, iSCSI SANs can be considerably less
expensive than FC SANs. The Internet Storage Name Service (iSNS) server
provides all fabric services in an iSCSI SAN.
The iSCSI traffic can directly traverse the WAN connection without requiring a
gateway, but iSCSI implementations do not generally provide sufficient buffering
to fully utilize high-speed connections, nor do they include compression or
other WAN optimization features. Therefore, iSCSI WAN traffic
can often benefit from a WAN acceleration device. The iSCSI traffic also can
benefit from a data security gateway providing IPSec and VPN tunnels.
1.3.1 Availability
Availability means that data or information is accessible and usable upon
demand by an authorized person. Two of the major factors that affect availability
are redundancy and convergence.
Redundancy
Redundant data centers involve complex solution sets depending on a client’s
requirements for backup and recovery, resilience, and disaster recovery. Most
inter-data center connectivity involves private optical networking solutions for
network and storage.
Convergence
Convergence is the time required for a redundant network to recover from a
failure and resume traffic forwarding. Data center environments typically include
strict uptime requirements and therefore need fast convergence.
1.3.2 Backup and recovery
Although the ability to recover from a server or storage device failure is beyond
the scope of network architecture requirements, potential failures such as the
failure of a server network interface card (NIC) will be taken into consideration. If
the server has a redundant NIC, then the network must be capable of redirecting
traffic to the secondary network as needed.
As for network devices, the backup and recovery ability typically requires the use
of diverse routes and redundant power supplies and modules. It also requires
defined processes and procedures for ensuring that current backups exist in
case of firmware and configuration failures.
Another possible solution uses three data centers. The first is the active data
center which is synchronized with the second or standby data center. The third
site becomes a back-up site to which data is copied asynchronously according to
specific policies. Geographically Dispersed Parallel Sysplex™ (GDPS®) and
related technologies are used.
1.3.6 Environment
There are environmental factors such as the availability of power or air
conditioning and maximum floor loading that influence the average data center
today. Network architecture and architects must take these factors into
consideration.
1.3.9 Performance
Network performance is usually defined by the following terms:
Capacity:
Capacity refers to the amount of data that can be carried on the network at
any point of time. A network architecture must take into account anticipated
minimum, average, and peak utilization of traffic patterns.
Throughput:
Throughput is related to capacity, but focuses on the speed of data transfer
between session pairs versus the utilization of links.
Delay:
Delay, also known as “lag” or “latency” is defined as a measurement of
end-to-end propagation times. This requirement is primarily related to
isochronous traffic, such as voice and video services.
Jitter:
Jitter is the variation in the time between packets arriving, caused by network
congestion, timing drift, or route changes. It is most typically associated with
telephony and video-based traffic.
Quality of Service:
Quality of Service (QoS) requirements include the separation of traffic into
predefined priorities. QoS helps to arbitrate temporary resource contention. It
also provides an adequate service level for business-critical administrative
functions, as well as for delay-sensitive applications such as voice, video, and
high-volume research applications.
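
To make the delay and jitter terms concrete, the following small Python sketch computes inter-arrival gaps and a simple jitter figure from a set of invented packet timestamps (the 20 ms nominal spacing is an assumption for the example):

from statistics import mean

# Hypothetical one-way packet arrival times in milliseconds
arrivals = [0.0, 20.1, 40.3, 59.8, 80.9, 100.2]

# Inter-arrival gaps between consecutive packets
gaps = [b - a for a, b in zip(arrivals, arrivals[1:])]

# Jitter here is the mean absolute deviation of the gaps from the nominal 20 ms spacing
nominal = 20.0
jitter = mean(abs(g - nominal) for g in gaps)

print("gaps (ms):", [round(g, 2) for g in gaps])
print(f"jitter: {jitter:.2f} ms")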
1.3.10 Reliability
Reliability is the time a network infrastructure is available to carry traffic. Because
today’s data center houses critical applications and services for the enterprise,
outages are becoming less and less tolerable.
1.3.11 Scalability
In networking terms scalability is the ability of the network to grow incrementally
in a controlled manner.
For enterprises that are constantly adding new servers and sites, architects
might want to specify something more flexible, such as a modular-based system.
Constraints that might affect scalability, such as defining spanning trees across
multiple switching domains or additional IP addressing segments to
accommodate the delineation between various server functions, must be
considered.
1.3.12 Security
Security in a network comprises the definitions and levels of permission needed to access
devices, services, or data within the network. We consider the following
components of a security system:
Security policy:
Security policies define how, where, and when a network can be accessed.
An enterprise will normally develop security policies related to networking as
a requirement. The policies will also include the management of logging,
monitoring, and auditing events and records.
Network segmentation:
Network segmentation divides a network into multiple zones. Common zones
include various degrees of trusted and semi-trusted regions of the network.
Firewalls and inter-zone connectivity:
Security zones are typically connected with some form of security boundary,
such as firewalls or access-control lists. This might take the form of either
physical or logical segmentation, or a combination of both.
Access controls:
Access controls are used to secure network access. All access to network
devices will be by user-specific login credentials; there must be no
anonymous or generic logins.
Security monitoring:
To secure a data center network, a variety of mechanisms are available
including Intrusion Detection System (IDS), Intrusion Protection System
(IPS), content scanners, and so on. The depth and breadth of monitoring will
depend upon both the customer’s requirements as well as legal and
regulatory compliance mandates.
External regulations:
External regulations will often play a role in network architecture and design
due to compliance policies such as Sarbanes-Oxley (SOX), Payment Card
Industry Data Security Standards (PCI DSS), The Health Insurance
Portability and Accountability Act (HIPAA); and a variety of other industry and
non-industry-specific regulatory compliance requirements.
1.3.13 Serviceability
Serviceability refers to the ability to service the equipment. Several factors can
influence serviceability, such as modular or fixed configurations or requirements
of regular maintenance.
1.3.15 Standards
Network standards are key to the smooth and ongoing viability of any network
infrastructure:
Hardware configuration standards
Physical infrastructure standards
Network security standards
Network services standards
Infrastructure naming standards
Port assignment standards
Server attachment standards
Wireless LAN standards
IP addressing standards
Design and documentation standards
Network management standards
Network performance measurement and reporting standards
Usage metering and billing standards
Configuration management:
This process facilitates the discovery and maintenance of device software
configurations.
Performance management:
This process provides for monitoring and reporting network traffic levels and
device utilization.
Incident management:
This process addresses the goal of incident management, which is to recover
standard service operation as quickly as possible. The incident management
process is used by many functional groups to manage an individual incident.
The process includes minimizing the impact of incidents affecting the
availability or performance, which is accomplished through analysis, tracking,
and solving of incidents that have impact on managed IT resources.
Problem management:
This process includes identifying problems through analysis of incidents that
have the same symptoms, finding the root cause and fixing it, in order to
prevent malfunction reoccurrence.
User and accounting management:
This process is responsible for ensuring that only those authorized can
access the needed resources.
Security management:
This process provides secure connections to managed devices and
management of security provisions in device configurations.
Global integration is changing the corporate model and the nature of work itself.
The problem is that most IT infrastructures were not built to support the explosive
growth in computing capacity and information that we see today.
IBM has developed a strategy known as the IBM Dynamic Infrastructure®, which
is an evolutionary model for efficient IT delivery that helps to drive business
innovation. This approach allows organizations to be better positioned to adopt
integrated new technologies, such as virtualization and Cloud computing, to help
deliver dynamic and seamless access to IT services and resources. As a result,
IT departments will spend less time fixing IT problems and more time solving real
business challenges.
IBM has taken a holistic approach to the transformation of IT and developed the
Dynamic Infrastructure, which is a vision and strategy for the future of enterprise
computing. The Dynamic Infrastructure enables you to leverage today’s best
practices and technologies to better manage costs, improve operational
performance and resiliency, and quickly respond to business needs. Its goal is to
deliver the following benefits:
Improved IT efficiency:
Dynamic Infrastructure helps transcend traditional operational issues and
achieve new levels of efficiency, flexibility, and responsiveness. Virtualization
can uncouple applications and business services from the underlying IT
resources to improve portability. It also exploits highly optimized systems and
networks to improve efficiency and reduce overall cost.
Rapid service deployment:
The ability to deliver quality service is critical to businesses of all sizes.
Service management enables visibility, control, and automation to deliver
quality service at any scale. Maintaining user satisfaction by ensuring cost
efficiency and return on investment depends upon the ability to see the
business (visibility), manage the business (control), and leverage automation
(automate) to drive efficiency and operational agility.
High responsiveness and business goal-driven infrastructure:
A highly efficient, shared infrastructure can help businesses respond quickly
to evolving demands. It creates opportunities to make sound business
decisions based on information obtained in real time. Alignment with a
service-oriented approach to IT delivery provides the framework to free up
resources from more traditional operational demands and to focus them on
real-time integration of transactions, information, and business analytics.
Now, equipped with a highly efficient, shared and Dynamic Infrastructure along
with the tools needed to free up resources from traditional operational demands,
IT can more efficiently respond to new business needs. As a result, organizations
can focus on innovation and aligning resources to broader strategic priorities.
Decisions can now be based on real-time information. Far from the “break/fix”
mentality gripping many data centers today, this new environment creates an
infrastructure that provides automated, process-driven service delivery and is
economical, integrated, agile, and responsive.
Energy efficiency
As IT grows, enterprises require greater power and cooling capacities. In fact,
energy costs related to server sprawl alone may rise from less than 10 percent to
30 percent of IT budgets in the coming years.3 These trends are forcing
technology organizations to become more energy efficient in order to control
costs while developing a flexible foundation from which to scale.
1. Virtualization 2.0: The Next Phase in Customer Adoption. IDC Doc. 204904, Dec. 2006
2. IDC, Jan. 2008
3. The data center power and cooling challenge. Gartner, Nov. 2007
Harnessing new technologies
If you are spending most of your time on day-to-day operations, it is difficult to
evaluate and leverage new technologies available that can streamline your IT
operations and help keep a company competitive and profitable. Yet the rate of
technology adoption around us is moving at breakneck speed, and much of it is
disrupting the infrastructure status quo.
Ultimately, all of these new innovations need to play an important role in the
enterprise data center.
For example:
Google’s implementation of their MapReduce method is an effective way to
support Dynamic Infrastructures.
The delivery of standardized applications by the Internet using Cloud
Computing is bringing a new model to the market.
Today, the power of information, and the sharing of that information, rests firmly
in the hands of the end user while real-time data tracking and integration will
become the norm.
IT departments require special virtualization software, firmware or a third-party
service that makes use of virtualization software or firmware in order to virtualize
some or all of a computing infrastructure’s resources. This software/firmware
component, called the hypervisor or the virtualization layer, as shown in
Figure 1-3, performs the mapping between virtual and physical resources. It is
what enables the various resources to be decoupled, then aggregated and
dispensed, irrespective of the underlying hardware and, in some cases, the
software OS. Virtualization reduces complexity and management overhead
by creating large pools of like resources that are managed as server ensembles.
Figure 1-3 Virtualization layer: software and operating system instances are decoupled from the underlying compute, memory, storage, and network resources, with mobility between systems, optimized for availability, performance, and power
Historically, there has been a 1:1 ratio of server to application. This has left many
CPU cycles sitting unused much of the time, dedicated to a particular application
even when there are no requests in progress for that application. Now, with these
virtualization capabilities, we can run more than one OS and application or
service on a single physical server.
Business continuity:
Instead of requiring a 1:1 ratio of primary device to backup device in addition
to the 1:1 software-to-hardware ratio described earlier, in the virtualized
environment, multiple servers can fail over to a set of backup servers. This
allows a many-to-one backup configuration ratio, which increases service
availability. An additional example is the decoupling of software applications,
operating systems and hardware platforms, where fewer redundant physical
devices are needed to serve primary machines.
However, in situations where spike computing is needed, the workloads can
be redistributed onto multiple physical servers without service interruption as
shown in Figure 1-5.
High availability:
Virtual devices are completely isolated and decoupled from each other, as
though they were running on different hardware. With features like VMotion,
Live Partition Mobility (LPM) and Live Application Mobility (LAM), planned
outages for hardware/firmware maintenance and upgrades can be a thing of
the past.
Figure 1-6 shows how, during planned maintenance, partitions can be relocated
from one server to another and moved back when the maintenance is
complete. In other words, changes can be made in the production
environment without having to schedule downtime.
The hypervisor layer is installed on top of the hardware. This hypervisor
manages access to hardware resources. The virtual servers (guest systems) are
then installed on top of the hypervisor, enabling each operating system of the
guest system to access necessary resources as needed; see Figure 1-7. This
solution is also sometimes referred to as “bare-metal” virtualization. This
approach allows the guest operating system to be unaware that it is operating in
a virtual environment and no modification of the operating system is required.
Logical NIC sharing allows each operating system to send packets to a single
physical NIC. Each operating system has its own IP address. The server
manager software generally has an additional IP address for configuration and
management. A requirement of this solution is that all guest OSs have to be in
the same Layer 2 domain (subnet) with each guest OS assigned an IP address
and a MAC address. Because the number of guest OSs that can live on one
platform was relatively small, the MAC address could be a modified version of the
NIC’s burned-in MAC address, and the IP addresses consisted of a small block of
addresses in the same IP subnet. One additional IP address was used for the
management console of the platform.
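
As a hedged illustration of how per-guest MAC addresses can be derived from the NIC's burned-in address (the derivation scheme shown here is an assumption for the example; each hypervisor uses its own scheme), the following Python sketch sets the locally administered bit and varies the last octet per guest:

def guest_mac(burned_in: str, guest_index: int) -> str:
    """Derive an illustrative per-guest MAC: set the locally administered bit, vary the last octet."""
    octets = [int(part, 16) for part in burned_in.split(":")]
    octets[0] |= 0x02                        # mark the address as locally administered
    octets[5] = (octets[5] + guest_index) & 0xFF
    return ":".join(f"{octet:02x}" for octet in octets)

physical_nic = "00:1a:64:9c:10:20"           # hypothetical burned-in MAC address
for guest in range(1, 4):
    print(f"Guest OS{guest}: {guest_mac(physical_nic, guest)}")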
Features to manage QoS and load balancing to the physical NIC from the guest
OSs were limited. In addition as shown in Figure 1-8, any traffic from Guest OS1
destined to Guest OS2 traveled out to a connected switch and then returned
along the same physical connection. This had the potential of adding extra load
on the Ethernet connection.
vNIC: Virtual switching
The advent of virtual NIC (vNIC) technology enables each server to have a
virtual NIC that connects to a virtual switch (VSWITCH). This approach allows
each operating system to exist in a separate Layer 2 domain. The connection
between the virtual switch and the physical NIC then becomes an 802.1q trunk.
The physical connection between the physical NIC and the physical switch is also
an 802.1q trunk, as shown in Figure 1-9.
A Layer 3 implementation of this feature allows traffic destined for a server that
resides on the same platform to be routed between VLANs totally within the host
platform, and avoids the traffic traversing the Ethernet connection both outbound
and inbound.
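
To make the 802.1q trunking concrete, the following short Python sketch builds the 4-byte 802.1Q tag that a virtual switch inserts after the source MAC address, carrying the VLAN ID and 802.1p priority; the VLAN and priority values are arbitrary examples:

import struct

def dot1q_tag(vlan_id: int, priority: int = 0, dei: int = 0) -> bytes:
    """Build a 4-byte 802.1Q tag: TPID 0x8100 followed by PCP, DEI, and VLAN ID."""
    if not 0 <= vlan_id <= 4095:
        raise ValueError("VLAN ID must be 0-4095")
    tci = (priority & 0x7) << 13 | (dei & 0x1) << 12 | vlan_id
    return struct.pack("!HH", 0x8100, tci)

# Example: VLAN 100 with 802.1p priority 5 (for example, voice traffic on a trunk)
tag = dot1q_tag(vlan_id=100, priority=5)
print(tag.hex())   # 8100a064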
In the chapters that follow, we discuss how the IBM b-type networking portfolio
helps to meet the needs of the business.
2.1 Product overview
In the sections that follow, we describe the IBM Networking b-type family of IBM
networking products. For the most up-to-date information, see the website:
http://www-03.ibm.com/systems/networking/hardware/ethernet/b-type/
IBM family name   Brocade family name   IBM product name   IBM machine type and model   Brocade name
With superior 1 GbE and 10 GbE port densities, the m-series switching routers
are well suited for large-scale high performance cluster computing. By combining
superior data capacity with ultra-low latency, m-series switching routers can
accelerate application performance in high performance computing clusters,
thereby increasing processing power and productivity.
Comprehensive hardware redundancy with hitless management failover and
hitless software upgrades for Layer 2/Layer 3 with BGP and OSPF graceful
restart
All m-series models can be installed only in a rack; non-rack installation is not
supported.
Operating system
All m-series systems run Brocade Multi-Service IronWare R4.0.00 or higher
operating system.
Note: The Clos architecture is named after the groundbreaking work by
researcher Charles Clos. The Clos architecture has been the subject of much
research over several years. A multi-stage Clos architecture has been
mathematically proven to be non-blocking. The resiliency of this architecture
makes it the ideal building block in the design of high availability, high
performance systems.
The Clos architecture uses data striping technology to ensure optimal utilization
of fabric interconnects. This mechanism always distributes the load equally
across all available links between the input and output interface modules. By
using fixed-size cells to transport packets across the switch fabric, the m-series
switching architecture ensures predictable performance with very low and
deterministic latency and jitter for any packet size. The presence of multiple
switching paths between the input and output interface modules also provides an
additional level of redundancy.
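
The cell-based striping can be pictured with the following Python sketch, which is purely conceptual and not the actual m-series implementation: a packet is segmented into fixed-size cells and the cells are spread evenly across the available fabric links:

def stripe_packet(packet: bytes, cell_size: int, num_links: int):
    """Segment a packet into fixed-size cells and spread them round-robin across fabric links."""
    cells = [packet[i:i + cell_size] for i in range(0, len(packet), cell_size)]
    links = [[] for _ in range(num_links)]
    for index, cell in enumerate(cells):
        links[index % num_links].append(cell)   # equal distribution across the links
    return links

# Example: a 1,500-byte packet, 64-byte cells, 6 fabric links
distribution = stripe_packet(b"\x00" * 1500, cell_size=64, num_links=6)
print([len(cells_on_link) for cells_on_link in distribution])  # cells carried per link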
High availability
Both the hardware and software architecture of the m-series are designed to
ensure very high Mean Time Between Failures (MTBF) and low Mean Time To
Repair (MTTR). Cable management and module insertion on the same side of
the chassis allows ease of serviceability when a failed module needs to be
replaced or a new module needs to be inserted.
Each interface module maintains multiple, distinct priority queues to every output
port on the system. Packets are “pulled” by the outbound interface module when
the output port is ready to send a packet. Switch fabric messaging is used to
ensure that there is tight coupling between the two stages. This closed loop
feedback between the input and output stages ensures that no information is lost
between the two stages. The use of such “virtual output queues” maximizes the
efficiency of the system by storing packets on the input module until the output
port is ready to transmit the packet. In all, there are 512k virtual output queues on
the m-series chassis.
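
The following Python sketch models the virtual output queue idea conceptually (it is an illustration, not the m-series data path): the ingress module keeps a separate queue per output port and priority, and a packet is released only when the egress side pulls for that port:

from collections import deque, defaultdict

class IngressModule:
    """Conceptual virtual output queuing: one queue per (output port, priority) pair."""

    def __init__(self):
        self.voqs = defaultdict(deque)   # (out_port, priority) -> queue of packets

    def enqueue(self, packet, out_port, priority):
        self.voqs[(out_port, priority)].append(packet)

    def pull(self, out_port):
        """Egress pulls: release the best packet waiting for this port (lower value = higher priority here)."""
        for priority in sorted({p for (port, p) in self.voqs if port == out_port}):
            queue = self.voqs[(out_port, priority)]
            if queue:
                return queue.popleft()
        return None   # nothing queued for this output port

ingress = IngressModule()
ingress.enqueue("bulk-data", out_port=7, priority=4)
ingress.enqueue("voice", out_port=7, priority=0)
print(ingress.pull(7))   # voice is released first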
The QoS subsystem on the m-series has extensive classification and packet
marking capabilities that can be configured:
Prioritization based on Layer 2 (802.1p), TOS, DSCP, or MPLS EXP bit of an
input packet
Mapping of packet/frame priority from ingress encapsulation to Egress
encapsulation
Remarking of a packet’s priority based on the result of the 2-rate, 3-color
policer
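
As a hedged sketch of 2-rate, 3-color policing (a conceptual, color-blind two-rate three-color marker in the spirit of RFC 2698, not the m-series hardware policer; the rates and burst sizes are arbitrary), two token buckets classify each packet as green, yellow, or red, and the resulting color can then drive priority remarking:

import time

class TwoRateThreeColorMarker:
    """Color-blind two-rate, three-color marker (conceptual sketch)."""

    def __init__(self, cir, cbs, pir, pbs):
        self.cir, self.cbs = cir, cbs        # committed rate (bytes/s) and burst (bytes)
        self.pir, self.pbs = pir, pbs        # peak rate (bytes/s) and burst (bytes)
        self.tc, self.tp = cbs, pbs          # current token counts
        self.last = time.monotonic()

    def color(self, packet_bytes):
        now = time.monotonic()
        elapsed = now - self.last
        self.last = now
        # Refill both buckets, capped at their burst sizes
        self.tc = min(self.cbs, self.tc + self.cir * elapsed)
        self.tp = min(self.pbs, self.tp + self.pir * elapsed)
        if packet_bytes > self.tp:
            return "red"                      # exceeds the peak rate
        self.tp -= packet_bytes
        if packet_bytes > self.tc:
            return "yellow"                   # within peak but above the committed rate
        self.tc -= packet_bytes
        return "green"                        # within the committed rate

policer = TwoRateThreeColorMarker(cir=1_000_000, cbs=10_000, pir=2_000_000, pbs=20_000)
print([policer.color(1500) for _ in range(10)])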
For security purposes, both input ACLs (Access Control Lists) and output ACLs
are supported by the system on every interface module. Up to 114,688 input ACL
entries and 131,072 output ACL entries for ACL rules can be applied to local
interfaces on every interface module.
Scalability
The m-series of routers is a highly scalable family of routers. Some examples of
its industry-leading scalability include:
Up to 4k VPLS/VLL instances and up to 256k VPLS MAC addresses
Support for 4094 VLANs and up to 1 million MAC addresses
512k IPv4 routes in hardware FIB
112k IPv6 routes in hardware FIB
2 million BGP routes
400 BGP/MPLS VPNs and up to 256k VPN routes
Investment protection
The m-series chassis uses a half slot design for interface modules. The divider
between two adjacent half slots can be removed in future to combine them into a
full slot. All chassis have 100 Gbps of full-duplex bandwidth per full slot. In
addition, with the ability to offer multiple services including dual-stack IPv4/IPv6
and MPLS services in hardware, the m-series offers excellent investment
protection.
Component             IBM Ethernet Router     IBM Ethernet Router     IBM Ethernet Router      IBM Ethernet Router
                      B04M                    B08M                    B16M                     B32M
Chassis type          Modular 4-slot chassis  Modular 8-slot chassis  Modular 16-slot chassis  Modular 32-slot chassis
H/W/D (cm)            17.68 x 44.32 x 57.15   31.01 x 44.32 x 57.15   62.15 x 44.32 x 64.77    146.58 x 44.32 x 61.21
Op. temperature (°C)  0 - 40                  0 - 40                  0 - 40                   0 - 40
Fan assemblies        1                       1                       3                        10
All the fans are hot swappable and self adjusting based on sensor readings.
Power parameters
All m-series models provide redundant and removable power supplies with AC
power options. Power supplies can be exchanged between B08M and B16M
models, but not with the B04M or B32M. None of the m-series models
provide the Power over Ethernet (PoE) option.
Number of power supply bays required for a fully loaded chassis: B04M 1, B08M 2, B16M 4, B32M 4.
All m-series models have a maximum of 1 GB RAM and 32 MB of flash memory.
The number of slots, ports, and performance metrics are shown in Table 2-4.
Component                                          B04M       B08M       B16M       B32M
Payload slots                                      4          8          16         32
Max. number of slots for switch fabric modules     3          3          4          8
Min. number of switch fabric modules required
for fully-loaded chassis at line rate              2          2          3          8
10/100/1000 copper ports per module (MRJ21)        48         48         48         48
10/100/1000 max. copper ports per system (MRJ21)   192        384        768        1536
10/100/1000 copper ports per module (RJ-45)        20         20         20         20
POS-OC192                                          8          16         32         64
Fabric switching capacity                          960 Gbps   1.92 Tbps  3.84 Tbps  7.68 Tbps
Data switching capacity                            400 Gbps   800 Gbps   1.6 Tbps   3.2 Tbps
All slots have a half-slot line module design. Slots have removable dividers to
support future full slot modules.
Interface modules
Table 2-5 shows which modules can be installed in the m-series chassis payload
slots.
Module type       Ports   Connector
10/100/1000MbE    48      MRJ21
10/100/1000MbE    20      RJ45
100/1000MbE       20      SFP
10 GbE            2       XFP
10 GbE            4       XFP
Interface types
Following are the available interface types:
10/100/1000 Mbps Ethernet port with MRJ21 connector
10/100/1000 Mbps Ethernet port with RJ45 connector
100/1000 Mbps Ethernet port with SFP connector
10 Gbps Ethernet port with XFP connector
OC-192 (STM-64) port with SFP connector
OC-12/48 (STM-4/STM-16) port with SFP connector
Transceivers
In Table 2-6, Table 2-7, and Table 2-8, we show the available transceivers to be
used in interface modules.
Cables for MRJ21 must be ordered separately. One distributor of such cables is
Tyco Electronics:
http://www.ampnetconnect.com/brocade/
Services, protocols, and standards
IBM m-series Ethernet Routers support these services, protocols, and
standards.
In addition to a rich set of Layer 2 and MPLS-based capabilities, the routers enable the
creation of scalable, resilient services based on the Metro Ethernet Forum (MEF)
specifications for these features:
Ethernet Private Line (EPL)
Ethernet Virtual Private Line (EVPL)
Ethernet LAN (E-LAN)
The provided IPv4 and IPv6 multicast protocols help to make the most efficient use of
the network bandwidth.
Multi-VRF virtual routing allows enterprises to create multiple security zones and
simplified VPNs for different applications and business units while streamlining
overall network management.
For the complete list of supported standards and RFC compliance, see the website:
http://www-03.ibm.com/systems/networking/hardware/ethernet/b-type/
Link aggregation
Following are the main characteristics of link aggregation implementation on
m-series models:
802.3ad/LACP support
256 server trunks supported
Up to 32 ports per trunk group
Cross module trunking
Ports in the group do not need to be physically consecutive
Tagged ports support in trunk group
Compatibility with Cisco EtherChannel
Ports can be dynamically added or deleted from the group, except for the
primary port
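
To illustrate how traffic can be spread over the ports of a trunk group, the following Python sketch hashes a frame's MAC address pair to pick a member link so that a given flow stays on one port; the hash fields and hash function are illustrative assumptions, not the actual IronWare load-sharing algorithm:

import zlib

def pick_lag_member(src_mac: str, dst_mac: str, num_links: int) -> int:
    """Pick a trunk member port by hashing the MAC pair (keeps a flow on one link)."""
    key = f"{src_mac}-{dst_mac}".encode()
    return zlib.crc32(key) % num_links

# Example: distribute three flows over a 4-port trunk group
flows = [("00:11:22:33:44:55", "66:77:88:99:aa:bb"),
         ("00:11:22:33:44:56", "66:77:88:99:aa:bb"),
         ("00:11:22:33:44:57", "66:77:88:99:aa:bc")]
for src, dst in flows:
    print(src, "->", dst, "uses member port", pick_lag_member(src, dst, num_links=4))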
Layer 2 switching
Following are the main characteristics of Layer 2 switching implementation on
m-series models:
Up to 1 million MAC addresses per system
Up to 288000 MAC entries per network processor
9216 byte jumbo frames
L2 MAC filtering
MAC authentication
MAC port security
4090 VLANs
Port and protocol based VLANs
VLAN tagging:
– 802.1q
– Dual Mode
– SAV/Q-in-Q
STP (Spanning Tree Protocol) per VLAN
Compatibility with CISCO PVST (Per VLAN Spanning Tree)
STP fast forwarding (fast port, fast uplink), root guard, Bridge Protocol Data
Unit (BPDU) guard
Up to 128 Spanning-Tree instances
Rapid STP (802.1w compatible)
MSTP (Multiple Spanning Tree Protocol) (802.1s)
MRP Phase I&II
Q-in-Q/SAV support with unique tag-type per port
VSRP (Virtual Switch Redundancy Protocol)
Up to 255 topology groups
Hitless OS with 802.1ag (Connectivity Fault Management) and UDLD
(Uni-directional Link Detection)
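
Because the Layer 2 feature set centers on per-VLAN MAC learning and forwarding, the following Python sketch models that behavior conceptually (an illustration only, not the switch implementation): source MAC addresses are learned per VLAN, and a frame is sent to the learned port or flooded within the VLAN when the destination is unknown:

class L2Switch:
    """Conceptual per-VLAN MAC learning and forwarding (illustrative only)."""

    def __init__(self, ports):
        self.ports = set(ports)
        self.mac_table = {}   # (vlan, mac) -> port

    def receive(self, in_port, vlan, src_mac, dst_mac):
        # Learn: remember which port the source MAC was seen on in this VLAN
        self.mac_table[(vlan, src_mac)] = in_port
        # Forward: use the learned port if known, otherwise flood within the VLAN
        out = self.mac_table.get((vlan, dst_mac))
        return [out] if out is not None else sorted(self.ports - {in_port})

switch = L2Switch(ports=[1, 2, 3, 4])
print(switch.receive(in_port=1, vlan=10, src_mac="AA", dst_mac="BB"))  # flood: [2, 3, 4]
print(switch.receive(in_port=2, vlan=10, src_mac="BB", dst_mac="AA"))  # learned: [1]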
Multicast
Following are the main characteristics of multicast implementation on m-series
models:
IGMP/IGMPv3 (Internet Group Management Protocol)
IGMP/IGMPv3 snooping
IGMP proxy
PIM (Protocol-Independent Multicast) proxy/snooping
Multicast routing PIM/DVMRP (Distance Vector Multicast Routing Protocol)
Up to 153600 multicast routes
IPv4 PIM modes - Sparse, Dense, Source-Specific
IPv6 PIM modes - Sparse, Source-Specific
Up to 4096 IPv4/IPv6 multicast cache entries
Doing this allows network designers to standardize on a single product family for
end-of-row, aggregation, and backbone switching, and is ideal for data center
and enterprise deployment. In addition, the switches, with their high-density and
compact design, are an ideal solution for High-Performance Computing (HPC)
environments and Internet Exchanges and Internet Service Providers (IXPs and
ISPs) where non-blocking, high-density Ethernet switches are needed.
The r-series Ethernet Switches are available in the following model
configurations:
IBM Ethernet/IP Router B04R (4003-R04) - 4-slot switching router, 400 Gbps
data capacity, and up to 64 10 GbE and 128 1 GbE ports per system
IBM Ethernet/IP Router B08R (4003-R08) - 8-slot switching router, 800 Gbps
data capacity, and up to 128 10 GbE and 384 1 GbE ports per system
IBM Ethernet/IP Router B16R (4003-R16) - 16-slot switching router, 1.6 Tbps
data capacity, and up to 256 10 GbE and 768 1 GbE ports per system
All r-series models can be installed only in a rack; non-rack installation is not
supported.
Operating system
All r-series systems run Brocade Multi-Service IronWare for BigIron RX R2.7.02
or higher operating system.
Exceptional density
The r-series scales to one of the industry's leading densities for its occupied
space: 256 10 Gigabit Ethernet ports or 768 Gigabit Ethernet ports in a single
chassis.
The ability to handle the failure of not only an SFM but also elements within an
SFM ensures a robust, redundant system ideal for non-stop operation. The
overall system redundancy is further bolstered by redundancy in other active
system components such as power supplies, fans, and management modules.
The passive backplane on the r-series chassis increases the reliability of the
system.
Scalability
The r-series is a highly scalable family of switching routers. Here are a few
examples of its industry-leading scalability:
Support for 4094 VLANs and up to 1 million MAC addresses
512k IPv4 routes in hardware FIB
65k IPv6 routes in hardware FIB
1 million BGP routes
Investment protection
The r-series chassis uses a half slot design for interface modules. The divider
between two adjacent half slots can be removed in future to combine them into a
full slot. All chassis have 100 Gbps of full-duplex bandwidth per full slot. In
addition, the ability to offer multiple services, including dual-stack IPv4/IPv6
in hardware, provides excellent investment protection.
Physical and thermal parameters
The physical and thermal parameters are shown in Table 2-10.
Component             IBM Ethernet    IBM Ethernet    IBM Ethernet
                      Switch B04R     Switch B08R     Switch B16R
Rack units (RUs)      4               7               14
Op. temperature (°C)  0 - 40          0 - 40          0 - 40
Fan assemblies        1               1               3
All the fans are hot swappable and self adjusting based on sensor readings.
Power parameters
All r-series models provide redundant and removable power supplies with AC
power options. Power supplies can be exchanged between B08R and B16R
models, but not with the B04R. None of the r-series models provide the
Power over Ethernet (PoE) option.
Number of power supply bays required for a fully loaded chassis: B04R 1, B08R 2, B16R 4.
The number of slots, ports, and performance metrics are shown in Table 2-12.
Component                                          B04R      B08R      B16R
Slots                                              9         13        22
Payload slots                                      4         8         16
Max. number of slots for switch fabric modules     3         3         4
Min. number of switch fabric modules required
for fully-loaded chassis at line rate              2         2         3
10/100/1000 copper ports per module (MRJ21)        48        48        48
10/100/1000 copper ports per module (RJ-45)        24        24        24
All slots have a half-slot line module design, and the slots have removable
dividers to support future full slot modules.
With such a design, r-series models provide exceptional density of usable ports.
Module type       Ports   Connector
10/100/1000MbE    48      MRJ21
10/100/1000MbE    24      RJ45
100/1000MbE       24      SFP
10 GbE            16      SFP+
10 GbE            4       XFP
Interface types
Following are the available interface types:
10/100/1000 Mbps Ethernet port with MRJ21 connector
10/100/1000 Mbps Ethernet port with RJ45 connector
100/1000 Mbps Ethernet port with SFP connector
10 Gbps Ethernet port with SFP+ connector
10 Gbps Ethernet port with XFP connector
Transceivers
In Table 2-14, Table 2-15, and Table 2-16 we show the available transceivers to be
used in interface modules.
Type                             Connector   Speed      Distance
100BASE-FX 1310 nm SFP optics    LC          100 Mbps   Up to 2 km over multi-mode fiber
Cables for the MRJ21 must be ordered separately. One of the distributors of such
cables is Tyco Electronics:
http://www.ampnetconnect.com/brocade/
The IPv4 and IPv6 multicast protocols help to make the most efficient use of the
network bandwidth.
For the complete list of supported standards and RFC compliance, see the website:
http://www-03.ibm.com/systems/networking/hardware/ethernet/b-type/
Link aggregation
Following are the main characteristics of link aggregation implementation on
r-series models:
802.3ad/LACP support
31 server trunks supported
Up to 8 ports per trunk group
Cross module trunking
Ports in the group do not need to be physically consecutive
Tagged ports support in trunk group
Compatibility with Cisco EtherChannel
Ports can be dynamically added or deleted from the group, except primary
port
Layer 2 switching
Following are the main characteristics of Layer 2 switching implementation on
r-series models:
9216 byte jumbo frames
L2 MAC filtering
MAC authentication
MAC port security
4090 VLANs
Port and protocol based VLANs
VLAN tagging:
– 802.1q
– Dual Mode
STP (Spanning Tree Protocol) per VLAN
Compatibility with CISCO PVST (Per VLAN Spanning Tree)
STP fast forwarding (fast port, fast uplink), root guard, BPDU (Bridge Protocol
Data Unit) guard
Up to 128 Spanning-Tree instances
Rapid STP (802.1w compatible)
MSTP (Multiple Spanning Tree Protocol) (802.1s)
MRP Phase I&II
VSRP (Virtual Switch Redundancy Protocol)
Up to 255 topology groups
Hitless OS with UDLD (Uni-directional Link Detection)
Multicast
Following are the main characteristics of multicast implementation on r-series
models:
IGMP/IGMPv3 (Internet Group Management Protocol)
In any deployment scenario, this switch is designed to save valuable rack space,
power, and cooling in the data center while delivering 24x7 service through its
high-availability design.
Embedded per-port sFlow capabilities to support scalable hardware-based
traffic monitoring
Operating system
The IBM x-series runs Brocade IronWare OS R04.1.00 or higher.
When organizations upgrade a server's NICs to 10 GbE, they will only need to
replace the 1 GbE SFPs with 10 GbE SFP+ transceivers or direct attached
10 GbE SFP+ copper (Twinax) transceivers. This approach protects
Ethernet-based investments and streamlines migration to 10 GbE. The switch
also includes four 10/100/1000 MbE RJ45 ports for additional server connectivity
or separate management network connectivity.
The hot-swappable power supplies and fan assembly are designed to enable
organizations to replace components without service disruption. In addition,
several high-availability and fault-detection features are designed to help in
failover of critical data flows, enhancing overall system availability and reliability.
Organizations can use sFlow-based network monitoring and trending to
proactively monitor risk areas and optimize network resources to avoid many
network issues altogether.
Airflow: front/side-to-back.
Power parameters
All x-series models provide redundant and removable power supplies with the AC
power option.
The x-series models do not support Power over Ethernet (PoE).
The power supplies are auto-sensing and auto-switching, and provide up to 300
watts of total output power. The power supplies are hot swappable and can be
removed and replaced without powering down the system.
The number of ports and performance metrics are shown in Table 2-19.
Interface types
Following are the available interface types:
10 Gbps Ethernet port with SFP+ connector
10/100/1000 Mbps Ethernet port with RJ45 connector
Quality of Service:
MAC address mapping to priority queue
ACL mapping to priority queue
ACL mapping to ToS/DSCP
Honoring DSCP and 802.1p
ACL mapping and marking of ToS/DSCP
DHCP assist
QoS queue management using weighted round robin (WRR), strict priority
(SP), and a combination of WRR and SP
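
To illustrate the combined strict priority and WRR queue management, the following Python sketch (a conceptual model with invented queue names and weights, not the switch scheduler itself) always drains the designated strict-priority queue first and then serves the remaining queues in proportion to their weights:

from collections import deque

def schedule(queues, weights, sp_queue, rounds):
    """Emit packets: strict-priority queue first, then WRR over the rest by weight."""
    order = []
    for _ in range(rounds):
        # Strict priority: drain the SP queue completely before serving anyone else
        while queues[sp_queue]:
            order.append(queues[sp_queue].popleft())
        # Weighted round robin: each remaining queue gets 'weight' dequeues per round
        for name, weight in weights.items():
            for _ in range(weight):
                if queues[name]:
                    order.append(queues[name].popleft())
    return order

queues = {"voice": deque(["v1", "v2"]),
          "video": deque(["d1", "d2", "d3"]),
          "data":  deque(["b1", "b2", "b3", "b4"])}
weights = {"video": 2, "data": 1}           # WRR weights for the non-SP queues
print(schedule(queues, weights, sp_queue="voice", rounds=4))
# ['v1', 'v2', 'd1', 'd2', 'b1', 'd3', 'b2', 'b3', 'b4']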
Traffic management:
Inbound rate limiting per port
ACL-based inbound rate limiting and traffic policies
Outbound rate limiting per port and per queue
Broadcast, multicast, and unknown unicast rate limiting
The whole list of supported standards and RFC compliance can be found at:
http://www-03.ibm.com/systems/networking/hardware/ethernet/b-type/
All six models are shown in Figure 2-6.
Operating system
All c-series systems run Brocade Multi-Service IronWare R3.8.00 or higher
operating system.
Standardized services
The c-series is compliant with both the MEF 9 and MEF 14 specifications. Using
the c-series models, a provider can offer E-LINE, E-LAN and E-TREE services,
the standardized service names for point-to-point, multipoint, and rooted
multipoint services. These services can be offered using 802.1Q VLANs,
Provider Bridges or Provider Backbone Bridges.
Scalability
The c-series supports up to 128k MAC addresses per system. Support for
100/1000 Mbps SFP ports or 10/100/1000 Mbps RJ45 ports, with wire-speed
performance even at full load, ensures that abundant capacity is available on
user facing ports to accommodate a provider’s customers who wish to upgrade to
a higher bandwidth service. Additionally, the use of Link Aggregation Groups
(LAG) allows multiple links to be aggregated and offer even higher bandwidth
services at the user network interface (UNI) to the end-user.
Service management
Recently developed specifications such as IEEE 802.1ag-2007 (Connectivity
Fault Management) and MEF 17 (Service OAM Framework and Specifications)
allow the rapid and proactive identification and isolation of faults in the network or
service, thereby maintaining service uptime and maximizing the ability to meet
customer SLAs. The c-series supports all the capabilities in IEEE 802.1ag,
including Connectivity Check Messages, Loopback Message/Response and
LinkTrace Message/Response. It allows flexible association and definition of both
Maintenance End Points (MEP) and Maintenance Intermediate Points (MIP)
within a network. Fault management functions of MEF 17 Service OAM are also
supported.
Reliability
To provide a high level of reliability in the Carrier Ethernet service, the c-series
supports Foundry’s innovative Metro Ring Protocol (MRP/MRP-II), the ring
resiliency protocol of choice on several metro networks worldwide. Standard
Layer 2 protocols such as MSTP, RSTP and STP are also supported. Foundry’s
MRP/MRP-II allows Carrier Ethernet services to be delivered over ring-based
topologies, including overlapping rings that help optimize the use of fiber in metro
rings and provide fast recovery from node/link failures in milliseconds. Foundry
MRP/MRP-II can also be used within a PB/PBB network.
Hard QoS
The c-series supports up to eight queues per port, each with a distinct priority
level. Advanced QoS capabilities such as the use of 2-rate, 3-color traffic
policers, Egress shaping, and priority remarking can also be applied to offer
deterministic “hard QoS” capability to customers of the service. The c-series can
be configured with Ingress and Egress bandwidth profiles per UNI that are in
compliance with the rigid traffic management specifications of MEF 10/MEF 14.
Multicast support
Multicast transport is a key enabler of next-generation services like IPTV. It is
also typically a major consumer of capacity in many multi-service networks. It is
therefore critical for next-generation edge switches to efficiently handle multicast
traffic. The c-series has comprehensive support for multicast switching and
routing by a variety of protocols, including PIM-SM, PIM-DM, PIM-SSM, IGMP
v2/v3, and other platform-independent multicast capabilities built in Multi-Service
IronWare.
Multicast traffic within the c-series is handled with a very high degree of efficiency
by avoiding unnecessary replications and conserving bandwidth within the system.
By performing egress interface-based replication, switch forwarding capacity and
buffer space are used optimally, thereby maximizing network performance when
running multicast traffic.
Routing capabilities
Based on Multi-Service IronWare, the operating system software that
successfully powers thousands of m-series routers deployed around the world,
the c-series offers routing capabilities that are commonly required in edge
aggregation and other applications within a provider’s domain.
The powerful feature set of the c-series makes it an ideal candidate for
applications beyond Carrier Ethernet service delivery. For example, data center
networks and edge/aggregation routing within ISP networks often require a
compact Layer 3 switch with sufficient scalability in IPv4 routes. The
comprehensive support for IPv4 routing protocols, when complemented with
VRRP, and VRRP-E makes the c-series ideally suited for such applications.
Chassis type        Fixed form factor       Fixed form factor       Fixed form factor
H/W/D (cm)          4.4 x 44.3 x 44.8       4.4 x 44.3 x 44.8       4.4 x 44.3 x 44.8
Max. power draw     170 W B48C (Copper)     205 W B48C (Copper)     255 W B50C (Copper)
                    195 W B48C (Fiber)      245 W B48C (Fiber)      295 W B50C (Fiber)
Heat emission       580 - B48C (Copper)     700 - B48C (Copper)     870 - B50C (Copper)
(BTU/hr)            666 - B48C (Fiber)      836 - B48C (Fiber)      1007 - B50C (Fiber)
All the fans are hot swappable and self adjusting based on sensor readings.
Power parameters
All c-series models provide redundant and removable power supplies with AC
power options. Power supplies can be exchanged between various c-series
models. None of the c-series models provides a Power over Ethernet (PoE) option.
Ports, memory, and performance
All c-series models provide a store-and-forward switching engine.
All c-series models have a maximum of 512 MB of RAM and 32 MB of flash memory.
The number of ports and performance metrics are shown in Table 2-23.
Interface types
Following are the available interface types:
10/100/1000 Mbps Ethernet port with RJ45 connector
100/1000 Mbps Ethernet port with SFP connector
10 Gbps Ethernet port with XFP connector
Optional features
The following optional features are available for c-series models:
Full Layer 3 Premium Activation:
Enables OSPFv2, IS-IS, IGMPv1/v2/v3, PIM-DM/-SM/-SSM, MSDP, Anycast
RP, MPLS, VPLS, Multi-VRF, Ethernet Service Instance (ESI), IEEE 802.1ag
Connectivity Fault Management (CFM), 802.1ad (Provider Bridges), and
802.1ah (Provider Backbone Bridges)
Metro Edge Premium Activation:
Enables OSPFv2, BGP-4, IS-IS, IGMPv1/v2/v3, PIM-DM/-SM/-SSM
Comprehensive IPv4 unicast routing support based on the rich feature set of
Multi-Service IronWare:
High performance, robust routing by Foundry Direct Routing (FDR) for
complete programming of Forwarding Information Base (FIB) in hardware
RIP, OSPF, IS-IS, BGP-4 support
Support for VRRP and VRRP-E
8-path Equal Cost Multipath (ECMP)
Up to 32k IPv4 unicast routes in FIB
Support for trunks (link aggregation groups) using either IEEE 802.3ad LACP or
static trunks:
Up to 12 links per trunk
Support for single link trunk
Advanced QoS:
Inbound and outbound two-rate, three-color traffic policers with accounting
8 queues per port, each with a distinct priority level
Multiple queue servicing disciplines: Strict Priority, Weighted Fair Queuing,
and hybrid
Advanced remarking capabilities based on port, VLAN, PCP, DSCP, or IPv4
flow
Egress port and priority-based shaping
The complete list of supported standards and RFC compliance can be found at:
http://www-03.ibm.com/systems/networking/hardware/ethernet/b-type/
IBM s-series Ethernet Switches have an extensive feature set, making them well
suited for real-time collaborative applications, IP telephony, IP video, e-learning
and wireless LANs to raise an organization’s productivity. With wire-speed
performance and ultra low latency, these systems are ideal for converged
network applications such as VoIP and video conferencing. These switches provide
one of the industry’s most scalable and resilient PoE designs, and their 1 GbE
PoE-capable ports support the IEEE 802.1AB LLDP and ANSI/TIA-1057 LLDP-MED
standards, enabling organizations to build advanced multi-vendor networks.
Figure 2-7 IBM s-series Ethernet Switches
All s-series models can only be installed in the rack. Non-rack installation is not
supported.
Operating system
All s-series systems run Brocade IronWare R5.0.00 or a higher version of the operating system.
Configuration alternatives
The s-series family of switches is optimized for flexibility with upgradeability for
PoE, redundant management, switch fabric and power, and 10 Gigabit Ethernet.
Available in three chassis models, the scalable s-series family helps enterprises
and service providers reduce costs and gain operational benefits of a common
operating system, a shared interface, and common power supply modules.
Similarly, the power consumption of the line modules, switch modules, and
management modules does not impact the PoE power. Power consumption for
the system and PoE are calculated, provisioned, and managed independently of
one another. As more PoE devices are added to a switch, a simple power budget
calculation determines whether another PoE power supply needs to be added to
the switch.
The system power distribution and the PoE power distribution subsystems are
each designed for M+N load-sharing operation. This dual-distribution power
design simplifies the power configuration of the system while enhancing system
reliability. The chassis can be configured for a wide range of power
environments, including 110V/220V AC power, -48V DC power and mixed AC/DC
power configurations. To scale PoE configurations, PoE power supplies are
available in two ratings of 1250W and 2500W. When configured with four 2500W
PoE supplies, the s-series supports up to 384 10/100/1000 Mbps Class 3 PoE
ports and still maintains N+1 power redundancy. This resiliency is unmatched in
the industry.
Intelligent and scalable Power over Ethernet
Power over Ethernet (PoE) is a key enabler of applications such as VoIP, IEEE
802.11 wireless LANs, and IP video. The s-series is Brocade’s third-generation
PoE-capable switch family and incorporates the latest advances in PoE
provisioning and system design, delivering scalable and intelligent PoE to the
enterprise. The PoE power distribution subsystem is independent of the system
power, eliminating system disruption in the event of PoE over-subscription or a
PoE power failure.
After being classified, the traffic is queued and scheduled for delivery. Three
configured queuing options provide the network administrator with flexible control
over how the system services the queues. Weighted Round Robin (WRR)
queuing applies user-configured weighting for servicing multiple queues,
ensuring that even low priority queues are not starved for bandwidth. With Strict
Priority (SP) queuing, queues are serviced in priority order ensuring that the
highest-priority traffic is serviced ahead of lower priority queues. Combined SP
and WRR queuing ensures that packets in the SP queue are serviced ahead of
the WRR queues. Combined queuing is often used in VIP networks where the
VIP traffic is assigned to the SP queue and data traffic to the WRR queues.
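To make the combined scheduling behavior concrete, the following minimal Python sketch drains a strict-priority queue ahead of weighted round robin queues. It is an illustration only, not switch code; the queue contents, weights, and the per-round servicing model are simplifying assumptions.

    from collections import deque

    def drain(sp_queue, wrr_queues, weights):
        """Serve the SP queue first; when it is empty, serve each WRR queue
        up to its configured weight per round."""
        served = []
        while sp_queue or any(wrr_queues):
            if sp_queue:                            # strict priority always wins
                served.append(sp_queue.popleft())
                continue
            for queue, weight in zip(wrr_queues, weights):
                for _ in range(weight):             # weight = packets served per round
                    if queue:
                        served.append(queue.popleft())
        return served

    # Example: VoIP in the SP queue, two data queues weighted 3:1
    print(drain(deque(["voip1", "voip2"]),
                [deque(["web1", "web2", "web3"]), deque(["bulk1"])],
                [3, 1]))

Because the SP queue is always checked first, even a heavily weighted WRR queue cannot delay SP traffic, which is why VIP traffic is placed in the SP queue.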
In addition, the switch management modules are available with integrated Gigabit
Ethernet or 10-Gigabit Ethernet ports. These modules provide cost-effective
system configurations supporting high-capacity connections to upstream
switches. The management modules utilize high-performance system processors
with high-capacity memory for scalable networking up to a routing capacity of 1
million BGP routes and 20 BGP peers.
The s-series switches utilize an advanced cell-based switch fabric with internal
flow-control, ensuring very low latency and jitter performance for converged
applications.
LLDP-MED addresses the unique needs that voice and video demand in a
converged network by advertising media and IP telephony specific messages
that can be exchanged between the network and the endpoint devices.
LLDP-MED provides exceptional interoperability, IP telephony troubleshooting,
and automatic deployment of policies, inventory management, advanced PoE
power negotiation, and location/emergency call service. These sophisticated
features make converged network services easier to install, manage, and
upgrade and significantly reduce operations costs.
To achieve wire-speed Layer 3 performance, the s-series switches support
Foundry Direct Routing (FDR), in which the forwarding information base (FIB) is
maintained in local memory on the line modules. The hardware forwarding tables
are dynamically populated by system management with as many as 256,000
routes.
The s-series family includes Secure Shell (SSHv2), Secure Copy, and SNMPv3
to restrict and encrypt communications to the management interface and system,
thereby ensuring highly secure network management access. For an added level
of protection, network managers can use ACLs to control which ports and
interfaces have TELNET, web, and/or SNMP access.
After the user is permitted access to the network, protecting the user’s identity
and controlling where the user connects becomes a priority. To prevent “user
identity theft” (spoofing), the s-series switches support DHCP snooping, Dynamic
ARP inspection, and IP source guard. These three features work together to
deny spoofing attempts and to defeat man-in-the-middle attacks. To control
where users connect, the s-series switches support private VLANs, quarantine
VLANs, policy-based routing, and extended ACLs, all of which can be used to
control a user’s access to the network.
Physical and thermal parameters
The physical and thermal parameters are shown in Table 2-26.
Fan tray/assemblies 1 2
All the fans are hot swappable and have adjustable speeds.
There are separate power supplies for system power (SYS) and PoE power
(PoE). Power consumption between PoE and SYS power supplies is not shared,
meaning loss of a System power supply does not impact a PoE power supply,
and vice versa.
System power supplies have internal power of 12V and PoE power supplies have
internal power of 48V.
All power supplies are auto-sensing and auto-switching. All are hot swappable
and can be removed and replaced without powering down the system.
The system (SYS) power supplies provide power to the management module, all
non-PoE interface modules, and all ports on PoE modules that do not require
PoE power or to which no power-consuming devices are attached. The installed
SYS power supplies provide power to all chassis components, sharing the
workload equally. If a SYS power supply fails or overheats, the failed power
supply’s workload is redistributed to the redundant power supply.
The PoE power parameters are shown in Table 2-28.
The PoE Power Supplies provide power to the PoE daughter card, and ultimately
to PoE power consuming devices. The installed PoE power supplies share the
workload equally. If a PoE power supply fails or overheats, the failed power
supply’s workload is redistributed to the redundant power supply. The number of
PoE power-consuming devices that one PoE power supply can support depends
on the number of watts (Class) required by each power-consuming device (PD).
The number of PoE power-consuming devices that one 1250W PoE power
supply can support depends on the number of watts required by each
power-consuming device. Each supply can provide a maximum of 1080 watts of
PoE power, and each PoE port supports a maximum of 15.4 watts of power per
PoE power-consuming device. For example, if each PoE power-consuming
device attached to the s-series consumes 15.4 watts of power, one power supply
will power up to 70 PoE ports. You can install an additional power supply for
additional PoE power.
Each 2500W PoE power supply can provide a maximum of 2160 watts of PoE
power, and each PoE port supports a maximum of 15.4 watts of power per PoE
power-consuming device. For example, if each PoE power-consuming device
attached to the s-series consumes 15.4 watts of power, one power supply will
power up to 140 PoE ports.
Note: The system powers on as many PoE ports as the installed PoE power supplies
can handle. The system calculates the maximum number of PoE ports it can
support based on the number of PoE power supplies installed. PoE ports are
enabled based on their priority settings. Keep in mind that the system will
reserve the maximum configured power per PoE-enabled port, even if the PoE
power-consuming device is drawing less power.
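The PoE budgeting described above reduces to a simple division. The following Python sketch is illustrative only; the wattage figures are the ones quoted in the text, and the function name is our own.

    import math

    def poe_ports(poe_output_watts, watts_per_port=15.4):
        """Number of full Class 3 PoE ports that one supply can power."""
        return math.floor(poe_output_watts / watts_per_port)

    print(poe_ports(1080))   # 1250 W supply delivers 1080 W of PoE power -> 70 ports
    print(poe_ports(2160))   # 2500 W supply delivers 2160 W of PoE power -> 140 ports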
In the B16S chassis, the system power supplies occupy slot numbers 1 – 4 in the
top row with the redundant supplies in slot numbers 3 and 4. The PoE power
supplies occupy slot numbers 5 – 8 in the bottom row. Figure 2-9 shows power
supply placement.
What happens when one or more system power supplies fail:
If one or more system power supplies fail and the system is left with less than
the minimum number of power supplies required for normal operation, the
power supplies will go into overload and the system will start to shut down.
Several things can happen. The output voltage of the remaining good power
supplies will likely drop as they try unsuccessfully to generate more power
than they are capable of. The system will react to a drop in voltage by
increasing the current draw. The hardware will shut down due to over-current
protection or under-voltage protection, whichever takes place first. One by
one, the interface modules will shut down until the power is within the power
budget of the remaining power supplies. There is no particular order in which
the interface modules will shut down, as this will occur in hardware and not in
software. The management CPU requires power as well, and can also shut
down during a power supply failure.
If one or more PoE power supplies fail and the system is left with less than the
minimum number of PoE power supplies, the PoE power supplies will go into
overload. Non-PoE functions will not be impacted, provided the System power
supplies are still up and running.
Several things can happen with a PoE power supply failure. The output voltage
of the remaining good power supplies will likely drop as they try unsuccessfully
to generate more power than they are capable of. The system will react to a
drop in voltage by increasing the current draw. The hardware will shut down
PoE function due to over-current protection or under-voltage protection,
whichever occurs first. The interface modules will start to shut down their PoE
ports one by one until the power draw is within the power budget of the
remaining power supplies. There is no particular order in which the PoE ports
will shut down, as this occurs in hardware and not in software.
After a power loss, if the system is left with less than the minimum number of
power supplies required for normal operation, the system will be left in an
unknown state. At this point, manual recovery is required (that is, restore
power and power cycle the chassis).
All s-series models have a maximum of 512 MB of RAM and a 667 MHz
management processor.
The number of slots, ports and performance metrics are shown in Table 2-29.
Interface slots                                            8    16
Min. number of management modules required for operations 1    1
All modules are hot-swappable and do not require power-off to be replaced.
Table 2-30 and Table 2-31 show the available port density.
10 Base X (XFP) 20 36
Management modules
The following types of management modules are available:
IPv4 management module
IPv4 management module with 2-port 10 GbE (XFP)
IPv6 management module with 2-port 10 GbE (XFP)
Interface modules
Table 2-32 shows which modules can be installed in the s-series chassis
interface slots.
Interface types
Following are the available interface types:
10/100/1000 Mbps Ethernet port with RJ45 connector
100/1000 Mbps Ethernet port with SFP connector
10 Gbps Ethernet port with XFP connector
Transceivers
Table 2-33 and Table 2-34 show the available transceivers to be used in interface
modules.
Table 2-34 Transceivers for 10 Gbps Ethernet ports
Type Connector Speed Distance
Optional features
The s-series is capable of providing Layer 3 functions. Following are the optional
features:
IPv4 Full Layer 3 Premium Activation
Enables RIPv1/v2, OSPFv2, BGP-4, IGMPv1/v2/v3, PIM-SM/-DM/-SSM,
VRRP-E
IPv6 Full IPv4 Layer 3 Premium Activation
Enables RIPv1/v2, OSPFv2, BGP-4, IGMPv1/v2/v3, PIM-SM/-DM/-SSM,
VRRP-E
IPv6 Full IPv6 Layer 3 Premium Activation
Enables RIPv1/v2, RIPng, OSPFv2, OSPFv3, BGP-4, IGMPv1/v2/v3,
PIM-SM/-DM, DVMRP, VRRP-E
Access to the management interface can be restricted and encrypted; this can be
achieved by using:
Secure Shell (SSHv2) access
Secure Copy (SCPv2)
SNMPv3
HTTPS
ACLs to define which ports and interfaces have CLI, web and/or SNMP
access
To prevent “user identity theft” (spoofing) the s-series supports these features:
DHCP snooping
Dynamic ARP inspection
IP source guard
The complete list of supported standards and RFC compliance can be found at:
http://www-03.ibm.com/systems/networking/hardware/ethernet/b-type/
Both models enable a converged solution for vital network applications such as
VoIP, wireless access, WebTV, video surveillance, building management
systems, triple play (voice + video + data) services and remote video kiosks in a
cost-effective, high-performance compact design.
Both models are shown in Figure 2-10.
All g-series models can only be installed in the rack. Non-rack installation is not
supported.
Operating system
The g-series B48G runs Brocade IronWare R4.3.01 or higher, and the B50G runs
R5.0.01 or a higher version of the operating system.
For cost-effective and rapid scaling at the network edge, the g-series is equipped
with IronStack stacking technology, which supports stacking up to eight units in a
virtual chassis. The IronStack system supports 40-Gbps switching capacity
between stacked units providing a high-capacity interconnect across the stack.
The g-series IronStack supports stacking over copper and fiber cables. This
provides for flexible stack configurations in which stacked units can be separated
by several hundred meters of fiber.
Each power supply within a g-series delivers up to 480 watts of PoE power. In a
dual power supply configuration, up to 48 10/100/1000 Mbps PoE ports of 15.4
watts per port (full Class 3) can be supported. This scalability enables the
network manager to size the installation to meet current needs and have room for
future growth.
The g-series features 1+1 power redundancy, using hot-swappable and field
replaceable power modules, which install into the rear of the unit. The power
modules are load-sharing supplies providing full 1+1 redundancy for as many as
48 Class 1 and Class 2 PoE ports and 31 Class 3 (15.4 watts) PoE ports.
Additional design features include intake and exhaust temperature sensors and
fan spin detection to aid in rapid detection of abnormal or failed operating
conditions to help minimize mean time to repair.
IronStack solution
IronStack is advanced stacking technology that supports stacked configurations
in which as many as eight g-series switches can be interconnected and maintain
the operational simplicity of a single switch. Each IronStack enabled g-series
model can support up to 40Gbps of stacking bandwidth per unit. IronStack
configurations can be built using 10-GbE CX4 copper or XFP-based fiber
connections. When XFP-based fiber connections are used, an IronStack
configuration can be extended between racks, floors, and buildings with fiber
lengths up to several hundred meters.
The B50G models are pre-configured with a two-port 10-GbE CX4 module,
expanded CPU memory, and IronStack license (IronStack PROM) and software.
An IronStack system operates as a single logical chassis (with a single IP
management address) and supports cross-member trunking, mirroring,
switching, static routing, sFlow, multicast snooping and other switch functions
across the stack. An IronStack stack has a single configuration file and supports
remote console access from any stack member. Support for active-standby
controller failover, stack link failover, and hot insertion/removal of stack members
delivers the resilience that is typical of higher end modular switches.
When configured with dual power supplies, the 48-port g-series switch supports
up to 48 10/100/1000 Class 3 (15.4 watts) PoE ports, which is one of the highest
Class 3 PoE port densities for a compact switch in the industry. These capacities
are a significant advantage for environments that require full Class 3 power for
devices such as surveillance cameras, color LCD phones, point-of-service
terminals, and other powered endpoints.
Network managers can apply a “mirror ACL” on a port and mirror a traffic stream
based on IP source/destination address, TCP/UDP source/destination ports, and
IP protocols such as ICMP, IGMP, TCP, and UDP. A MAC filter can be applied on
a port and mirror a traffic stream based on a source/destination MAC address.
VLAN-based mirroring is another option for CALEA compliance. Many
enterprises have service-specific VLANs, such as voice VLANs. With VLAN
mirroring, all traffic on an entire VLAN within a switch can be mirrored, or traffic
from specific VLANs can be sent to a remote server.
Threat detection and mitigation
Support for embedded, hardware-based sFlow traffic sampling extends Brocade
IronShield 360 security shield to the network edge. This unique and powerful
closed loop threat mitigation solution uses best-of-breed intrusion detection
systems to inspect sFlow traffic samples for possible network attacks. In
response to a detected attack, network management can apply a security policy
to the compromised port. This automated threat detection and mitigation stops
network attacks in real time, without human intervention. This advanced security
capability provides a network-wide security umbrella without the added
complexity and cost of ancillary sensors.
Enhanced Spanning Tree features such as Root Guard and BPDU Guard prevent
rogue hijacking of Spanning Tree root and maintain a contention and loop free
environment especially during dynamic network deployments. Additionally, the
g-series supports Port Loop Detection on edge ports that do not have spanning
tree enabled. This capability protects the network from broadcast storms and
other anomalies that can result from Layer 1 or Layer 2 loopbacks on Ethernet
cables or endpoints.
In addition, the g-series supports stability features such as Port Flap Dampening,
single link LACP, and Port Loop Detection. Port Flap Dampening increases the
resilience and availability of the network by limiting the number of port state
transitions on an interface. This reduces the protocol overhead and network
inefficiencies caused by frequent state transitions occurring on misbehaving
ports.
Physical and thermal parameters
The physical and thermal parameters are shown in Table 2-35.
Number of fans 2 2
The fans are not hot swappable and run at a fixed speed.
Power parameters
All g-series models provide redundant and removable power supplies with AC
power option. Power supplies can be exchanged between B48G and B50G
models.
Both power supplies provide power for the system and PoE ports.
Power supplies
The power supplies are auto-sensing and auto-switching, and provide 600 watts
of total output power, including +12 VDC @ 10 A to the system and -48 VDC @
10 A for Power over Ethernet applications. The power supplies accept 100-240
VAC input, 50-60 Hz @ 8 A to 3.2 A. All are hot swappable and can be removed
and replaced without powering down the system.
Foundry 48-volt power supplies provide power to the PoE daughter card, and
ultimately to PoE power-consuming devices. The number of PoE
power-consuming devices that one 48-volt power supply can support depends on
the number of watts required by each device. Each 48-volt power supply provides
480 watts of power for PoE, and each PoE port supports a maximum of 15.4
watts of power per PoE power-consuming device. For example, if each PoE
power-consuming device attached to the g-series consumes 12 watts of power,
one 48-volt supply will power up to 40 PoE ports. You can install a second power
supply for additional PoE power.
Note: If your g-series device has 48 ports and only one power supply, and
each PoE enabled port needs 15.4 watts, then a maximum of 31 ports can
supply power to connected devices.
The number of ports and performance metrics are shown in Table 2-38.
Interface types
Following are the available interface types:
10/100/1000 Mbps Ethernet port with RJ45 connector
100/1000 Mbps Ethernet port with SFP connector
10 Gbps Ethernet port with XFP connector
10 Gbps Ethernet port with CX4 connector
Optional features
The g-series is capable of providing Layer 3 functions. Following are the optional
features:
Edge Layer 3 Premium Activation: Enables RIPv1/v2, OSPFv2
Services, protocols, and standards
IBM g-series Ethernet Switches support various services, protocols, and
standards.
The following Layer 2 protocols are supported:
Protected Link Groups
Link Aggregation (IEEE 802.3ad, LACP)
UDLD
STP/RSTP/MSTP
Root Guard
BPDU Guard
Up to 16,000 MAC addresses (also valid for an 8-unit stack)
Up to 4096 VLANs
Up to 253 STPs
Up to 8 ports per trunk, up to 25 trunk groups
For a complete list of supported standards and RFC compliance, see the
following website:
http://www-03.ibm.com/systems/networking/hardware/ethernet/b-type/
The early computers used punch cards, the original implementation of what can
be referred to today as “Sneakernet”, although back then sneakers were not
really the fashion. Punch cards required a programmer to either punch out, or
shade in, circles on the card, where each circle represented a specific field for
the program or data set. After the programmer had finished creating the cards for
a program or data set, the result was typically a large stack of cards.
The programmer then took these cards to the computer lab where the operator
fed them into the card reader, and the computer read them and provided output
(often printed or on more punch cards). This output was then returned to the
programmer for analysis. As you can see, there is a lot of foot work involved in
moving data (in this case, cards) around, hence the term “Sneakernet,” which is
still used today to refer to passing data physically, whether the data is on memory
keys, portable hard drives, or tapes.
It was not long before computer languages evolved and computer terminals were
created to enable the programmer to enter data directly into a machine readable
file. These terminals were connected back to the computer by cables, usually
coaxial or twinaxial (a pair of coaxial cables together). Twinaxial cable allowed for
up to 7 terminals to be connected to the single length of cable. This was
effectively the first iteration of the computer network.
At that time, computers were expensive and people were not always sending or
receiving data between their terminal and computer, therefore a cheaper device
was placed in front of the computer to allow one connection on the computer to
be used by more than one terminal. This device was called a front end processor
(FEP). The FEP was able to control data for communications between the
terminal and the computer, allowing each terminal to have time to communicate
to the computer.
The data links were quite slow compared with today’s speeds and the displays
were text based, green screens. The FEP was one of the earliest network
devices. The FEP acted as the hub for communications, the terminals were
connected like spokes on a wheel, and this is where the terminology for a
hub-and-spoke network probably originated.
To this point, we have not really described any complex networking. The FEP is
really just allowing remote display of data. Transferring data between computers
still required physically moving tapes or hard disks between systems. Let us fast
forward a bit.
During the 1960s, more computers were entering the market and data had to be
shared to improve processing power. In 1969 the Advanced Research Projects
Agency Network (ARPANET) was created to enable multiple computers to
connect together between various universities and USA government agencies.
ARPANET was the beginning of what we now call the Internet. The concept
of connecting more computers together evolved with the availability of
microprocessor computers. Xerox first suggested the concept of inter-networking
with an early version of Ethernet. Over time, TCP/IP became the standard
protocol for the Internet.
Ethernet was not always the thin cable connecting into the computer that we
have today. It started out on a coaxial wire which was run from one end of the
required network to the other end, to a maximum of 500 meters; this is called a
bus topology. To connect a computer, the network administrator utilized a special
connection, called a vampire tap, which when inserted into the coaxial Ethernet
cable, mechanically pierced the shield and created two electrical connections,
one to the central wire in the coaxial cable, one to the shielding of the coaxial
cable.
This vampire tap had a connection that was then inserted into the personal
computer’s network interface card (NIC). The next generation of Ethernet used a
thinner coax cable with a maximum length of 185 meters for the entire single
network. This thinner cable was also designed to allow for easier connection at
the NIC. To add a computer, the network administrator had to cut the coaxial
cable and terminate each side with connectors, then both connectors were
attached to a T-piece that connected to the computer’s NIC.
To reduce the disruption of inserting a new computer into the Ethernet, the
design moved to a hub and spoke topology, introducing the cable and connector
we are familiar with today. The network hub was placed in a central location to
the site. Each computer, or spoke, was then connected by a cable made of
twisted pairs of wires which had RJ45 connectors at each end (RJ45 is the name
of the connector you typically use when connecting to a wired network today).
The new cable provided a maximum length of 100 meters, but now it was
measured from the network hub to the computer, instead of from one end of the
coaxial (bus) network to the other. Besides the hub and cable, not much really
changed in the logical design.
Coaxial has one pair of electrical conductors, twinaxial is two coaxial cables
bonded together, hence two pairs of electrical conductors. Today’s connections
utilize 4 twisted pairs; the twisting reduces the impact of electrical interference.
Terms such as Category 5 (Cat5) and Category 6 (Cat6) define how tightly
twisted these pairs of wire are. The tighter the twisting, the more resiliency the
cable has to electrical interference, which in turn allows the speed, or bandwidth,
of the network to increase.
3.2.1 Introduction to data communications
As we are all aware, computers work in binary with bits, which are either on or
off, typically represented as a one (1) for on and zero (0) for off. These bits are
then grouped together in a set of eight (8) bits which is called a byte. While there
are more groupings, for now we only need to understand bits and bytes.
Although the MAC address is factory set for each NIC, it is possible to change the
MAC address to a locally administered address. This is sometimes referred to as
MAC spoofing. Changing the MAC address is also used in some network
equipment for high availability purposes.
Each of these data chunks was called a packet or a frame. If a packet was lost in
transit, then only that small packet had to be resent instead of the entire set of
data. In general the term frame is used to define the OSI Layer 2 transport
protocol, in our case, Ethernet; and the term packet is used to define the OSI
Layer 3 network protocol, in our case TCP/IP. For more information about the OSI
model, see the Redbooks publication, TCP/IP Tutorial and Technical Overview,
GG24-3376.
There have been many different network types over the years that you might
have heard of, including Token Ring, ATM, and of course Ethernet. All of these
computer network technologies utilized packets to transmit data. Some media,
such as Token Ring, allowed for a variable maximum packet size depending on
various network parameters. Others, such as Ethernet and ATM, decided on a
fixed maximum packet size.
More recently, due to the increase in both speed and reliability of networks,
Ethernet has defined the jumbo frame. This frame is much larger than the normal
Ethernet frame, however, all network devices in the path must support jumbo
frames in order for them to be used.
Let us take a look at the standard 802.3 Ethernet packet. While there are two
types of Ethernet packets in use today, the 802.3 Ethernet packet is the international
standard, so it is used in our examples. The Ethernet packet has a maximum
size of 1518 bytes, but it is permitted to be smaller depending on the data being
sent. Figure 3-1 shows the standard 802.3 Ethernet frame. The header is the first
25 bytes and includes the destination address (Dest Addr) and source address
(Source Addr). These addresses are the MAC addresses of the destination and
source NICs.
Note: The destination MAC address for every packet is very close to the start
of the Ethernet packet, starting only nine (9) bytes from the start of the frame.
Earlier we saw an example of the Ethernet frame (Figure 3-1). This frame has a
field labeled “Info” with a size of “variable”. The IP header starts in this Ethernet
“Info” field.
Figure 3-2 shows the IP header. The fields of interest to us are the 32-bit source
IP address and the 32-bit destination IP address. The source IP address is the IP
address assigned to the system initiating the packet. The destination IP address
is the IP address assigned to the target system for the packet.
3.3.2 IP addresses
Of course we typically do not use IP addresses in daily life. Instead we use
names that are more easily read by humans, for example, www.ibm.com refers to
the server named “www” in the domain “ibm.com”. These names utilize the
domain name system (DNS) to translate human readable names into IP
addresses. So your TCP/IP packet destined for www.ibm.com has an IP address
of one of the servers that IBM provides for Internet data.
Note: Each TCP/IP packet contains source and destination IP addresses. The
destination IP address is located within the IP header, which is contained
within the Ethernet frame, starting forty-two (42) bytes into the frame.
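As a worked illustration of the two notes above, the following Python sketch extracts the destination MAC and destination IP addresses from a raw frame at the byte offsets used in this chapter. It assumes the 802.3 framing described here (offsets counted from the start of the frame, including the preamble) and is not production parsing code.

    def parse_destinations(frame: bytes) -> dict:
        """Pull the destination MAC and destination IP out of a raw frame,
        using the offsets quoted in this chapter."""
        dest_mac = frame[8:14]     # destination MAC starts 9 bytes into the frame
        dest_ip = frame[41:45]     # destination IP starts 42 bytes into the frame
        return {
            "dest_mac": ":".join(f"{b:02x}" for b in dest_mac),
            "dest_ip": ".".join(str(b) for b in dest_ip),
        }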
This book is not a detailed network protocol book, so we do not delve into the
details of how each device learns the correct destination MAC address. For more
information, see the product configuration guide for your device.
For our LAN, we assume that all computers (servers and workstations) are
located on the same network, as shown in Figure 3-3.
If a packet is not destined for the MAC address of the device, the device simply
passes the packet on to the next network device. This is true for the coaxial and
hub deployments of an Ethernet network.
The bus topology proved to be both inefficient and insecure; inefficient in that the
network was often congested with packets and each NIC had to determine
whether it was the intended destination or not; insecure in that any computer on
the network can potentially have its NIC forced into promiscuous mode.
Promiscuous mode instructs the NIC to process every packet regardless of the
intended destination, making it possible for software to then filter out specific data
and provide the unauthorized user with data that they must not have access to.
Ethernet networks can be extended by the use of bridges which can connect two
physical network segments together. These bridges learn the MAC addresses of
devices connected to each segment, including other bridges, and accept packets
destined for the other segments connected to the bridge.
Promiscuous mode has uses even today. Promiscuous mode is often used in
detailed network analysis, or for specific network security devices (such as IDS)
to determine whether unauthorized traffic is present on the network. However, it
is used by trained professionals and in controlled environments, both physically
and logically controlled.
The switch has some in built Ethernet intelligence. It learns the MAC addresses
of each device connected to each specific port. Therefore, if a packet is sent from
Workstation A to Server Z, when the switch receives the packet, it opens the
Ethernet Frame to determine the Destination MAC address. Then the switch
uses the table of MAC addresses it has built up and forwards the packet out the
interface that has the destination MAC address connected to it.
In this way, each network device acts like it is connected to its very own hub.
In a well-built switch there is no undue contention for network resources, and any
contention that does exist can be buffered. Each NIC still confirms that it is the
intended recipient of a frame, but it now receives only frames intended for it, either
directly addressed frames or broadcasts.
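The MAC learning behavior described above can be summarized in a few lines of Python. This is a conceptual sketch only, not how the switch hardware is implemented; the 48-port count is an arbitrary assumption.

    class LearningSwitch:
        def __init__(self, port_count=48):
            self.ports = range(1, port_count + 1)
            self.mac_table = {}                     # MAC address -> port

        def receive(self, in_port, src_mac, dst_mac):
            self.mac_table[src_mac] = in_port       # learn where the sender is
            if dst_mac in self.mac_table:           # known destination: one port only
                return [self.mac_table[dst_mac]]
            return [p for p in self.ports if p != in_port]    # unknown: flood

Once an address is learned, frames for it leave through a single port, which is why each device effectively behaves as though it has its own dedicated connection.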
3.5 Routing
So far we have discussed Ethernet networks on a single physical segment or
bridged environment. These do not scale well as each device on the LAN has to
keep a table of the MAC addresses of every other device. Also consider that
Ethernet networks still rely on broadcast traffic for certain communications. This
broadcast traffic is best kept to as small an environment as possible. Similarly, a
faulty NIC can still overwhelm some switches, which can cause faults on the
Ethernet segment that the NIC is connected to. For these reasons Ethernet segments are also
referred to as broadcast domains, or fault domains.
3.5.1 TCP/IP network
As we build on our existing example, again we do not delve into the detail of how
each device learns the correct destination MAC address. This time the network
architect has connected the servers onto one Ethernet segment. Similarly, the
workstations have been placed on another, separate, Ethernet segment. Both
segments are using Ethernet Switch technology.
Figure 3-5 IP Network with servers and workstations on separate segments connected
by a router
Remember that the destination IP address is the 4 bytes starting 42 bytes into
the Ethernet packet. The router also changes the destination MAC address to
that of server Z before sending the packet towards the server access switch. The
server access switch has already learnt which interface the destination MAC
address is connected to and transmits the packet to the correct interface. Finally,
server Z receives the packet and processes it.
To make a routing decision, the router must process the destination IP address
which is 42 bytes into the Ethernet frame. While this sounds like a time-consuming,
convoluted process, the actual time taken is typically on the order of
milliseconds.
Note: The router makes decisions based on the destination IP address, which
is located 42-bytes into the Ethernet Frame.
3.5.2 Layer 3 switching
Historically, the router has made routing decisions in software on the router’s
CPU, which was slower than the switch. More recently, routers have migrated to
making routing decisions through application specific integrated circuit (ASIC)
hardware. Such routers are often called Layer 3 switches, referring to the OSI
Layer 3 and the fact that the routing decision is made in hardware on the ASIC.
Terminology is dependent on the audience and most network routers now utilize
ASIC hardware.
Note: A Layer 3 (L3) switch is simply a router that makes routing decisions
through ASIC hardware rather than code on a microprocessor. Most purpose
built network routers today are in fact L3 switches.
The s-series and g-series are designed primarily as switches. They can be operated
as routers, although some routing functions are limited by their hardware design
and software.
Similarly, the c-series and m-series are designed as routers and have hardware
and software capable of supporting more routing functions. These devices can
also operate as switches at Layer 2.
In all of these products, routing functions are executed in hardware on ASICs,
so the routing-capable products in the IBM Ethernet range are all L3 switches.
Typical physical resiliency takes into account the physical location, to avoid
building in a geologically or politically unstable area; fences and ram-raid
prevention techniques are also usually employed. Further physical
security is used to ensure that the DC has controlled access, which might include
multiple levels of security and authentication (for example, locked doors and
physical guards) before a person can access equipment within the DC.
Cooling can include the design of a cooling system that contains multiple cooling
units. The cooling system is designed in such a way that the loss of one cooling
unit still allows the rest of the cooling system to maintain the correct temperature
for the equipment in the data center.
Capacity management of redundant telecommunications is both a design
consideration as well as a cost consideration. There are two typical modes of
operation for dual circuits:
Active / Backup
Active / Active
With Active / Backup, the two circuits have the same bandwidth, but only one is
active at any time. In case of a loss of the active circuit, data will traverse the
backup circuit without performance degradation because both circuits are
maintained at the same bandwidth. It is very important to ensure that any
increase in bandwidth on the active circuit is also reflected in the backup circuit.
For example, if 10 Mbps is required between two sites, each circuit must be
configured to 10 Mbps; if an increase in bandwidth of 2 Mbps is required, then
both circuits must be upgraded to 12 Mbps. Some carriers might provide
discounted services for a nominated backup circuit.
In the other option, Active / Active, both links transport data at the same time,
allowing both links to be configured to less than the total bandwidth required,
which might be preferable in situations where the carrier does not offer discounts
for nominated backup circuits. However, to ensure minimal disruption in the case
of a circuit outage, they must both have the same bandwidth.
It is also important to utilize a data protocol that can balance traffic over paths of
equal cost. In this case, if one link fails, the other link transports all the traffic with
some degradation to services. As a general rule of thumb, each circuit must be
maintained at 75% of the required bandwidth, which will result in a degradation of
25%. For example, if 10 Mbps is required between two sites, each circuit must be
at least 7.5 Mbps. If one circuit fails, the other circuit can carry 75% of the total
expected traffic, resulting in a 25% degradation of bandwidth.
Also note in this example that if an increase in the required bandwidth of 2 Mbps
is required (from 10 Mbps to 12 Mbps), then both circuits must be increased to
9 Mbps (12 Mbps * 75%). Keep in mind that the calculations are based on required
bandwidth, not the usable bandwidth. It might also benefit the site to operate a
policy controller which can drop traffic that is not business critical in the case of a
circuit outage.
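The 75% rule of thumb above is a one-line calculation. The following Python sketch uses the figures from the example in the text; the function name and the 0.75 default are only illustrative.

    def active_active_circuit_size(required_mbps, carried_fraction=0.75):
        """Bandwidth to provision on each of two active/active circuits so that a
        single-circuit failure still carries 75% of the required bandwidth."""
        return required_mbps * carried_fraction

    print(active_active_circuit_size(10))   # 7.5 Mbps per circuit
    print(active_active_circuit_size(12))   # 9.0 Mbps per circuit after the 2 Mbps increase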
There are many components to the data center, including application services,
storage services, data center network infrastructure, and multiple connections at
the data center edge to various environments. Figure 4-1 shows the interaction of
these components within a data center. In this example, the WAN component
can connect to other locations for the Enterprise. The LAN connectivity is used
where the building that houses the data center also houses the Enterprise users.
With the growth of IT and the cost of providing full DC services, it is not unusual
for a DC site to house more than one Enterprise. In this case the network
architect needs to determine whether to deploy multiple network devices or
utilize virtual separation. For more information, see Chapter 5, “IBM Ethernet in
the green data center” on page 133.
With a resilient data center network architecture, most network architects will
follow a hierarchical design. Although it is possible to deploy a single-tier
architecture, it is usually suited only to very small locations where fault domains
and function segregation are not required. Such small locations are not typical
of a data center.
Tiers can consist of physical or logical tiers. The traditional multi-tier network
architecture shows the connectivity between the Access, Distribution, and Core
layers, as well as the edge connectivity from the Core. In some cases it is
preferable to collapse some components of the infrastructure such as Access
and Distribution, or Distribution and Core.
Figure 4-2 shows a typical multi-tier data center infrastructure.
The IBM Ethernet products are suited to all tiers within the data center. These
are discussed in more detail in Chapter 11, “Network design for the data center”
on page 249.
4.3 High Performance Computing market segment
High Performance Computing (HPC) is found in various commercial and
research markets. It refers to the ability to connect various compute systems
together with high speed network links forming a cluster of computers. This
allows parallel processing of complex tasks. There are two primary differentiators
in HPC:
Tasks requiring low latency
Tasks requiring high bandwidth
It is also possible that a particular HPC intent might require both low latency and
high bandwidth.
On the other hand, when modeling data where all HPC nodes need to be in
constant communication, advising all other nodes what they are working on, the
more important metric is low latency. This is typical of atomic research
or weather forecasting, where computation is complex and dependent on the
results from other nodes. If the latency is not low enough, the HPC cluster might
find that a number of nodes are waiting for the results from a single node before
computation can continue.
Because the carrier’s business is solely to transport data, their network is typified
by a predominance of data ports. Their equipment will transport data from many
customers, so data separation is of great importance.
Video is less sensitive, but data loss is still noticeable when large sets of video
data are lost. Most streaming video applications will allow for some jitter and will
not display data that is too far out of sync. Other data is less sensitive to loss or
latency, except in the case of human input, where a slow response is often
reported. However, even this slow response complaint is nowhere near as
sensitive as the loss of voice data.
Chapter 5. IBM Ethernet in the green data center
Although the “silver bullet” is still eluding us, the IBM Ethernet products can
provide some solutions and overall benefits to the “green data center”.
Within the data center, a reduction in the power consumption of equipment can
mean more equipment can be powered by the same emergency infrastructure,
that is, existing UPS and discrete generators. However, it does not also have to
mean a loss of features.
IBM Ethernet products have some of the industry’s lowest power consumption
per port on each of the products, as shown in Table 5-1 for select products in the
IBM Ethernet range. For specific details on each product, see Chapter 2,
“Product introduction” on page 31.
Safety: Human safety is a primary concern. For a site with VoIP handsets or
PoE IP surveillance cameras, controls need to be in place for human safety.
If an employee is on site during normal PoE downtime, they must be safe.
This might require a process to allow the PoE to be enabled upon request, or
alternate communication measures (for example, cellular / mobile phone).
Table 5-3 shows the effect on the power per port if fewer than the maximum
number of ports are used on the m-series Ethernet Router.
We now consider some options to assist in utilizing the available ports more
effectively.
5.3.1 VLAN
The Virtual Local Area Network (VLAN) is not a new concept, and many
networks architects already utilize these wherever possible. VLANs allow for
multiple Layer 2 networks to exist on the same network device, without
interaction with each other. They are typically used for separation of logical
teams or groups, perhaps one VLAN is configured for the finance department
while another VLAN is configured for the developers. These two VLANs can exist
on the same device which allows for separation of function without installing
separate equipment for each group.
Various organizations have differing opinions on how VLANs can be used for
separation in a secured environment. Consult your organization’s security team
before deploying VLANs.
5.3.2 VSRP
The Virtual Switch Redundancy Protocol (VSRP) can also assist in maximizing
the use of equipment and connections. As discussed in the Chapter 6, “Network
availability” on page 141, VSRP removes the requirement for STP/RSTP and can
improve the Layer 2 resiliency. VSRP can also be configured so that links can be
active for one VLAN and backup for another VLAN. With this configuration, it is
possible to load balance and allow both physical links to be utilized for different
VLANs, in contrast to STP/RSTP, which can block one of the interfaces for all
traffic. While this might result in degraded throughput during a failure if both
links were being used near their theoretical maximum, it maximizes the use
of the links during normal operations.
Each VLAN can have a specified MRP Master switch. Therefore, in a ring with
two VLANs operating on it, the network can be configured with a different MRP
master switch for each VLAN. Due to the operation of MRP, each master switch
will block the port where it receives its own Ring Hello Packet (RHP). With two
different master switches, the blocked port for one VLAN will not be blocked for
the other VLAN. This allows every link to be utilized for traffic.
Combining these two methods allows all links to be utilized with minimal CPU
overhead. MRP is a proprietary protocol available across the IBM Ethernet
range of products.
We have already seen in 5.3.1, “VLAN” that the physical number of switches can
be reduced by deploying multiple VLANs on a single switch; this is fairly common
practice today. However, that is where the technology stayed for many years.
More recently, virtualization has been developed for routers as well. On the IBM
m-series Ethernet Routers, Virtual Routing and Forwarding (VRF) is available.
5.4.1 Common routing tricks
Here we consider some historical designs for an environment where multiple
clients needed to connect to a single data center, but to dedicated systems.
To achieve this capability, each client has required a separate data connection
that connects into a dedicated client router and perhaps a dedicated firewall.
Often clients have chosen to utilize private address space (RFC 1918), which
causes routing conflicts if that address space is utilized on the same network.
This caused many different routing tricks to be created and used. Depending on
the situation, routing might include:
Network Address Translation (NAT)
Policy Based Routing (PBR)
Tunnelling
PBR exists where a routing decision is made based on the known source
address of the packet. This method required each client to have a dedicated
destination within the service provider network. This dedicated destination has
an IP address that was either registered or unique within the data center. The
routing decision for a packet destined for the client’s network was made by the
PBR configuration. The PBR configuration defines the next hop address based
on the source address of the packet.
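Conceptually, PBR is a next-hop lookup keyed on the packet's source address rather than its destination. The following Python sketch shows the idea; the client prefixes and next-hop addresses are invented for the example and do not come from any product configuration.

    import ipaddress

    # Hypothetical policy: each client's source prefix maps to a dedicated next hop
    policy = {
        ipaddress.ip_network("10.1.0.0/16"): "192.0.2.1",   # client A
        ipaddress.ip_network("10.2.0.0/16"): "192.0.2.2",   # client B
    }

    def pbr_next_hop(src_ip):
        source = ipaddress.ip_address(src_ip)
        for prefix, next_hop in policy.items():
            if source in prefix:
                return next_hop
        return None    # no policy match: fall back to normal destination routing

    print(pbr_next_hop("10.2.33.7"))   # -> 192.0.2.2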
All these solutions required dedicated equipment which took up space, power
and air conditioning.
5.4.2 VRF
Virtual Routing and Forwarding (VRF) tables allow multiple instances of route
and forwarding information to be configured on the same network device. Each
VRF configuration includes a Route Distinguisher (RD) that is unique to that VRF
within the network. The VRFs must connect to the same Layer 3 network,
although the intervening Layer 2 networks can vary; this combination allows the
same private addresses to be routed on the same IBM Ethernet routers without
conflicting with each other. Doing this simplifies the creation of a trusted
virtual private network (VPN) without the complexity or overhead of other
solutions such as MPLS.
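To illustrate the concept only (this is not the router’s implementation or CLI), the following Python sketch models two VRF instances, keyed by hypothetical route distinguishers, each holding the same RFC 1918 prefix without conflict:

# Conceptual sketch only: per-VRF route tables show how overlapping private
# prefixes can coexist on one device. The RD strings and next hops are
# illustrative, not taken from any real configuration.
import ipaddress

vrfs = {
    "65000:100": {ipaddress.ip_network("10.1.0.0/16"): "192.0.2.1"},    # client A
    "65000:200": {ipaddress.ip_network("10.1.0.0/16"): "198.51.100.1"}  # client B
}

def lookup(rd, dst):
    """Longest-prefix match within the single VRF identified by its RD."""
    addr = ipaddress.ip_address(dst)
    matches = [net for net in vrfs[rd] if addr in net]
    if not matches:
        return None
    best = max(matches, key=lambda net: net.prefixlen)
    return vrfs[rd][best]

# The same destination resolves differently depending on the VRF context.
print(lookup("65000:100", "10.1.2.3"))  # 192.0.2.1
print(lookup("65000:200", "10.1.2.3"))  # 198.51.100.1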
The deployment of VRF in the Layer 3 router can reduce the number of separate
routers required within a data center. Network complexity can never be solved
completely; however, reducing the number of devices and deploying trusted
VPNs through the use of VRFs is achievable today with IBM’s m-series and
c-series Ethernet/IP devices.
Chapter 6. Network availability
If any of these components fail, the device will send an alert to the network
monitoring stations, typically by SNMP and/or syslog. This alert must be acted
upon to maintain network reliability. For example, if the fan unit on the primary
device fails, it will send an alert to the network monitoring stations as the backup
device takes over operations. If that alert is simply filed away and no action is
taken to rectify the problem, and a couple of weeks later power is lost to the
backup device (now operating as primary), there is nothing to fail over to.
Replacing the fan unit as soon as possible restores redundancy, so the system
can fail over again if a fault occurs on the unit that is now operating as primary.
Note: An integral part of a high availability design is network monitoring with
defined actions for any alert. Promptly fixing any fault that raises an alert
preserves the ability of the high availability design to keep operations running.
Similarly, with telecommunications feeds, the network architect can work with the
site operations team and carrier to ensure diverse paths into the data center. We
assume the site operations team also considers other site environmental factors
before deciding on the site: for example, floor loading capability, uninterruptible
power supply (UPS), power generators, air conditioning, humidity controllers,
static electricity protection, and fire protection, just to name a few.
Due to this lengthy delay, Rapid Spanning Tree Protocol (RSTP) was developed
as the standard 802.1w protocol. While there are differences in the operation of
RSTP, the basic function is the same as STP: ports creating a loop are logically
blocked until a fault is identified in the chosen primary path. The benefit of RSTP
is the greatly reduced failover time, which is between 50 ms and 5 seconds.
VSRP switches are all configured as backup switches, and an election process
selects the primary switch. The primary switch broadcasts VSRP packets, which
are forwarded by VSRP-aware switches. If the backup switches do not receive
the hello packet within the configured time period, the next highest priority
backup switch takes over as the primary, providing sub-second failover.
Each VLAN can run a separate VSRP instance, and MAC addresses are assigned
to each VSRP instance, thus allowing for better utilization of the available links,
as discussed in Chapter 5, “IBM Ethernet in the green data center” on page 133.
When a VSRP failover occurs, a VSRP-aware switch will see the VSRP hello
packets and MAC address originating from the backup VSRP switch. As a result,
the VSRP-aware switch can redirect packets to the backup switch without the
need to relearn the entire Layer 2 network.
For more detailed information about MRP consult the Brocade resource center:
http://www.brocade.com/data-center-best-practices/resource-center/index
.page
MRP sends a Ring Health Packet (RHP) out one interface and expects to see it
return on the other interface to the ring. An election process is held to select the
ring master; this ring master initiates the RHP on the lowest numbered interface
and blocks the interface the RHP is received on. If the RHP is not received
within the configured time-out period, the blocked port is unblocked. Similarly,
when the fault is rectified, the RHP is received again on the second interface
and that interface is blocked once again. The network administrator needs to
define the time-out value for the RHP; this time-out value must be greater than
the combined latency of the ring.
MRP is not restricted to a single ring; it can be configured in multiple rings. Each
ring is assigned a ring number and elects a master. Devices connecting to
multiple rings will forward the RHP out all interfaces with a ring ID equal to or
lower than that of the RHP. The RHP is then blocked on devices that have
received a matching RHP from another interface. This allows multiple rings to
provide availability for each other without flooding rings with traffic not destined
for that ring.
A server that is dual homed to the same network must run some form of Network
Interface Backup (NIB) configuration. While more information about NIB can be
found in the references, the main feature to note is that NIB allows the server to
be connected to two different Layer 2 switches. If one switch fails, NIB
automatically moves the traffic over to the other switch.
For user workstations, dual homing is not typically required or deployed. Instead,
many businesses are deploying wireless networks as well as wired networks. If
one connection happens to fail (such as the wired connection), the other one
(such as the wireless network) is usually available. In the event of a catastrophic
failure at a user access switch, the users can move to another physical location
while the fault on the switch is repaired.
Routers maintain a table of networks the router can forward traffic to. The route
table is populated in a number of ways; these can all be categorized as either
static or dynamic routes.
Again, the access tier is the most vulnerable to availability issues; however,
protocols exist to greatly reduce this vulnerability.
6.3.1 Static routing
Static routes are defined by the network administrator and require manual
intervention for any changes. A static route consists of a destination network and
a next hop address. The next hop address is the IP address of the router that is
closer to the destination network. The next hop address must be within a network
that is physically or logically connected to the router.
While some implementations of static routes allow for multiple paths to the same
destination network, many do not. Even those routers that allow for multiple
paths to the same network might not behave in a fully predictable manner if
one path is no longer available.
For a simple end-point device the network can provide redundancy of the default
gateway by utilizing Virtual Router Redundancy Protocol (VRRP) configurations.
The IBM Ethernet routers allow for up to 512 VRRP or VRRPE instances to be
configured on a single router.
VRRP
For a pair of routers deployed at the distribution tier, redundancy can be
configured by using VRRP; more than two routers can be used if required. Each
router in the virtual router group must be configured with VRRP to provide
redundancy. VRRP configuration requires each VRRP member to be configured
with the IP address of the master as the virtual IP address, which allows the
virtual router to share one IP address. Each interface within the virtual router still
has a unique IP address, but only the virtual IP address is used to transmit traffic
through the virtual router.
The VRRP routers exchange hello packets over the same Layer 2 network that
the computers use. The owner of the IP address (the VRRP master) defines a
virtual MAC address for the virtual router. If the backup VRRP routers do not
receive the VRRP hello packet within a predetermined time, the next highest
priority router takes ownership of both the IP address and the virtual MAC
address. This allows all computers on the network to maintain data
communications through the same IP address, which is configured as their
default gateway.
Within the IBM Ethernet products, VRRP has been extended to allow the virtual
router configuration to monitor another link. If the monitored link fails on one
router, it notifies the virtual router over VRRP and one of the backup routers
assumes ownership of the IP address and virtual MAC address. For example, a
distribution router might be configured to monitor the link to the core router; if
this link fails, the backup router takes over as the primary until the fault can be
fixed.
VRRPE
There are a number of differences between VRRPE and VRRP, and benefits to
using VRRPE. Perhaps the biggest benefit of VRRPE is that the IP address of
the virtual router is not a physical interface address on any of the member
routers. Instead, both the IP address and the MAC address are virtual. The
master is elected based on the highest priority of all routers in the virtual group.
There is one item to be aware of when VRRPE is set to monitor another link.
When that link fails, the priority of the device is reduced by the value in the track
priority setting. This behavior is very different from VRRP, where the priority is
dropped to twenty (20) if the tracked link fails. Therefore, if the track priority is not
set large enough to reduce the priority below that of the next backup device, the
next backup device will not take ownership of the virtual IP or virtual MAC.
Note: When using VRRPE with track ports, the track port priority is subtracted
from the VRRPE device’s interface priority. If the interface priority of the
backup device is not greater than the master’s priority after the track port
priority has been subtracted, the backup device will not take over as the master.
Consider an example where the VRRPE interface priority of the intended master
is set to 200 and the track port priority is set to 20. If the tracked port fails, the
new interface priority is 180 (200 - 20). If the intended backup interface priority is
not greater than 180, it will not take over as master in this example.
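The same arithmetic can be checked with a few lines of Python; the function and parameter names are illustrative and do not correspond to configuration keywords:

# Sketch of the VRRPE track-port arithmetic from the example above.
def backup_takes_over(master_priority, track_priority, backup_priority):
    """Return True if the backup becomes master after the tracked port fails."""
    effective = master_priority - track_priority   # for example 200 - 20 = 180
    return backup_priority > effective

print(backup_takes_over(200, 20, 190))  # True: 190 > 180, failover occurs
print(backup_takes_over(200, 20, 150))  # False: 150 <= 180, no failover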
Protected link groups can be created across modules and can include links of
different speeds. For example, the active link can be a 10 GbE link and the
standby link can be a 1 GbE link, or a group of multiple 1 GbE links (see 6.5,
“Link Aggregation Group”). Doing this allows a reliable high speed link to be
backed up by a lower speed (lower cost) link. Although a lower speed link will
impact performance, it is a cost effective link backup method that allows data
communications to continue, at a slower speed, while the failed link is repaired.
The LAG can also be configured to provide a few high availability benefits:
Multiple connections between devices can be grouped together to balance
the load. This provides faster throughput as well as some high availability
benefits. With two links between devices, if one link fails, the other link in the
LAG continues transporting data without spanning tree protocols needing to
update connectivity status.
Ports within a LAG set can be distributed across different modules. This
provides module based HA: if one module fails, the LAG is still active, allowing
degraded throughput while the failed module is replaced.
A minimum number of active ports can be configured. This is especially useful
if there are multiple paths between devices. For example, if two LAGs are
configured between two devices, with the primary path designed with six (6) 1
GbE ports and the backup path designed with four (4) 1 GbE ports, the six (6)
port LAG can be configured to require at least four (4) active ports; if fewer
than four (4) ports are active, that LAG set shuts down and communications
can use the other LAG, as shown in the sketch after this list. Consider this in
conjunction with VSRP where the same MAC is also assigned; this allows for
seamless cut over to a backup path.
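The minimum-active-ports rule can be summarized with the following conceptual sketch in Python; it models only the simple counting rule described above, not the device behavior in detail:

# Conceptual model of the "minimum number of active ports" rule for a LAG.
def lag_is_usable(active_ports, min_active):
    """A LAG stays up only while at least min_active member ports are up."""
    return active_ports >= min_active

primary_lag_ports = 6
min_active = 4  # the six-port primary LAG is configured to require four ports

# With two failed members the primary LAG still carries traffic (4 >= 4);
# a third failure shuts it down and traffic moves to the backup LAG.
for failed in range(0, 4):
    up = primary_lag_ports - failed
    path = "primary LAG" if lag_is_usable(up, min_active) else "backup LAG"
    print(f"{failed} failed members, {up} up -> traffic uses the {path}")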
The IEEE specification defines five classes of power, as shown in Table 6-1. All
power is measured as the maximum at the Power Sourcing Equipment (PSE),
which is the IBM Ethernet switch in this book.
Class    Usage       Maximum power (W) at the PSE
0        Default     15.4
1        Optional    4
2        Optional    7
3        Optional    15.4
For the purpose of the calculations in this section, we use the worst case
scenario of class-3 (15.4 W) for all our PoE devices. However, typically not all
devices are class-3 or even require a constant 15.4 W supply.
Each of the g-series and s-series can be configured with more available PoE
power than available ports. This allows the majority of PoE devices to continue
operating even though a power supply has failed. If all ports require full
class-3 PoE supply, each device will by default assign power from the lowest port
number to the highest port number, disabling power to ports when the available
PoE power has been exhausted.
There are two options to design availability into the PoE environment:
Place your more important PoE devices on the lowest available ports
Set PoE priority
Although it is entirely possible to define a site standard forcing the use of the first
ports to be connected according to priority, staff movement might change seat
allocation over time. However, devices such as PoE powered access points or
surveillance cameras do not move ports. Consider reserving the first five (5) or
ten (10) ports, depending on site constraints, for business critical infrastructure.
If insufficient power is available, PoE priority assigns power to the highest
priority devices first, disabling PoE supply to the remaining ports when the
supply is exhausted. In the case of contention for power among ports of equal
PoE priority, the lowest numbered ports are assigned power first.
6.6.1 g-series
The g-series power supply shares power between system functions and PoE
functions. This shared power supply has been designed to allow for 480 W of
PoE supply; each individual power supply can therefore provide power to 31 PoE
ports (480 / 15.4 = 31). By using redundant power supplies, more class-3 PoE
power is available than there are ports.
6.6.2 s-series
The IBM s-series has specific PoE power supplies; there are two power options
within each of the mains voltage models, depending on your PoE requirements.
The network architect can decide between the 1250 W or 2500 W power supply.
The 1250 W PoE power supply can provide power to 81 class-3 PoE devices.
The 2500 W model can provide power to 162 class-3 PoE devices.
The B08S can support up to 324 class-3 PoE devices if two (2) PoE power
supplies each of 2500 W were installed, whereas the current maximum number
of copper ports on this model is 192. Even with the failure of one PoE power
supply, 162 class-3 PoE ports can still be provided full class-3 power.
Similarly, the B16S can support up to 648 class-3 PoE devices with four (4)
2500 W PoE power supplies. The current maximum number of copper ports for
this model is 384. Even with the failure of one PoE power supply, there is still
enough power for more than 384 class-3 PoE ports.
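The port counts quoted in this section follow directly from dividing each PoE budget by the worst-case class-3 draw of 15.4 W, as this small Python sketch reproduces:

# Worst-case PoE budgeting: how many class-3 (15.4 W) ports a budget can power.
CLASS3_WATTS = 15.4

def class3_ports(total_poe_watts):
    return int(total_poe_watts // CLASS3_WATTS)

print(class3_ports(480))        # g-series shared supply: 31 ports
print(class3_ports(1250))       # s-series 1250 W PoE supply: 81 ports
print(class3_ports(2500))       # s-series 2500 W PoE supply: 162 ports
print(class3_ports(2 * 2500))   # B08S with two 2500 W supplies: 324 ports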
6.7 Hitless upgrades
The IBM Ethernet chassis based products (s-series and m-series) are capable of
having code upgraded without interrupting the operation of the system. This is
known as a hitless upgrade, because the device does not take a “hit” to
operations. This feature improves the high availability of the network as it is not
required to be off-line at any stage during an upgrade.
Both platforms require two management modules and access to the console
ports for hitless upgrades. There might also be release-specific upgrades or
other constraints that are documented in the upgrade release notes.
6.7.1 s-series
The s-series supports hitless upgrade for Layer 2 functions only. If the s-series is
operating Layer 3 functions, these functions will be interrupted.
6.7.2 m-series
The m-series supports hitless upgrade for both Layer 2 and Layer 3 functions.
Specific configuration is required to provide graceful restart for OSPF and/or
BGP.
QoS guarantees are very important when network capacity is not sufficient,
especially for real-time streaming multimedia applications such as voice over IP
(VOIP), IP based TV, online gaming, and cellular data communication, because
those types of applications require fixed bandwidth and are delay sensitive. In
cases where there is no network congestion or when the network is oversized,
QoS mechanisms are not required.
Note: Do not confuse QoS with a high level of performance or guaranteed
service quality. QoS only means that some traffic will be prioritized over other
traffic. If there are not enough resources available, even utilizing QoS cannot
produce high performance levels. QoS ensures that if there is capacity
available, it is assigned in a consistent manner to prioritize traffic when
required, and with this, the level of performance can be maintained.
In the past, QoS was not widely used because the networking devices in the
infrastructure had limited computing power for handling packets.
7.2 Why QoS is used
In over-subscribed networks, many things can happen to a packet while traveling
from the source to the destination:
Dropped packets:
In some cases routers fail to deliver packets if they arrive when the buffers are
already full. Depending on the situation, some, none, or all of the packets can
be dropped. In this case the receiving side can ask for packet retransmission
which can cause overall delays. In such a situation it is almost impossible to
predict what will happen in advance.
Delays:
It can take a long time for a packet to reach its destination, because it can get
stuck in long queues or, for example, take a non-optimal route to avoid
congestion. Applications that are sensitive to delays (for example, VOIP)
can become unusable in such cases.
Out-of-order delivery:
When a collection of packets travels across the network, different packets can
take different routes. This can result in different delays and packets arriving in
a different order than they were sent in. Such a problem requires special
additional protocols responsible for rearranging out-of-order packets into a
correctly ordered (isochronous) state once they reach the destination. This is
especially important for video and VOIP streams where the quality can be
dramatically affected.
Jitter:
When traveling from the source to the destination, packets reach the
destination with different delays. A packet’s delay is affected by its position in
the queues of the networking equipment along the path between the source
and destination, and this position can vary unpredictably. Such a variation in
delay is known as jitter and can seriously affect the quality of streaming
applications such as streaming audio and video.
Error:
It can also happen that packets are misdirected, combined together, or even
corrupted while traveling the network. The receiving end has to detect this
and, as in the case when packets are dropped, ask the sender to retransmit
the packet.
QoS is the most beneficial for what are known as inelastic services:
VOIP (Voice over IP)
IPTV (IP based TV)
Streaming multimedia
Video teleconferencing
Dedicated link emulation
Safety critical applications (for example, remote medical procedures requiring
a guaranteed level of availability, sometimes also called hard QoS)
Online gaming, especially real time simulation in a multi-player environment
Multiprotocol Label Switching (MPLS)
Resource Reservation Protocol - Traffic Engineering (RSVP-TE)
Frame relay
X.25
Asynchronous Transfer Mode (ATM)
TOS (Type of Service) field in the IP header (now superseded by DiffServ)
IP Differentiated services (DiffServ)
IP Integrated services (IntServ)
Resource reSerVation Protocol (RSVP)
7.3.3 Classification
To provide priority handling of particular types of traffic, this traffic first needs to
be identified. Classification is the process of selecting packets which will be
handled by the QoS process. The classification process assigns a priority to
packets as they enter the networking device (that is, a switch or router). The
priority can be determined based on information that is already contained within
the packet or assigned to the packet on arrival. After a packet or traffic flow is
classified, it is mapped to a corresponding queue for further processing.
Queue management
The size of the queues is not infinite, so the queues can fill and overflow. When a
queue is full, any additional packets cannot get into it and they are dropped. This
is called tail drop. The issue with tail drop is that the network device cannot
control which packets are dropped (even high priority packets can be dropped).
To prevent such issues, there are two options:
Provide some kind of criteria for dropping packets that have lower priority
before dropping higher priority packets.
Avoid the situation where queues fill up, so there is always space for high
priority packets.
Both of these functions are, for example, provided by Weighted Random Early
Detection (WRED), as sketched below.
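As a rough illustration of the early-drop idea, the following Python sketch computes a RED/WRED-style drop probability that rises with the average queue depth between a minimum and a maximum threshold; the threshold and probability values are illustrative, not switch defaults:

# Simplified WRED-style drop decision: drop probability grows linearly between
# a minimum and maximum average queue depth. Threshold values are illustrative.
import random

def wred_drop(avg_depth, min_th=20, max_th=80, max_p=0.1):
    """Return True if the arriving packet should be dropped."""
    if avg_depth < min_th:
        return False                      # queue shallow: never drop
    if avg_depth >= max_th:
        return True                       # queue (almost) full: always drop
    p = max_p * (avg_depth - min_th) / (max_th - min_th)
    return random.random() < p            # probabilistic early drop

# Lower-priority traffic can be given a lower max_th or higher max_p so it is
# dropped earlier, leaving room for high priority packets.
print(wred_drop(10), wred_drop(50), wred_drop(90))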
Congestion management
Bursty traffic can sometimes exceed the available speed of the link. In such a
case, the network device can put all the traffic in one queue and use a first in, first
out (FIFO) method for the packets, or it can put packets into different queues and
service some queues more often than the others.
Link efficiency
Low speed links can be a problem for smaller packets. On a slow link, the
serialization delay of a big packet can be quite long. For example, if an important
small packet (such as a VoIP packet) gets behind such a big packet, its delay
budget can be exceeded even before the packet has left the network device
(router). In such a situation, link fragmentation and interleaving allows large
packets to be segmented into smaller fragments, interleaving important small
packets between them. It is important to use both options, interleaving and
fragmentation; there is no reason to fragment big packets if you do not later
interleave other packets between those fragments.
Too much header overhead over payload can also influence efficiency. To
improve that situation, compression can be utilized.
Traffic shaping and policing
Shaping is used to limit the bandwidth of a particular traffic flow and is mainly
used to prevent overflow situations. Shaping can be used on links to remote
sites that have lower capacity than the link to the main site. For example, in a
hub-spoke model, you can have a 1 Gbps link from a central site and 128 Kbps
links from remote sites. In such a case, traffic from the central site can overflow
the links to the remote sites. Shaping the traffic is a perfect way to pace traffic to
the available link capacity. In the case of traffic shaping, traffic above the
configured rate is buffered for later transmission.
Policing is very similar to shaping. It differs in one very important way:
traffic that exceeds the configured rate is not buffered; it is normally
discarded.
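The operational difference can be summarized in a few lines of Python: a shaper buffers traffic above the configured rate for later transmission, while a policer discards it. The numbers below are illustrative:

# Illustrative contrast between shaping (buffer the excess) and policing
# (discard the excess) for one measurement interval.
def police(offered_bits, rate_bits):
    sent = min(offered_bits, rate_bits)
    return {"sent": sent, "dropped": offered_bits - sent, "buffered": 0}

def shape(offered_bits, rate_bits):
    sent = min(offered_bits, rate_bits)
    return {"sent": sent, "dropped": 0, "buffered": offered_bits - sent}

offered = 200_000   # bits offered in the interval
rate = 128_000      # configured rate, e.g. a 128 Kbps remote-site link
print("policing:", police(offered, rate))  # excess is discarded
print("shaping: ", shape(offered, rate))   # excess waits for a later interval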
Basic end-to-end QoS can be provided across the network in three ways:
Best-effort service: Such a service is also known as a service without QoS.
This service provides connectivity without any guarantees. It is best
characterized by FIFO queues, which do not differentiate between flows.
Differentiated service (soft QoS): Some of the traffic is treated with priority
over the rest, that is, faster handling, more bandwidth, and a lower loss rate;
this is a statistical preference, not a hard guarantee. Such QoS is provided by
classifying traffic and applying the QoS tools mentioned above.
Guaranteed service (hard QoS): With this service, network resources are
explicitly reserved for specific traffic. For example, guaranteed service can
be achieved with RSVP.
Setting up the QoS in the networking environment is not a one-time action. QoS
has to evolve alongside changes that happen in the networking infrastructure
and it must be adjusted accordingly. QoS needs to become an integral part of
network design.
IBM b-type networking products use two software platforms: FastIron (s-series
and g-series) and NetIron (m-series and c-series). QoS and rate limiting
implementation depends on the software used.
7.4.1 FastIron QoS implementation
In FastIron, the QoS feature is used to prioritize the use of bandwidth in a switch.
When QoS is enabled, traffic is classified when it arrives at the switch and is
processed based on the configured priorities. Based on that, traffic can be:
Dropped
Prioritized
Given guaranteed delivery
Subject to limited delivery
When a packet enters the switch, it is classified. After a packet or traffic flow is
classified, it is mapped to a forwarding priority queue.
FastIron based devices classify packets into eight traffic classes with values
from 0 to 7. Packets with higher priority get precedence for forwarding.
Classification
Processing of classified traffic is based on the trust level in effect on the
interface. The trust level is defined based on the configuration setup and on
whether the traffic is switched or routed. The trust level can be one of the
following possibilities:
Ingress port default priority
Static MAC address
Layer 2 Class of Service (CoS) value: This is the 802.1p priority value in the
tagged Ethernet frame. It can be a value from 0 – 7. The 802.1p priority is
also called the Class of Service.
Layer 3 Differentiated Services Code Point (DSCP): This is the value in the six
most significant bits of the IP packet header’s 8-bit DSCP field. It can be a
value from 0 – 63. These values are described in RFCs 2474 and 2475. The
DSCP value is sometimes called the DiffServ value. The device automatically
maps a packet's DSCP value to a hardware forwarding queue.
ACL keyword: An ACL can also prioritize traffic and mark it before sending it
along to the next hop.
Because there are several criteria, there are multiple possibilities as to how the
traffic can be classified inside a stream of network traffic. The priority of the
packet is resolved based on criteria precedence.
As defined in Figure 7-1, the trust criteria are evaluated in the following order:
1. ACL defining the priority; in this case, the ACL marks the packet before
sending it along to the next hop.
2. 802.1p Priority Code Point (PCP) when the packet is tagged according to
802.1Q definition.
3. Static MAC address entry
4. Default port priority
Table 7-1   Default QoS mappings
DSCP value   802.1p (CoS) value   Internal forwarding priority   Forwarding queue
0 - 7        0                    0                              0 (qosp0)
8 - 15       1                    1                              1 (qosp1)
16 - 23      2                    2                              2 (qosp2)
24 - 31      3                    3                              3 (qosp3)
32 - 39      4                    4                              4 (qosp4)
40 - 47      5                    5                              5 (qosp5)
48 - 55      6                    6                              6 (qosp6)
56 - 63      7                    7                              7 (qosp7)
The mapping between the DSCP value and forwarding queue cannot be
changed. However, the mapping between DSCP values and the other properties
can be changed as follows:
DSCP to Internal Forwarding Priority Mapping: Mapping between the DSCP
value and the Internal Forwarding priority value can be changed from the
default values shown in Table 7-1. This mapping is used for CoS marking and
determining the internal priority when the trust level is DSCP.
Internal Forwarding Priority to Forwarding Queue: The internal forwarding
priority can be reassigned to a different hardware forwarding queue.
Internal forwarding priority   Forwarding queue
1                              qosp1
2                              qosp2
3                              qosp3
4                              qosp4
5                              qosp5
6                              qosp6
If those priorities are not set, all traffic is by default placed in the “best-effort
queue”, which is the queue with priority 0 (qosp0).
If a packet qualifies for an adjusted QoS priority based on more than one
criterion, the system always gives the packet the highest priority for which it
qualifies.
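Because each block of eight DSCP values maps to one internal forwarding priority and its matching queue, the default behavior described above can be modeled with a simple integer division; this Python sketch is a model only, not device code:

# Default FastIron-style mapping: DSCP 0-63 -> internal priority 0-7 -> queue.
def dscp_to_internal_priority(dscp):
    """Each block of eight DSCP values maps to one internal priority."""
    return dscp // 8

def internal_priority_to_queue(priority):
    return f"qosp{priority}"

for dscp in (0, 26, 46, 56):
    pri = dscp_to_internal_priority(dscp)
    print(f"DSCP {dscp:2d} -> internal priority {pri} -> {internal_priority_to_queue(pri)}")
# DSCP 26 (AF31) lands in qosp3, DSCP 46 (EF) in qosp5, DSCP 56 in qosp7.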
QoS marking
QoS marking is the process of changing the packet’s QoS information for the
next hop.
In the marking process, the 802.1p (Layer 2) and DSCP (Layer 3) marking
information can be changed, and this is achieved by using ACLs. It is possible to
mark the Layer 2 802.1p (CoS) value, the Layer 3 DSCP value, or both. Marking
is not enabled by default.
Marking can be used when traffic comes from a device that does not support
QoS marking and we want to enable the use of QoS for that traffic.
DSCP based QoS
FastIron devices support basic DSCP based QoS, also called Type of Service
(ToS) based QoS.
FastIron also supports marking of the DSCP value. FastIron devices can read
Layer 3 QoS information in an IP packet and select a forwarding queue for the
packet based on that information. The device interprets the value in the six most
significant bits of the IP packet header’s 8-bit ToS field as a Differentiated
Services Code Point (DSCP) value, and maps that value to an internal
forwarding priority.
The internal forwarding priorities are mapped to one of the eight forwarding
queues (qosp0 – qosp7) on the FastIron device. During a forwarding cycle, the
device gives more preference to the higher numbered queues, so that more
packets are forwarded from these queues. So, for example, queue qosp7
receives the highest preference; while queue qosp0, the best-effort queue,
receives the lowest preference. Note the following considerations:
DSCP based QoS is not automatically enabled, but can be, by using ACLs.
On g-series switches, DSCP is activated on a per port basis.
On s-series switches, DSCP is activated with the use of ACLs.
QoS mappings
To achieve more granular QoS management, it is possible to change the
following QoS mappings:
DSCP to internal forwarding priority
Internal forwarding priority to hardware forwarding queue
The IronStack function reserves one QoS profile to provide higher priority for
stack topology and control traffic. Internal priority 7 is reserved for this purpose
and cannot be reconfigured for any other purpose.
Note: By default, the b-type devices perform the 802.1p to CoS mapping. If
you want to change the priority mapping to DSCP to CoS mapping, this can
be achieved with an ACL.
On s-series switches, marking and prioritization can be done inside one ACL
rule. On the g-series switches, only one of the following options can be used
inside one ACL rule:
802.1p-priority-marking
dscp-marking
internal-priority-marking
Forwarding queue   Internal priority
qosp0              0
qosp1              1
qosp2              2
qosp3              3
qosp4              4
qosp5              5
qosp6              6
qosp7              7
Both markings can be applied in one ACL. Internal priority marking is optional
and if not specified separately it will default to the value 1, which means traffic will
be mapped to the qosp1 forwarding queue.
Scheduling
Scheduling is the process of mapping a packet to an internal forwarding queue
based on its QoS information, and servicing the queues according to a queuing
mechanism.
Note: In stacking mode, qosp7 queue is reserved as Strict Priority under
weighted queuing. Attempts to change the qosp7 setting will be ignored.
WRR is the default queuing method and uses a default set of queue weights.
The number of packets serviced during each visit to a queue depends on the
percentages you configure for the queues. The software automatically
converts the percentages you specify into weights for the queues.
Note: Queue cycles on the s-series and g-series switches are based on
bytes. These devices service a given number of bytes (based on weight) in
each queue cycle.
The default minimum bandwidth allocation for WRR is shown in Table 7-4.
Queue    Default minimum bandwidth   Minimum bandwidth with jumbo frames enabled
qosp6    7%                          8%
qosp5    3%                          8%
qosp4    3%                          8%
qosp3    3%                          8%
qosp2    3%                          8%
qosp1    3%                          8%
qosp0    3%                          8%
Strict Priority (SP): This method ensures service for high priority traffic. The
software assigns the maximum weights to each queue, to cause the queuing
mechanism to serve as many packets in one queue as possible before
moving to a lower queue. This method biases the queuing mechanism to
favor the higher queues over the lower queues.
For example, strict queuing processes as many packets as possible in qosp3
before processing any packets in qosp2, then processes as many packets as
possible in qosp2 before processing any packets in qosp1, and so on.
Queue    Default bandwidth
qosp5    25%
qosp4    15%
qosp3    15%
qosp2    15%
qosp1    15%
Queue 7 supports only SP, whereas queue 6 supports both SP and WRR, and
queues 5 - 0 support only WRR as the queuing mechanism.
The queuing method is defined globally on the device. Queues can be renamed,
and the weights of the queues can be modified to meet specific requirements.
When the weights of the queues are modified, the total must equal 100%. The
minimum bandwidth percentage is 3% for each priority; when jumbo frames are
enabled, the minimum bandwidth requirement is 8%. If these minimum values
are not met, QoS might not be accurate.
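To make the byte-based weighted servicing concrete, the following Python sketch drains several queues in proportion to configured percentage weights; the queue names are reused from the text, but the weights, packet sizes, and cycle budget are illustrative:

# Byte-based weighted round robin sketch: each cycle, every queue may send up
# to (weight% of the cycle budget) bytes. Weights and packets are illustrative.
from collections import deque

weights = {"qosp2": 60, "qosp1": 30, "qosp0": 10}        # must total 100%
queues = {name: deque([100] * 50) for name in weights}   # 50 x 100-byte packets
CYCLE_BYTES = 3000                                       # service budget per cycle

def run_cycle():
    sent = {}
    for name, weight in weights.items():
        budget = CYCLE_BYTES * weight // 100
        sent[name] = 0
        while queues[name] and queues[name][0] <= budget:
            budget -= queues[name][0]
            sent[name] += queues[name].popleft()
    return sent

print(run_cycle())  # {'qosp2': 1800, 'qosp1': 900, 'qosp0': 300}, matching the weights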
7.4.2 FastIron fixed rate limiting and rate shaping
In this section, we discuss FastIron rate limiting and rate shaping.
The maximum rate is specified in kilobits per second (Kbps). The fixed rate
limiting policy applies to one-second intervals and allows the port to send or
receive the amount of traffic specified in the policy. All additional bytes are
dropped. Unused bandwidth is not carried over from one interval to another.
Table 7-8 shows where inbound and outbound rate limiting is supported or not
supported.
Port type   Fixed rate limiting support
GbE         Inbound/outbound
10-GbE      Inbound/outbound
The maximum rate is specified in bits per second (bps). The fixed rate limiting
policy applies to one-second intervals and allows the port to receive the amount
of traffic specified in the policy. All additional bytes are dropped. Unused
bandwidth is not carried over from one interval to another.
s-series rate shaping
Outbound rate shaping is a port-level feature that is used to shape the rate and to
control the bandwidth of outbound traffic. The rate shaping feature smooths out
excess and bursty traffic to the configured maximum before it is sent out on a
port. Packets are stored in available buffers and then forwarded at a rate which is
not greater than the configured limit. This approach provides better control over
the inbound traffic on the neighboring devices.
Note: It is best not to use fixed rate limiting on ports that receive route control
traffic or Spanning Tree Protocol (STP) control traffic. Dropped packets due to
the fixed rate limiting can disrupt routing or STP.
An s-series switch has one global rate shaper for a port and one rate shaper for
each port priority queue. Rate shaping is done on a single-token basis, where
each token is defined to be 1 byte.
After the one-second interval is complete, the port clears the counter and
re-enables traffic.
Figure 7-2 shows an example of how fixed rate limiting works. In this example, a
fixed rate limiting policy is applied to a port to limit the inbound traffic to 500000
bits (62500 bytes) a second. During the first two one-second intervals, the port
receives less than 500000 bits in each interval. However, the port receives more
than 500000 bits during the third and fourth one-second intervals, and
consequently drops the excess traffic.
Bytes are counted by polling statistics counters for the port every 100
milliseconds, which gives 10 readings per second. With such a polling interval,
the fixed rate limiting policy has an accuracy of within 10% of the port’s line rate.
As a result, it is possible that in some cases the policy allows more traffic than
the specified limit, but the extra traffic is never more than 10% of the port’s line
rate.
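The example can be checked numerically: a 500000 bit per second policy corresponds to 62500 bytes per one-second interval, and anything received beyond that in an interval is dropped. The following Python sketch models the same per-interval counting (illustrative only):

# Conceptual model of fixed rate limiting: per one-second interval, bytes above
# the configured limit are dropped; the counter resets every interval.
LIMIT_BPS = 500_000                  # configured rate in bits per second
LIMIT_BYTES = LIMIT_BPS // 8         # 62,500 bytes per one-second interval

intervals = [40_000, 55_000, 90_000, 70_000]   # bytes offered per second

for second, offered in enumerate(intervals, start=1):
    forwarded = min(offered, LIMIT_BYTES)
    dropped = offered - forwarded
    print(f"second {second}: offered {offered}, forwarded {forwarded}, dropped {dropped}")
# Seconds 3 and 4 exceed 62,500 bytes, so the excess is dropped, as in Figure 7-2.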
7.4.3 FastIron traffic policies
ACL based rate limiting policy
The g-series and s-series switches support IP ACL based rate limiting of inbound
traffic. For s-series switches this is available for Layer 2 and Layer 3.
The ACL based rate limiting is achieved with traffic policies which are applied on
ACLs. The same traffic policies can be applied to multiple ACLs. Traffic policies
become effective on the ports to which ACLs are bound.
Traffic policies consist of the policy name and policy definition as follows:
Traffic policy name: Identifies traffic policy and can be in the form of a string
with up to 8 alphanumeric characters.
Traffic policy definition (TPD): Can be any of the following policies:
– Rate limiting policy
– ACL counting policy
– Combined rate limiting and ACL counting policy
After the TPD is defined and referenced in an ACL entry, and the ACL is then
applied to a VE in the Layer 3 router code, the rate limit policy is cumulative for
all of the ports in the port region. If the VE/VLAN contains ports that are in
different port regions, the rate limit policy is applied per port region.
When a traffic policy for rate limiting is configured, the device automatically
enables rate limit counting, similar to the two-rate three-color marker (trTCM)
mechanism described in RFC 2698 for adaptive rate limiting, and the single-rate
three-color marker (srTCM) mechanism described in RFC 2697 for fixed rate
limiting. This counts the number of bytes and trTCM or srTCM conformance level
per packet to which rate limiting traffic policies are applied.
ACL based rate limiting can be defined on the following interface types:
Physical Ethernet interfaces
Virtual interfaces
Trunk ports
Specific VLAN members on a port
Subset of the ports on a virtual interface
Table 7-10 shows configurable parameters for ACL based adaptive rate limiting.
Committed Information Rate (CIR): The guaranteed kilobit rate of inbound traffic
that is allowed on a port.
Committed Burst Size (CBS): The number of bytes per second allowed in a burst
before some packets will exceed the committed information rate. Larger bursts
are more likely to exceed the rate limit. The CBS must be a value greater than
zero (0). It is best that this value be equal to or greater than the size of the
largest possible IP packet in a stream.
Peak Information Rate (PIR): The peak maximum kilobit rate for inbound traffic
on a port. The PIR must be equal to or greater than the CIR.
Peak Burst Size (PBS): The number of bytes per second allowed in a burst
before all packets will exceed the peak information rate. The PBS must be a
value greater than zero (0). It is best that this value be equal to or greater than
the size of the largest possible IP packet in the stream.
If a port receives more than the configured bit or byte rate in a one-second
interval, the port will either drop or forward subsequent data in hardware,
depending on the action specified.
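The two-rate three-color marker referenced here is defined in RFC 2698. The following Python sketch is a simplified, color-blind version of that algorithm using the CIR/CBS and PIR/PBS parameters from Table 7-10; the values chosen are illustrative, not recommended settings:

# Simplified color-blind two-rate three-color marker (after RFC 2698).
# CIR/PIR are in bytes per second here for simplicity; CBS/PBS are bucket
# sizes in bytes. Values are illustrative only.
class TrTCM:
    def __init__(self, cir, cbs, pir, pbs):
        self.cir, self.cbs, self.pir, self.pbs = cir, cbs, pir, pbs
        self.tc, self.tp = cbs, pbs          # token buckets start full

    def refill(self, seconds):
        self.tc = min(self.cbs, self.tc + self.cir * seconds)
        self.tp = min(self.pbs, self.tp + self.pir * seconds)

    def mark(self, size):
        if self.tp < size:
            return "red"                      # exceeds the peak rate
        self.tp -= size
        if self.tc < size:
            return "yellow"                   # within peak but above committed rate
        self.tc -= size
        return "green"                        # conforms to the committed rate

meter = TrTCM(cir=10_000, cbs=3_000, pir=20_000, pbs=6_000)
print([meter.mark(1_500) for _ in range(6)])  # buckets drain from green to red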
ACL counting enables the switch to count the number of packets and the number
of bytes per packet to which ACL filters are applied.
Rate limit counting counts the number of bytes and conformance level per packet
to which rate limiting traffic policies are applied. The switch uses the counting
method similar to the two-rate three-color marker (trTCM) mechanism described
in RFC 2698 for adaptive rate limiting, and the single-rate three-color marker
(srTCM) mechanism described in RFC 2697 for fixed rate limiting. Rate limit
counting is automatically enabled when a traffic policy is enforced (active).
7.4.4 NetIron m-series QoS implementation
In this section we cover QoS implementation on m-series routers. The NetIron
m-series QoS processing can be divided into two major areas:
Ingress traffic processing through an m-series router
Egress traffic processing exiting an m-series router
The following steps influence how an ingress packet is processed:
1. Derive priority and drop precedence from the packet’s PCP value.
2. Derive priority and drop precedence from the packet’s EXP value.
3. Derive priority and drop precedence from the packet’s DSCP value.
4. Merge or force the priorities defined in steps 1 through 3.
5. Merge or force the priority and drop precedence value based on the value
configured for the physical port.
6. Merge or force the priority value based on the value configured for the VLAN.
7. Merge or force the priority value based on an ACL look-up. This is used for
setting a specific priority for an L2, L3, or L4 traffic flow.
2. In the second step, the router determines if the priority value must be forced
or merged. The following actions are possible in this step:
– If a packet’s EtherType matches 8100, or the port’s EtherType, derive a
priority value and drop precedence by decoding the PCP value
– If PCP forcing is configured on the port, the priority and drop precedence
values are set to the value read from the PCP bits
– If EXP forcing is configured on the port, the priority and drop precedence
values are set to the value read from the MPLS EXP bits
– If DSCP forcing is configured on the port, the priority and drop precedence
values are set to the value read from the DSCP bits
– If there is no forcing configured on the port the following rules apply:
• For IPv4/v6 packets - priority and drop precedence values are obtained
as a merge of the decoded PCP and decoded DSCP values.
• For MPLS packets - priority and drop precedence values are obtained
as a merge of the decoded PCP and decoded EXP values.
The priority of a packet can be “forced” based on the following criteria:
Forced to a priority configured for a specific ingress port
Forced to a priority configured for a specific VLAN
Forced to a priority that is obtained from the DSCP priority bits
Forced to a priority that is obtained from the EXP priority bits
Forced to a priority that is obtained from the PCP priority bits
Forced to a priority that is based on an ACL match
Similarly, the drop precedence of a packet can be “forced” based on the
following criteria:
Forced to a drop precedence configured for a specific ingress port
Forced to a drop precedence configured for a specific VLAN
Forced to a drop precedence that is obtained from the DSCP priority bits
Forced to a drop precedence that is obtained from the EXP priority bits
Forced to a drop precedence that is obtained from the PCP priority bits
Forced to a drop precedence that is based on an ACL match
Traffic processing is shown in Figure 7-4.
Egress encode policy map: The QoS value that a packet carries in its header
when it exits an m-series router on an egress interface is determined by a
specified mapping. Unless alternate maps are configured, this value is placed in
the packet’s header by using one of the default maps. Alternatively, the following
encode mappings can be defined:
PCP encode map: This map defines how to map the internal priority and drop
precedence value of a packet into the PCP code point.
DSCP encode map: This map defines how to map the internal priority and drop
precedence value of a packet into the DSCP code point.
EXP encode map: This map defines how to map the internal priority and drop
precedence value of a packet into the EXP code point.
Configuring QoS
To successfully implement QoS in m-series routers, you need to perform the
following procedures on ingress and egress QoS processing.
The WRED algorithm is applied to the traffic on all individual internal queues
(0-7) based upon parameters configured for its assigned queue type. When
traffic arrives at a queue, it is passed or dropped as determined by the WRED
algorithm. Packets in an individual queue are further differentiated by one of four
drop precedence values which are determined by the value of bits 3:2 of the
TOS/DSCP bits in the IPv4 or IPv6 packet header as shown in Figure 7-5.
Scheduling traffic for forwarding
If the traffic being processed by an m-series router is within the capacity of the
router, all traffic is forwarded as received.
When the point is reached where the router is bandwidth constrained, it becomes
subject to drop priority or traffic scheduling if so configured.
The m-series routers classify packets into one of eight internal priorities. Traffic
scheduling allows you to selectively forward traffic according to the forwarding
queue that it is mapped to, by using one of the following schemes:
Strict priority-based scheduling: This scheme guarantees that higher-priority
traffic is always serviced before lower priority traffic. The disadvantage of
strict priority-based scheduling is that lower-priority traffic can be starved of
any access.
WFQ weight-based traffic scheduling: With WFQ destination-based
scheduling enabled, some weight based bandwidth is allocated to all queues.
With this scheme, the configured weight distribution is guaranteed across all
traffic leaving an egress port and an input port is guaranteed allocation in
relationship to the configured weight distribution.
Mixed strict priority and weight-based scheduling: This scheme provides a
mixture of strict priority for the three highest priority queues and WFQ for the
remaining priority queues.
Note: Because excess traffic is buffered, rate shaping must be used with
caution. In general, it is not advisable to rate shape delay-sensitive traffic.
The m-series routers support egress rate shaping. Egress rate shaping is
supported per port or for each priority queue on a specified port.
Note: The egress rate shaping burst size for a port-based shaper is 10000
bytes.
Note: The egress rate shaping burst size for a port and priority-based shaper
is 3072 bytes.
7.4.5 NetIron m-series traffic policies
The m-series router can be configured to use one of the following modes of
traffic policing policies:
Port-based: Limits the rate on an individual physical port to a specified rate.
Only one inbound and one outbound port-based traffic policing policy can be
applied to a port. These policies can be applied to inbound and outbound
traffic.
Port-and-priority-based: Limits the rate on an individual hardware forwarding
queue on an individual physical port. Only one port-and-priority-based traffic
policing policy can be specified per priority queue for a port. These policies
can be applied to inbound and outbound traffic.
VLAN-based: Untagged packets as well as tagged packets can be
rate-limited. Only one rate can be specified for each VLAN. Up to 990
VLAN-based policies can be configured for a port under normal conditions or
3960 policies if priority based traffic policing is disabled. These policies can
be applied to inbound and outbound traffic.
VLAN group based: Limits the traffic for a group of VLANs. Members of a
VLAN group share the specified bandwidth defined in the traffic policing policy
that has been applied to that group. Up to 990 VLAN group-based policies
can be configured for a port under normal conditions or 3960 policies if
priority-based traffic policing is disabled. These policies can only be applied to
inbound traffic.
Port-and-ACL-based: Limits the rate of IP traffic on an individual physical port
that matches the permit conditions in IP Access Control Lists (ACLs). Layer 2
ACL-based traffic policing is supported. Standard or extended IP ACLs can
be used. Standard IP ACLs match traffic based on source IP address
information. Extended ACLs match traffic based on source and destination IP
address and IP protocol information. Extended ACLs for TCP and UDP also
match on source and destination TCP or UDP port and protocol information.
These policies can be applied to inbound and outbound traffic. Up to 990
Port-and-ACL based policies can be configured for a port under normal
conditions or 3960 policies if priority-based traffic policing is disabled.
Rate limiting for copied-CPU-bound traffic: The rate of copied-CPU-bound
packets from applications such as sFlow, ACL logging, RPF logging, and
source MAC address learning (with known destination address) can be
limited. Copied-CPU-bound packets are handled and queued separately from
packets destined to the CPU, such as protocol packets. Using this feature,
they can be assigned to one of eight priority queues, each of which has a rate
limit assigned to it. The queue and rate are assigned by port and apply to all
of the ports that are supported by the same packet processor.
For example, on a module with 20 x 1 Gbps ports, ports 1 - 20 are served by the
same packet processor (PPCR1).
When traffic exceeds the bandwidth that has been reserved for it by the CIR rate
defined in its policy, it becomes subject to the CBS rate. The CBS rate provides a
rate higher than the CIR rate to traffic that exceeded its CIR rate. The bandwidth
in the CBS rate is accumulated during periods of time when traffic that has been
defined by a policy does not use the full CIR rate available to it. Traffic is allowed
to pass through the port for a short period of time at the CBS rate.
When inbound or outbound traffic exceeds the bandwidth available for the
defined CIR and CBS rates, it is either dropped, or made subject to the
conditions set in the EIR bucket.
Like the CIR, the EIR provides an initial bandwidth allocation to accommodate
inbound and outbound traffic. If the bandwidth provided by the EIR is insufficient
to accommodate the excess traffic, the defined EBS rate provides for burst traffic.
Like the CBS, the bandwidth available for burst traffic from the EBS is subject to
the amount of bandwidth that is accumulated during periods of time when traffic
that has been allocated by the EIR policy is not used.
In addition to providing additional bandwidth for traffic that exceeds that available
for the CIR bucket, traffic rate limited by the EIR bucket can have its excess
priority and excess DSCP values changed. Using this option, priority parameters
are set following the EBS value that change the priority of traffic that is being rate
limited using the EIR bucket.
Configuration considerations
The following considerations apply:
Only one type of traffic policing policy can be applied on a physical port. For
example, port and-ACL-based and port-based traffic policing policies cannot
be applied on the same port.
When a VLAN-based traffic policing policy is applied to a port, all the ports
controlled by the same packet processor are rate limited for that VLAN. It is
not possible to apply a VLAN-based traffic policing policy on another port of
the same packet processor for the same VLAN ID.
The Multi-Service IronWare software supports VLAN-based traffic policing
that can limit tagged and untagged packets that match the VLAN ID specified
in the policy. Untagged packets are not subject to traffic policing.
The maximum burst in a traffic policing policy cannot be less than the average
rate and cannot be more than the port’s line rate.
Control packets are not subject to traffic policing.
Source MAC address with Virtual Leased Line (VLL) endpoints are not
subject to traffic policing.
7.4.6 NetIron c-series QoS implementation
Traffic types
The c-series device uses the following traffic types:
Data: The data packets can be either Network-to-Network traffic or traffic from
the CPU. Network-to-Network traffic is considered data traffic. QoS
parameters can be assigned and modified for data traffic.
Control: Packets to and from the CPU are considered control traffic. The QoS
parameters for this traffic are preassigned and not configurable.
Each of the ingress pipeline engines contain several Initial QoS Markers that
assign the packet’s initial QoS attribute.
The ingress pipeline engine also contains a QoS Remarker that can modify the
initial QoS attributes.
Even though the c-series device supports four drop precedence values (0, 1, 2,
and 3), internally 1 and 2 are assigned the same drop precedence level. The four
levels are kept for CLI compatibility with other products of the NetIron family. The
three internal levels of drop precedence are 0, {1,2}, and 3. In terms of commonly
used color based terminology: 0 represents green (lowest drop precedence),
1 and 2 represent yellow (higher drop precedence), and 3 represents red
(highest drop precedence).
TC (Traffic Class): This is the priority level assigned to the packet. When the
TxQ enqueues the packet, it uses this field to select the appropriate priority
queue.
DC (Drop Precedence): The TxQ uses this field for congestion resolution.
Packets with higher drop precedence are more likely to be discarded in the
event of congestion.
The following steps influence how an ingress packet is processed:
1. Derive priority and drop precedence from the packet’s PCP value.
2. Derive priority and drop precedence from the packet’s DSCP value.
3. Force the priority and drop precedence value based on the value configured
for the physical port.
4. Force the priority value based on an ACL look-up. This is used for setting a
specific priority for an L2, L3, or L4 traffic flow.
The derived values for PCP and DSCP are mapped to a default map.
In the second step, the router determines if the priority value must be forced.
The following actions are possible in this step:
If a packet’s EtherType matches 8100, or the port’s EtherType, derive a
priority value and drop precedence by decoding the PCP value
If PCP forcing is configured on the port, the priority and drop precedence
values are set to the value read from the PCP bits
If DSCP forcing is configured on the port, the priority and drop precedence
values are set to the value read from the DSCP bits
If there is no forcing configured on the port the following rules apply:
– For tagged and untagged IPv4 packets - priority and drop precedence
values are obtained from decoded DSCP values.
– For tagged non-IPv4 packets - priority and drop precedence values are
obtained from decoded PCP values.
– For untagged non-IPv4 packets - priority and drop precedence values are
obtained from the priority and drop precedence assigned on the port. If no
priority and drop precedence is assigned on the port, the default values of
priority 0 and drop precedence 0 are used.
The priority of a packet can be “forced” based on the following criteria:
Forced to a priority configured for a specific ingress port.
Forced to a priority that is obtained from the DSCP priority bits.
Forced to a priority that is obtained from the PCP priority bits.
Forced to a priority that is based on an ACL match.
Similarly, the drop precedence of a packet can be “forced” based on the
following criteria:
Forced to a drop precedence configured for a specific ingress port.
Forced to a drop precedence that is obtained from the DSCP priority bits.
Forced to a drop precedence that is obtained from the PCP priority bits.
Forced to a drop precedence that is based on an ACL match.
The QoS value that a packet carries in its header when it exits a c-series device
on an egress interface is determined by a specified mapping. Unless alternate
maps are configured, this value is placed in the packet’s header by using one of
the default maps. The egress encode policy can be either enabled or disabled.
Configuring QoS
To successfully implement QoS in a c-series device, the following procedures
need to be performed on ingress and egress QoS processing.
Egress QoS procedures
The following egress procedures are available:
Egress encode policy on or off: To be used to map the internal priority to a
packet header when it exits the device.
Support for QoS configuration on LAG ports: To be used when using
enhanced QoS on ports within a LAG.
After the point is reached where the router is bandwidth constrained, it becomes
subject to drop priority or traffic scheduling if so configured.
The c-series devices classify packets into one of eight internal priorities. Traffic
scheduling allows you to selectively forward traffic according to the forwarding
queue that it is mapped to, by using one of the following schemes:
Strict priority-based scheduling: This scheme guarantees that higher-priority
traffic is always serviced before lower priority traffic. The disadvantage of
strict priority-based scheduling is that lower-priority traffic can be starved of
any access.
WFQ weight-based traffic scheduling: With WFQ destination-based
scheduling enabled, weight-based bandwidth is allocated to all queues.
With this scheme, the configured weight distribution is guaranteed across all
traffic leaving an egress port, and an input port is guaranteed an allocation in
relationship to the configured weight distribution (see the worked example
after this list).
Mixed strict priority and weight-based scheduling: This scheme provides a
mixture of strict priority for the three highest priority queues and WFQ for the
remaining priority queues.
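As a worked illustration of the WFQ scheme (the numbers are examples only, not
product limits), consider an egress port with four active queues configured with
weights 10, 20, 30, and 40. Under congestion, the queue with weight 40 is
guaranteed 40 / (10 + 20 + 30 + 40) = 40% of the egress bandwidth, the queue
with weight 30 receives 30%, and so on, so the configured weight distribution is
preserved across the traffic leaving the port.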
Note: Because excess traffic is buffered, rate shaping must be used with
caution. In general, it is not advisable to rate shape delay-sensitive traffic.
Note: The egress rate shaping burst size for a port and priority-based shaper
is 4096 bytes.
The c-series router can be configured to use one of the following modes of traffic
policing policies:
Port-based: Limits the rate on an individual physical port to a specified rate.
Only one inbound and one outbound port-based traffic policing policy can be
applied to a port. These policies can be applied to inbound and outbound
traffic.
Port-and-ACL-based: Limits the rate of IP traffic on an individual physical port
that matches the permit conditions in IP Access Control Lists (ACLs). Layer 2
ACL-based traffic policing is also supported. Standard or extended IP ACLs can
be used. Standard IP ACLs match traffic based on source IP address
information. Extended ACLs match traffic based on source and destination IP
address and IP protocol information. Extended ACLs for TCP and UDP also
match on source and destination TCP or UDP port and protocol information.
These policies can be applied to inbound and outbound traffic.
Up to 990 Port-and-ACL based policies can be configured for a port under
normal conditions or 3960 policies if priority-based traffic policing is disabled.
Traffic is initially traffic policed by a Committed Information Rate (CIR) bucket.
Traffic that is not accommodated in the CIR bucket is then subject to the Excess
Information Rate (EIR) bucket.
When traffic exceeds the bandwidth that has been reserved for it by the CIR rate
defined in its policy, it becomes subject to the CBS rate. The CBS rate provides a
rate higher than the CIR rate to traffic that exceeded its CIR rate. The bandwidth
in the CBS rate is accumulated during periods of time when traffic that has been
defined by a policy does not use the full CIR rate available to it. Traffic is allowed
to pass through the port for a short period of time at the CBS rate.
When inbound or outbound traffic exceeds the bandwidth available for the
defined CIR and CBS rates, it is either dropped, or made subject to the
conditions set in its EIR bucket.
In addition to providing additional bandwidth for traffic that exceeds that available
for the CIR bucket, traffic rate limited by the EIR bucket can have its excess
priority and excess DSCP values changed. With this option, priority parameters
that follow the EBS value are set; these change the priority of traffic that is being
rate limited by the EIR bucket.
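As a worked illustration of these buckets (the rates are examples only, not
product limits), consider a policy with a CIR of 100 Mbps and an EIR of 50 Mbps.
Sustained traffic up to 100 Mbps is forwarded by the CIR bucket. If the flow has
been idle and has accumulated CBS credit, short bursts above 100 Mbps are also
forwarded at the CBS rate. Sustained traffic between 100 Mbps and roughly
150 Mbps is handled by the EIR bucket and can have its excess priority and
excess DSCP values remarked. Traffic beyond what the CIR, CBS, and EIR
buckets accommodate is dropped.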
Chapter 8. Voice over IP
VoIP is a value-add application that brings additional resource and
service requirements to the existing networking infrastructure. Certain key
elements must be provided by the networking infrastructure for a successful VoIP
implementation.
IBM s-series and g-series switches, with their support for PoE, enhanced QoS,
enhanced VoIP features such as dual-mode ports and voice VLANs, and
enhanced security and storage features, are well positioned for successful VoIP
implementations.
8.2 Architecture overview
In this section we describe the architecture for the VoIP infrastructure using IBM
b-type networking switches.
There are two main ways that the traffic flows in VoIP environments:
VoIP traffic inside the enterprise: In this scenario, all parties in the VoIP call
are using devices that are connected to the enterprise networking
infrastructure.
VoIP traffic for external calls: In this case, one side of the VoIP call is outside
the enterprise networking infrastructure. The outside path can be either over
a PSTN gateway to a classical phone exchange, or over a VoIP gateway to
another VoIP gateway.
We can see all the components of the typical network infrastructure with VoIP
infrastructure elements:
VoIP phones: The IP phones are used to initiate or receive the calls.
The typical call flow between two IP phones within the enterprise will have the
following steps, also shown in Figure 8-1 on page 205:
1. Phone A requests a call setup to the call manager. The call setup message is
sent from phone A to the call manager. The call manager identifies the IP
address of phone B.
2. The call manager initiates a call to phone B. In this step the call manager
sends a call setup message to phone B and to phone A, confirming the call.
3. During the call, phone A talks to phone B.
When the call is established, the traffic flow is local, depending on where the
phones are physically connected. There are two possible scenarios:
Both phones connected to the same physical access layer switch: In this case
VoIP traffic during the call is processed inside the access switch.
Phones are connected to different physical access layer switches: In this case
VoIP traffic during the call is traveling across the aggregation/core layer.
Note: Communication to the call manager is only active during call setup.
The VoIP network infrastructure must use QoS to provide low jitter and latency
for the VoIP traffic. It is important to ensure that the right QoS is used to
prioritize both call setup and actual call traffic throughout the entire network
infrastructure.
IBM s-series and g-series switches provide all required capabilities for providing
QoS to serve VoIP calls. QoS functions are also available on m-series routers,
which can be used in the aggregation/core layer of the infrastructure.
8.2.2 VoIP call flow for external calls
Figure 8-2 shows the call flow with an external VoIP call.
The infrastructure for the external call is the same as described in 8.2.1, “VoIP
call flow inside enterprise” on page 205.
The typical call flow between the IP phone in the enterprise and the external
entity has the following steps, as shown in Figure 8-2:
1. Phone A requests a call setup to the call manager. The call setup message is
sent from phone A to the call manager. The call manager identifies that this is
an external call.
2. The call manager initiates a call setup message over the gateway to the
external (PSTN) network. After this, the call manager sends a call setup
message to phone A confirming the call.
3. Phone A can now make a call to the external (PSTN) network. During the call,
VoIP traffic is forwarded to the external (PSTN) network.
Just as for VoIP calls inside the enterprise, the network infrastructure must
provide adequate resources. In cases of external calls, the VoIP traffic is flowing
from the endpoint of the VoIP phone to the gateway that connects to the external
(PSTN) network. VoIP traffic must be prioritized on the whole path.
IBM FastIron based switches (s-series and g-series) provide a rich set of features
in their QoS set which can help to manage the QoS correctly and thus enforce
strict QoS policies.
There are several possible scenarios regarding how to handle VoIP traffic:
VoIP devices are already marking the QoS priority (802.1p) and network
equipment is honoring this marking.
VoIP devices are marking the QoS priority, but network managers want to
change the priority on the path through their environment.
VoIP devices are not marking the QoS priority and network managers need to
provide priority for the traffic.
Based on these conditions, in the following sections we describe the options that
are available on IBM FastIron based switches.
Default behavior
In this case, the IBM FastIron based switches honor the 802.1p priority set by
the VoIP device. There is no need for additional QoS setup if the VoIP device
priority setting is sufficient to provide adequate quality for the VoIP traffic.
When a packet enters and exits the switch by a tagged port, its 802.1p value will
be maintained. For example, if the packet had an 802.1p value of 5, it will exit the
switch with the same value (if there was no QoS configuration applied on this
packet).
When the packet is untagged, it has a default priority of zero and it will be queued
to the queue with the lowest priority. In FastIron switch terminology, this queue is
called qosp0.
Even if this packet exits on a tagged port, the 802.1p value will be set to
zero, unless QoS configuration was applied to modify the 802.1p priority.
The FastIron switches have eight internal hardware queues (each queue has a
value that ranges from 0-7), and each 802.1p value maps the packet to one of the
queues so that the switch can forward the traffic according to the queue priority.
This mapping is shown in Table 8-1.
Table 8-1   802.1p value to internal forwarding queue mapping
802.1p value    Internal queue
7               qosp7
6               qosp6
5               qosp5
4               qosp4
3               qosp3
2               qosp2
1               qosp1
0               qosp0
Port priority
Port priority can be applied on a per-port basis on all s-series and g-series
switches. With this setting, all traffic coming into a port that has port priority set
is subject to the internal priority mapping.
The internal priority mapping maps the traffic to the internal queues, which are
the same as described in Table 8-1.
The port priority setting will never affect the DSCP value of the packet. It is only
used for internal prioritization for egress queuing and to set the 802.1p value
when the packet comes in untagged and leaves by a tagged interface.
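As a minimal sketch, port priority is applied at the interface level; the interface
number and priority value here are examples only, so verify the exact syntax in
the product configuration guide:
interface ethernet 1
 priority 5
With this setting, traffic entering port e1 is mapped to internal queue qosp5
unless another QoS feature overrides it.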
To remark a VoIP phone 802.1p priority, FastIron based switches offer two ways
to identify the device:
VoIP phone IP address
VoIP phone DSCP value
Based on these parameters, the packet 802.1p priority can be remarked to the
desired value. On top of this, the internal priority (which will be used in the switch)
can be modified.
The 802.1p packet priority will also be used when the packet leaves the switch.
The default DSCP to internal forwarding queue mapping is as follows:
DSCP value    Internal queue
0 - 7         qosp0
8 - 15        qosp1
16 - 23       qosp2
24 - 31       qosp3
32 - 39       qosp4
40 - 47       qosp5
48 - 55       qosp6
56 - 63       qosp7
DSCP remarking using ACLs
If required, the DSCP values can be remarked similarly to the 802.1p priority
values. In the remarking process, VoIP devices can be identified in two possible
ways:
VoIP phone IP address
VoIP phone DSCP value
VoIP traffic statistics can be gathered by using a traffic policy, or by using
extended ACLs matching the VoIP phone IP address or traffic.
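The following lines are an illustrative sketch only of remarking VoIP traffic that is
identified by the phone IP address; the addresses, ACL number, and marking
keywords are assumptions, and the QoS marking options available in ACLs vary
by product and release, so verify them in the relevant configuration guide.
access-list 101 permit ip host 10.1.20.15 any dscp-marking 46 802.1p-priority-marking 5
access-list 101 permit ip any any
interface ethernet 1
 ip access-group 101 in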
Power reduction
The following options are available for power reduction; a configuration sketch
follows the list:
It is possible to configure the exact value of delivered power in milliwatts.
For devices which are 802.3af compliant the power can be configured by
802.3af power class:
– For class 1 devices, 4 watts will be available
– For class 2 devices, 7 watts will be available
– For class 3 devices, 15.4 watts will be available
IBM s-series switches can be configured not to support legacy PoE devices.
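A minimal configuration sketch of these options on a PoE port follows; the
interface numbers, class, and milliwatt value are examples only, so confirm the
exact syntax in the product configuration guide.
interface ethernet 2
 inline power power-by-class 2
interface ethernet 3
 inline power power-limit 10000
In this sketch, port e2 is limited to the class 2 allocation (7 watts) and port e3 is
limited to 10000 milliwatts (10 watts).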
Power priority
It is possible to configure a power port priority. With this, you are able to specify
that some ports get a higher priority when not enough PoE power is available.
Such a situation can happen because of PoE power supply failures or power line
failures. With such a setup, the most critical VoIP devices can get higher priority
over less critical VoIP devices.
After the request is specified, the switch will allocate the required power if it has
enough power available. If the switch does not have enough power resources to
support the request, it will not allocate any power to the port.
If there is explicitly defined power delivery, by milliwatts or power class, this will
take precedence over the device power request. For example, if the port is
defined to support class 3 PoE devices (15.4W) and the device requests 10W,
then the port still delivers 15.4W of power.
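A minimal sketch of assigning a higher power priority to a critical port follows;
the interface and priority value are examples, and the meaning of the priority
range differs by product, so verify it in the configuration guide:
interface ethernet 4
 inline power priority 1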
Dual mode ports
Usually the VoIP phone provides a physical connection for a computer device.
With such a setup, only one switch port is required to connect the VoIP device
and the computer. An example is shown in Figure 8-3.
In this setup, traffic from the VoIP device is tagged and the traffic from the
computer device is untagged. In some cases it will be required that VoIP device
traffic is part of its own VLAN, and computer traffic is part of its own VLAN. In our
example in Figure 8-3, VoIP traffic is part of VLAN 20 and computer traffic is part
of VLAN 10.
With the dual mode port feature FastIron switches can support tagged and
untagged frames in different VLANs. In our example in Figure 8-3 port e1 is
configured in dual mode. In this setup the e1 port can transmit and receive
tagged VoIP traffic (which is part of VLAN 20) and at the same time it can classify
untagged traffic received on the port from the computer device to VLAN 10.
Dual mode can be configured on a per-interface basis. If the VLAN ID parameter
is omitted, any untagged packet received on the port is classified to VLAN 1;
otherwise, the packet is classified to the VLAN specified by the VLAN ID.
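The following is a minimal configuration sketch for the example in Figure 8-3,
assuming port e1, VLAN 20 for voice, and VLAN 10 for data; the VLAN
membership commands and the dual-mode syntax can differ slightly between
products and releases, so verify them in the product configuration guide.
vlan 20 name VoIP
 tagged ethernet 1
vlan 10 name Data
 tagged ethernet 1
interface ethernet 1
 dual-mode 10
With dual-mode 10 configured, tagged VoIP frames remain in VLAN 20, while
untagged frames received from the computer are classified into VLAN 10.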
When a port is configured to use a voice VLAN and it receives a query from a
VoIP device, it responds with the configured voice VLAN ID. After this, the VoIP
device reconfigures itself to use this VLAN ID for sending VoIP traffic.
Note: If the VoIP device does not generate a query to the switch, or does not
support autoconfiguration of the VoIP VLAN ID, the voice VLAN feature cannot
be used with it.
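A minimal sketch of enabling the voice VLAN feature on a port, assuming VLAN
20 is the voice VLAN (illustrative syntax; confirm it in the product configuration
guide):
interface ethernet 1
 voice-vlan 20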
Figure 8-4 shows the steps when the VoIP device is moved to another port:
1. The VoIP device is moved to another port on the same device.
2. The VoIP device sends a query message to the switch.
3. The switch responds with the voice VLAN ID.
4. The VoIP device automatically reconfigures itself to use the provided VLAN ID.
8.3.5 Security features for VoIP traffic
Several security features are available in VoIP implementations using IBM
FastIron based switches:
BPDU guard
Port flap dampening
MAC filter override for 802.1x enable ports
CALEA compliance with ACL based mirroring
Tracking VoIP data to multiple servers
BPDU guard
The Bridge Protocol Data Unit (BPDU) guard is used to prevent loops that are
created by devices on a Layer 2 network. Some VoIP devices are known to
put themselves into loopback mode, as shown in Figure 8-5.
In this scenario, the VoIP device is not 802.1x compliant and it will fail
authentication. With MAC filter override, if a device’s MAC address (the VoIP
device in this case) matches a “permit” clause, the device is not subjected to
802.1x authentication and it is permitted to send the traffic. Similarly, if a device
matches a “deny” clause it is blocked before being subjected to 802.1x
authentication.
A setup for tracking of VoIP data to multiple servers is shown in Figure 8-7.
Chapter 9. Security
In this chapter we delve into the security options available in the IBM b-type
Ethernet product range, including the following options:
Console access
Port security
Spanning-tree security
VLAN
ACLs (Layer 2 and Layer 3 or Layer 4)
Anti-spoofing protection
Route authentication
Remote access security
Network device security can be considered to encompass the five (5) lowest OSI
layers:
Layer 1, physically secure the equipment.
Layer 2, secure link protocols such as MAC & spanning tree
Layer 3, secure logical access, as well as restricting IP connectivity
Layer 4, restrict the traffic available in segments
Layer 5, secure the available services on network equipment
Not all security options described below are available on all products. Be sure to
check your product configuration guide for availability of the security options that
you require.
Note: How to provide physical security is beyond the intended scope of this
book.
Enable password
Although anyone with physical access to the devices can connect and see the
data in user mode, all sensitive information is displayed in encrypted format. To
restrict configuration changes, an enable password must be set. Without the
enable password, no configuration changes are possible. This level of
authorization allows configuration of interfaces.
Super-user password
To allow a console user to configure all parameters on the device the super-user
password is used.
Console time-out
By default, the console port allows a user to remain logged in and active
indefinitely. This can result in a console being left connected in enable mode
unintentionally. It is preferable to set a reasonable time-out value, for example,
2 minutes. This ensures that the console is not left in enable mode by mistake.
The relevant product configuration guides provide more information about this
setting.
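A minimal configuration sketch of these console protections follows. The
passwords and the time-out value are examples only, and the exact command
names can vary by product and release, so verify them in the relevant
configuration guide.
enable super-user-password S3cureSU
enable port-config-password S3curePC
console timeout 2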
Port security allows the network administrator to restrict the number of MAC
addresses permitted on each port. The approved, or secure, MAC addresses
can either be manually configured or learnt dynamically. In a static environment it
is even possible to have the learnt MAC addresses saved into the device startup
configuration file. It is important to ensure that your port security settings match
your business needs. With the deployment of VoIP it is not unusual to have two
(2) MAC addresses per interface, and some computers also run virtualization
software that can apply a unique MAC address for every virtual instance.
Desk flexibility will also create an environment where the MAC address changes
when a user connects their laptop to the port.
However, even in those cases, the workstation will need to be changed at some
point during technology refresh cycles. The environment that requires static MAC
addresses over lengthy periods of time is assumed to be an exception in today’s
dynamic workplace. Therefore, IBM advises the majority of environments that
choose this security option to always configure a suitable age-out timer for these
secure MAC addresses.
The MAC port security setting also has a default action to drop traffic from MAC
addresses not in the secure MAC list. Another configurable option is to shut down
the interface for a defined time period; this halts all data communications to
that port until the time period has expired. Regardless of which option is best for
your network, IBM advises the use of a suitable time-out value.
Consider the case of the mobile workforce and drop-in desks. Assume a fairly
dynamic workforce where it is possible for a desk to be used by more than four
(4) people in a work day (8 hours). In this case, setting port security to 5 dynamic
MAC addresses, with a time out of 120 minutes (2 hours), and a violation action
to shut down the port for 5 minutes can allow the port to be used for the expected
daily operations of the business. Any violation or change to the business will shut
the port down for only five (5) minutes.
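A minimal sketch of the scenario just described, assuming port e1/10, five
dynamically learnt MAC addresses, a 120-minute age-out, and a 5-minute
violation shutdown, might look as follows. The command names and nesting are
illustrative and differ between products and releases, so verify them in the
product configuration guide.
interface ethernet 1/10
 port security
  enable
  maximum 5
  age 120
  violation shutdown 5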
Note: If using Port Security, ensure that the following precautions are in place:
Suitable maximum MAC addresses per port are configured.
Time out MAC addresses if using dynamic learning.
Set a time period for the violation action.
In the case where the MAC address has not been registered, the Radius server
will reply with an Access-Reject message indicating the authentication has failed.
If the MAC address is not authenticated the default action is to drop all packets in
hardware. Alternatively, the network device can be configured to place the port
into a different, untrusted VLAN. For example, if a visitor to the site connects their
workstation to the port, they can be provided access to a guest VLAN that can
then provide limited access to the network.
The authentication feature can also be used to configure different devices into
different VLANs. For example the VoIP handset can be authenticated and
configured into the VoIP VLAN, while the workstation can be configured into the
corporate VLAN or the Guest VLAN.
The port remains in blocked state until the superior BPDU packets are no longer
seen and the time out value has expired. The port will then be placed back into
forwarding mode.
Important: STP Root Guard must only be deployed at the edge of the
network, it must never be deployed in the core.
BPDU guard restricts an end station from inserting BPDU packets into the
spanning tree topology. An end station attempting to participate in spanning tree
is considered either an attack, or an indication that a Layer 2 device has been
inserted into the wrong port. For this reason, BPDU guard blocks the port if a
BPDU packet is received and does not automatically re-enable the port. Instead,
the network administrator must enable the port from the command line after
ensuring that the initiator of the rogue BPDU has been identified and fixed.
Note: The network administrator must manually unblock a port where BPDU
Guard has blocked it. This is to allow the network administrator to identify the
cause of the unexpected BPDU packet and rectify it.
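A minimal sketch of enabling BPDU guard on an edge port, assuming port e1/2
(illustrative syntax; confirm the exact command in the product configuration
guide):
interface ethernet 1/2
 stp-bpdu-guard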
The IBM Ethernet switch LAN configuration allows VLANs to be defined with
tagged or untagged ports. Tagged ports allow 802.1Q tagged packets to be
received. Untagged ports will drop any packet with an 802.1Q tag.
Dual mode ports: With the growing deployment of VoIP, it is common for an
access port to have a VoIP handset connected as well as a workstation
connected to the VoIP handset. Industry standard practice is to configure the
VoIP handset into one VLAN and the workstation in another VLAN. Furthermore,
the VoIP handset will typically tag the traffic with the VoIP VLAN ID using 802.1Q.
In these cases, the IBM Ethernet products allow each port to be configured as a
dual mode port.
9.3.7 VSRP
The Virtual Switch Redundancy Protocol (VSRP) can be configured to provide a
highly available Layer 2 switch environment as described in Chapter 6, “Network
availability” on page 141.
Each VSRP instance can be configured to use authentication. There are two
types of authentication available:
None The default; does not use authentication
Simple Utilizes a simple text string as a password
Whichever authentication method you use, all switches within that virtual group
must be configured with the same VRID and authentication method.
Layer 2 ACLs filter incoming traffic based on any of the following Layer 2 fields in
the MAC header:
Source MAC address
Destination MAC address
VLAN ID
Ethernet type.
The IBM Ethernet products can be configured with Dynamic ARP Inspection
(DAI). After DAI is enabled on a VLAN, the IBM Ethernet product will intercept
and examine all ARP packets within that VLAN. The ARP packet will be
inspected by the CPU and discarded if the information is found to contain invalid
IP to MAC address bindings.
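A minimal sketch of enabling DAI on VLAN 20 and marking the uplink as trusted,
assuming port e1/24 is the uplink (illustrative syntax; the exact commands can
vary by product and release, so verify them in the configuration guide):
ip arp inspection vlan 20
interface ethernet 1/24
 arp inspection trust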
To overcome this MiM attack, the network administrator can configure DHCP
snooping. This feature works by blocking any DHCP server packets from user
(end-point) ports and only permitting DHCP server packets from a trusted port.
DHCP snooping is configured per VLAN and will accept a range of VLANS.
There are two steps to configuring DHCP Snooping:
1. Enable DHCP snooping per VLAN:
ip dhcp-snooping vlan 123
2. Set the trusted port:
interface ethernet 1/1
dhcp-snooping-trust
When first enabled, IP Source Guard only permits DHCP traffic to flow through
the untrusted ports. It then learns the IP addresses from the ARP table. IP traffic
is only passed through the network device after the IP address is learnt and the
source address of the packet matches the learnt source address.
VRRP authentication is an interface level command. After the VRRP instance
has been defined, the network administrator can then configure the VRRP
interface with the authentication type and password. There are two (2)
authentication modes available today:
None This is the default and does not utilize any authentication.
Simple The port can be configured with a simple text password; this
password is used for all communications from that port.
Just as the VRRP group of routers will all need the same VRID configured, the
interfaces assigned to the VRRP on each router need the same authenticating
method and password. Failure to set the same authentication method and
password will result in VRRP not activating resulting in a single point of failure.
When using a simple authentication, the default is for the password to be stored
and displayed in an encrypted format. The network administrator can optionally
configure to display and save the password in plain text. When authenticating the
simple password is decrypted and passed in clear text.
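A minimal sketch of configuring simple authentication for VRRP on an interface,
assuming virtual interface ve 10 and an example password (illustrative syntax;
verify it in the product configuration guide):
interface ve 10
 ip vrrp auth-type simple-text-auth VrrpPass1
The same authentication type and password must then be configured on the
corresponding interface of every router in the VRRP group.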
The MD5 hash key is stored as the hash and up to 255 different keys can be
defined.
By default, the IBM Ethernet routers assume that the BGP password is being
entered in clear text and it is stored and displayed in encrypted format. It is
possible to enter a password that is already encrypted. Regardless of password
entry method the password is decrypted before use.
9.5.1 IP ACLs
To restrict packets based on a common rule, the use of Access Control Lists
(ACLs) is suggested. These do not replace other security devices such as a
firewall or Intrusion Prevention System (IPS).
As with most implementations of an ACL based filter there are two types of ACLs
available, Standard and Extended.
Standard ACLs
The Standard ACLs are numbered from one (1) to ninety-nine (99). These are
used for IP control and as such are really a Layer 3 security function. However
we will cover them here with the other IP ACLs.
Standard ACLs can provide coarse security controls by controlling data flows
from the defined source IP address. The IP address can be a host or a network.
Standard ACLs are typically used to deny a single IP host or network, or in Policy
Based Routing.
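As a minimal sketch, the following standard ACL denies a single host, permits all
other sources, and is applied inbound on an interface (the addresses, ACL
number, and interface are examples only):
access-list 10 deny host 192.168.50.25
access-list 10 permit any
interface ethernet 1/1
 ip access-group 10 in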
Extended ACLs
The Extended ACL can be defined numerically from 100 to 199. Extended ACLs
allow more detailed control of IP traffic by allowing the definition of any field in the
TCP/IP packet.
If the IP protocol is TCP or UDP, the Extended ACL can also define restrictions
based on the following values:
Source Port(s)
Destination Port(s)
TCP Flags
If the IP protocol is ICMP, the Extended ACL can also define restrictions based
on the following values:
ICMP Type
ICMP Code
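A minimal sketch of an extended ACL that permits only web traffic to a server
and denies everything else (the addresses, ports, ACL number, and interface are
examples only):
access-list 105 permit tcp any host 10.1.1.10 eq 80
access-list 105 permit tcp any host 10.1.1.10 eq 443
access-list 105 deny ip any any
interface ethernet 1/2
 ip access-group 105 in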
Named ACL
Both types of ACLs can also be defined by a name. If this method is chosen, the
configuration commands are slightly different, because the administrator must
define whether the ACL is standard or extended.
The IBM Ethernet Routers can have up to 100 named ACLs defined as standard
ACLs, and up to 500 named ACLs defined as extended ACLs.
9.5.3 Protocol flooding
The IBM Ethernet products can pass traffic at wire speed due to their
architectural design. However, not all network devices can process traffic at wire
speed, nor is it always preferable to allow traffic to flood the network. Whether the
traffic is broadcast, multicast, or unicast, traffic flooding can be detrimental to
devices and systems.
The IBM Ethernet switches can be configured to restrict the number of flood
packets, or bytes, per second. These protocol flooding limits can be enabled at
the port level.
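A minimal sketch of limiting flooded traffic on a port (the limit value and interface
are examples only, and the unit and permitted values differ by product, so verify
them in the configuration guide):
interface ethernet 1/3
 broadcast limit 65536
 multicast limit 65536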
The IBM Ethernet products allow for common mitigations, and IBM strongly
advises all network deployments to enable simple security measures to reduce
the risk of a network device being used for malicious purposes.
ACLs can, and must, be configured to restrict access to the Telnet, SSH, HTTP,
SSL, and SNMP functions.
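As an illustrative sketch only, management access might be restricted to a single
management subnet as follows. The subnet, ACL number, and the commands
that bind an ACL to each management service are assumptions, so confirm the
exact syntax in the relevant configuration guide.
access-list 20 permit 10.10.100.0 0.0.0.255
telnet access-group 20
web access-group 20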
Authentication
Authentication is the process of confirming the access attempt has credentials
that are recognized. This is usually thought of as the username and password
that you enter to access a system.
Authorization
After being authenticated, network devices will then also check the authorization
of the user. Authorization defines the instructions the user has been permitted to
use once authenticated.
Accounting
Tracking what a user does and when they do it can provide valuable data both in
forensics and in problem determination. Accounting logs can be used to identify
when a change was made that might have caused a fault. Similarly, it can be
used after an event to discover who accessed a system and what they did.
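A minimal sketch of pointing authentication and accounting at a RADIUS server,
with a local fallback for login authentication (the server address and key are
examples only; the available method lists differ by product, so verify them in the
configuration guide):
radius-server host 10.10.5.20
radius-server key RadiusSecret1
aaa authentication login default radius local
aaa accounting exec default start-stop radius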
SSH provides all the functionality of Telnet, but provides encryption of the entire
communications stream. Furthermore, it can also be configured to only permit
connectivity from known systems that have the system’s public key uploaded to
the Network Device.
9.6.4 HTTP / SSL
In case your deployment also requires the use of the web interface, be aware that
HTTP has similar vulnerabilities to Telnet. The HTTP protocol transmits all data
in clear text. Rather than HTTP, it is preferable to enable the SSL protocol to
ensure that system administration data is encrypted wherever possible. The
relevant product system configurations guides provide more detail on enabling
this feature.
9.6.5 SNMP
Simple Network Management Protocol (SNMP) versions 1 and 2 utilize
community strings to provide simple authentication, in order to restrict which
systems can access SNMP. However, devices always set a default community
string, which is well known. In addition to the use of ACLs discussed in 9.6.1,
“System security with ACLs” on page 231, the network administrator must
change the SNMP community string to a non-trivial string.
The SNMP community string is transmitted in clear text; however, the IBM
Ethernet devices display the SNMP community string in an encrypted format
when a network administrator has read-only access to the device.
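A minimal sketch of replacing a well-known default community string with a
non-trivial read-only string, tied to a management ACL as described in 9.6.1 (the
strings and ACL number are examples only; verify the syntax in the configuration
guide):
no snmp-server community public ro
snmp-server community Xq7raN3t ro 20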
10.1.3 IBM g-series stacking terminology
Here we describe some of the terminology associated with the IBM g-series:
Active Controller This is the unit that manages the stack and configures
all the units as a single system.
Future Active Controller If a unit is configured with the highest stack priority
(see 10.2.1, “Stack priority” on page 238), that unit will
become the Active Controller after the next reload. To
prevent disruption to the stack, the Active Controller
does not change due to a configuration change.
Standby Controller The unit in the stack with the second highest priority
after the Active Controller. This unit will become the
Active Controller in the event of a failure with the
configured Active Controller.
Stack Member A unit within the stack that is neither the Active
Controller nor the Standby Controller
Stack Unit A unit functioning within the stack, including the Active
Controller and Standby Controller.
Upstream Stack Unit The Upstream Stack Unit is connected to the first
stacking port on the Active Controller. The first port is
the left hand port as you face the stacking ports as
shown in Figure 10-1
Downstream Stack Unit The Downstream Stack Unit is connected to the
second stacking port on the Active Controller. The
second port is the right hand port as you face the
stacking ports, as shown in Figure 10-1.
Note: Because either of the 10 GbE ports can be used in any order for
stacking ports, your cable configuration might differ from those shown in the
figures.
The B50G ships with a single 0.5 meter CX4 cable for stacking. To deploy a ring
topology another CX4 of sufficient length (usually either 1 or 3 metres depending
on the number of Stack Units) must be ordered.
Each stack also has a Standby Controller; this unit takes over as the Active
Controller in the event of a failure of the initial Active Controller. Through
configuration, a Standby Preference can be set, again from 1 through 8. If the
Standby Preference is not set, the Standby Controller is the unit with the lowest
MAC address.
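A minimal sketch of setting the stack priority on the unit that is intended to be the
Active Controller (the unit number and priority value are examples only; verify the
syntax in the product configuration guide):
stack unit 1
 priority 128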
10.2.2 Linear Stack
A Linear Stack is created by connecting each switch only to the switch above or
below itself, as shown in Figure 10-2. The top and bottom units in the stack only
use a single stack connection. The other connection can be used as a data port.
Redundancy in a Linear Stack: In the Linear Stack, there are three main
possible failure options:
If the Active Controller fails, the Secondary Controller takes over as the Active
Controller. All Stack Units that can communicate with the new Active
Controller continue operation with minimal interruption.
If a Stack Member fails between the Active Controller and the Secondary
Controller, the Secondary Controller becomes the Active Controller for the
Stack Units that it can communicate with, while the original Active Controller
remains in control of the Stack Units that it can still communicate with.
If a Stack Member fails in such a way that the Active Controller and the
Secondary Controller are still able to communicate, any Stack Unit that can
communicate with the Active Controller continues operations without
interruption, whereas all other Stack Units become non-functioning and
require manual intervention to regain network and stack connectivity.
Redundancy in a Ring Stack: In a Ring Stack, there are also three possible
failure options; each failure scenario results in the stack operating in Linear
Stack mode until the failed Stack Unit is repaired.
If the Active Controller fails, the Secondary Controller takes over as the Active
Controller. Due to the ring, all Stack Units can still communicate with the new
Active Controller and continue operation with minimal interruption. A new
Secondary Controller is elected. If the old Active Controller recovers, it does
not take over the Active Controller function until the next stack reset.
If the Secondary Controller fails, the stack elects a new Secondary Controller.
All Stack Units can still communicate with the Active Controller and continue
operation without interruption.
If a Stack Member fails, all other Stack Units can still communicate with the
Active Controller and continue operation without interruption.
10.2.4 Stacking ports
The B50G does not have dedicated stacking ports. Therefore, either of the
10 GbE ports can be utilized as stacking ports or data ports. The first unit in the
stack will have both 10 GbE ports automatically configured to stacking ports,
however this can be changed manually. In the Linear Stack topology, doing this
allows two 10 GbE ports in the stack to be used as data ports.
The Active Controller can reset other Stack Members as required. However, it will
not reset itself. If you require the Active Controller to reset, perhaps to force
another Stack Member to become the Active Controller, you must manually reset
the Active Controller.
If the Active Controller fails, the Standby Controller waits thirty (30) seconds
before taking over as the Active Controller. It will then reset the entire stack,
including itself, before resuming operation as the Active Controller.
Note: If the Active Controller loses connectivity with the stack, it might take up
to twenty (20) seconds for the connection to age out, then the Standby
Controller will wait another thirty (30) seconds before taking over control of the
stack. This can result in a delay up to fifty (50) seconds before the Standby
Controller takes over control of the stack.
The Active Controller copies its startup configuration to the rest of the Stack
Members, including the Standby Controller.
The configuration of the Standby Controller is copied from the Active Controller
on each reboot.
Secure Stack also allows Stack IDs to be reallocated to units, as well as allowing
the addition, removal, or replacement of a Stack Unit.
The Stack Unit on which the stack secure-setup command is issued becomes the
Active Controller with a priority of 128. All other units joining the stack are
assigned a stack priority below 128. In the case where the secure setup
discovers another Stack Unit with a Stack Priority of 128 or higher, it reconfigures
the Stack Priority on that unit to 118.
For example, a connection from an Access Layer Linear Stack group can use the
end units’ 10 GbE CX4 ports for uplinks to the Distribution Layer. Although these
Distribution units might also be B50G products, they are not intended to be part
of the current stack; configuring stack disable on the distribution layer units will
prevent them from joining the stack.
This secure setup polls upstream and downstream units using a proprietary,
authenticated, discovery protocol to identify units connected to this Active
Controller. The stack administrator is then presented with a proposed
configuration, including the Stack Units discovered, and asked to confirm the
topology and then the proposed configuration and Secure Stack Membership.
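A minimal sketch of this flow (illustrative syntax; verify the exact commands and
interactive prompts in the product configuration guide):
1. On each distribution-layer B50G that must not join the stack:
stack disable
2. On the unit that is intended to become the Active Controller of the
access-layer stack:
stack enable
stack secure-setup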
Best practice: Cable up the stack ports before applying power to a new or
replacement Stack Unit.
If the unit was powered up when the cables were removed it will maintain its
stack configuration and will need to have the stack information cleared before
being redeployed as a separate unit or into another stack.
Note: Unless the removed Stack Unit is intended to be installed in the same
position in the same stack, it is advisable to clear the stack configuration.
10.5.3 Removing the Active Controller from the stack
If the Active Controller is removed from the stack, the Standby Controller will wait
30 seconds before taking over as the Active Controller within the remaining
stack. The removed unit will be able to function as a stack of one unit even
without clearing the stack configuration. While it is not required to clear the stack
configuration in this instance, it is advisable to clear the stack configuration to
ensure a deterministic operation of the unit. After being cleared, the stack
enable command can be reentered to allow the unit to create a new stack in the
future.
To avoid nonsequential Stack IDs the simplest method is to replace one unit at a
time. Also, Secure Setup can be used to reassign stack IDs in a sequential order.
Note: In a Linear Stack topology, adding or removing a Stack Unit in any
position other than the top or bottom of the stack will cause some Stack Units
to lose connectivity with the Active Controller, causing a reboot of these Stack
Units. In a Ring Stack topology, adding or removing a Stack Unit does not
interrupt any of the other Stack Units.
Before merging stacks, it is important to ensure none of the stacking ports have
been configured as data ports. Recall in a Linear Stack it is possible to
reconfigure the unused end stacking ports as data ports. Secure Stack does not
work across stack boundaries, so it is not possible to use the Secure Stack
process to merge stacks.
When stacks are merged, the existing Active Controllers undergo a reelection
process. The winner of this process retains both its configuration and the Stack
IDs of its previous Stack Members. The new Stack Members only retain their
Stack IDs if their existing Stack ID does not conflict, otherwise a new Stack ID is
assigned to the new Stack Member. This might result in the Stack IDs no longer
being sequential. However, now that all the units are in a single Stack, the Secure
Stack process can be utilized to reassign Stack IDs and ensure the stack
members are the expected units.
Following are some other examples of when merging stacks might be required:
If a Linear Stack had a failure in such a way that both parts of the stack
remained active, upon repairing the failure the two stacks can be merged
again. In this case, the new stack has the same configuration as the original
stack because all the Stack Units had unique Stack IDs.
If two Linear Stacks are connected, it is important to ensure that the end
units are configured so that both of their ports are stacking ports.
10.7 Best practices in a stack
In the topics that follow we describe some best practices as they relate to stacks.
Because the data center houses servers that host the corporate data and
applications, the data center network (DCN) design must continue operations in
the event of a single device failure. In some instances there might be more
difficult requirements such as continued operation in the event of multiple device
failure, and in such cases the architecture can change, but the device selection
criteria detailed in the topics that follow will still remain valid.
Type
Design decision
Problem statement and questions
What hierarchical structure will best support your DCN? Does it consider future
directions in dynamic infrastructure or cloud computing environments?
Assumptions
We make the following assumptions:
Single tier network architectures are not suited for the DCN style network.
Tiers can be either physical or logical.
Alternatives
Here we discuss the various alternatives:
Three tier design This design is the traditional design and consists of
core, distribution, and access layers in the network.
The design provides scalability for fast growing
networks; each tier assigns specific roles to each
device in the network infrastructure.
Collapsed tier design This design is also called a two tier model and
requires less equipment and fewer connections. The
main drawback is that it is not as scalable.
The current DCN network offerings have exceptional
port density compared to network devices of
previous generations. Combined with server
virtualization, the collapsed tier design is becoming
more prevalent, and it is the one we examine in this
chapter.
Differentiated distribution This design is a three tier model that further
delineates areas of responsibility within the
distribution layer. The distribution layer devices can
be separated into specific functions for user access,
data center, and WAN.
Considerations
Observe the following considerations:
The data center cable design might constrain the hierarchical design.
The data center rack layout might constrain the hierarchical design.
Virtualization of network components (see Chapter 5, “IBM Ethernet in the
green data center” on page 133) might challenge traditional hierarchical
designs.
The deployment of 10 GbE uplinks from the access layer might impact the
overall architecture.
Figure 11-1 shows the Data Center Network architecture. Further differentiation
is enabled on the connectivity tier with WAN, LAN, and multiple ISP connectivity
modules, which are discussed in this section also.
Type
Design decision
Assumptions
We make the following assumptions:
Servers can be stand-alone or rack based servers.
Media can be copper or fiber.
Speeds can range from 10 Mbps to 10 Gbps, either over time or in the current
environment.
Alternatives
Here we discuss the various alternatives:
Top of Rack (ToR) This option connects the servers within a rack to the
access switch in the same rack, usually at the top of the
rack. This allows standard cable lengths to be
pre-purchased for the data center for server connectivity.
The server access switches can be connected to the
distribution tier by high-speed uplinks, reducing the
number of connections exiting the rack.
End of Row (EoR) This option connects the servers within a rack to a large
access switch at the end of the row of racks, typically a
high port density modular chassis. The chassis can either
be cabled to patch panels at the top of each server rack or
directly to each NIC in a server. While patch panels can
take additional space in the server rack, the benefits of
doing so include better cable management as well as a
decrease in devices needing to be managed.
Middle of Row (MoR) This option is similar to EoR, except that, being in the
middle of the row, the maximum cable distance between
the server and access switch is less. This is more
desirable when doing direct cable runs from a high port
count chassis directly to the servers.
Decision
A Top of Rack (ToR) switch at the access layer is more traditional and gives the
flexibility to use fixed-length network cables to connect to servers within the rack.
The capability to have minimal high speed uplinks (10 GbE) to a distribution tier
also helps with cable management. A pre-tested rack solution containing servers,
storage, and a ToR switch might also be easier to deploy.
However, End of Row (EoR) and Middle of Row (MoR) solutions that utilize
higher port density, modular devices typically have higher levels of availability
and performance. Most modular chassis have N+M redundancy for
management, power supplies, and fans. A backplane provides connectivity to all
ports within the chassis, providing 1:1 (1 GbE) full mesh connectivity and either
1:1 or 4:1 (10 GbE) full mesh connectivity depending on options selected.
Connecting up to 7 times the number of devices to a single chassis minimizes the
number of devices that need to be managed. A patch panel can also be used at
the top of each server rack to help with cabling.
With virtualized servers becoming more and more prevalent and increasingly
saturating network connections, EoR and MoR solutions might merit serious
consideration, though many network designers might be more familiar with a ToR
solution.
11.3.2 Server access switch selection
Here we discuss server access switch selection using Top of Rack (ToR).
Type
Technology decision
Assumptions
We make the following assumptions:
Uplinks to the distribution tier are required to be scalable to a greater speed
than the links to their servers.
There is typically some over-subscription when connecting the access layer to
the distribution layer.
The architect is able to determine prior to final product selection, the type of
media required to connect the servers to the server access layer.
Alternatives
Table 11-1 compares the alternatives and features of the DCN ToR products.
The primary difference between the two c-series switches is the available media.
The c-series (C) models have primarily copper, 10/100/1000 MbE RJ45 ports for
the servers. The c-series (F) models have primarily 100/1000 MbE (hybrid fiber)
SFP ports.
Some models have combination ports which are shared ports between the first
four 10/100/1000 MbE RJ45 ports and 100/1000 MbE SFP ports found on the
devices. These allow network designers the flexibility to use optical transceivers
for long distance connectivity.
Number of 10 GbE (SFP+) ports    24      0       0       0
Size                             1 RU    1 RU    1 RU    1.5 RU
Considerations
Observe the following considerations:
Redundant links or dual homing systems can be achieved by installing a
second ToR unit.
To increase port utilization, cables can be run from an adjacent rack; this still
follows the known cable length rule.
In an environment where traffic coming from the distribution to the access
layer is expected to be bursty, network buffers on the server access device
become more important. This is because it might be possible for the 10 GbE
links to the distribution layer to burst near maximum capacity. In this scenario
the traffic needs to be buffered on the access switch for delivery to the slower
server connections. Without sufficient network buffers some data might be
dropped due to traffic congestion. Increasing uplink bandwidth can mitigate
some of these issues.
Decision
The IBM Ethernet B50C products have more available network buffer capacity to
allow bursty traffic from a 10 GbE uplink to be delivered to the slower
10/100/1000 Mbps RJ45, or the 100/1000 Mbps SFP server connections, which
is useful because the over-subscription ratio can be as high as 48:20 (2.4:1).
The B50C also requires less rack space (1 RU each) and less power, making it a
good choice for a ToR server access switch. Additional software features can
also help build advanced data centers, including support for Multi-VRF to
virtualize routing tables and MPLS/VPLS capabilities to directly extend remote
Layer 2 networks to the server rack.
For virtualized server deployment, a high capacity network link might be required
to support a greater number of applications running on a single physical server.
In this scenario, the x-series will enable high-bandwidth, 10 GbE server
connectivity. Up to eight links can be aggregated using LACP to provide an 80
Gbps trunk from the switch to the distribution layer.
Type
In a traditional design, the distribution tier connects only to other switches and
provides inter-connectivity between access tier switches and uplinks to the core
or WAN Edge of the network to enable connectivity out of the data center. This
can be seen in Figure 11-1 on page 252, where the distribution tier is also the
core tier. This can be done when devices used meet the requirements of the
core, such as Layer 3 routing and any advanced services needed such as MPLS,
VPLS, and VRF are satisfied by the devices chosen. The main decision on
whether to separate the distribution and core layers is whether additional device
scalability is required.
In a collapsed design, the distribution tier can connect directly to servers, making
it an access/distribution tier. Typically higher density, modular chassis are
deployed. This architectural decision can simplify management by reducing the
number of devices required while increasing the availability and resiliency of the
network.
Assumptions
We make the following assumptions:
The network architect has the required skills and experience to select the best
option for the agreed business requirements.
The DCN deployment is for a commercial business and requires commercial
grade equipment and manageability.
Alternatives
Here we discuss the various alternatives:
Collapse the access and distribution tiers.
Maintain physical tiers with lower cost devices.
Considerations
Observe the following considerations:
Collapsed tiers are able to deliver the business requirements in a
standardized way for small environments.
Lower cost devices probably will not have the commercial manageability and
stability that a commercial business requires.
Layer 3 routing might be required.
Decision
When the network architect is faced with this issue, the best solution is to
investigate collapsing tiers and maintain business grade equipment in the
commercial DCN. However, network architects might be more comfortable with
more traditional designs implementing a separate access layer with lower-cost
ToR switches connecting to the Distribution layer.
Assumptions
We make the following assumptions:
The network architect has captured the customer’s business requirements
The network architect has captured the customer’s logical network
preferences.
The network architect has confirmed the application architect’s requirements.
Decision
There is no wrong answer here; the decision of running Layer 2 or Layer 3 at the
distribution tier is dependent on the rest of the network environment, the intended
communications between VLANs, and the security (ACL) requirements of the
DCN.
Type
Technology decision
Problem statement and questions
How can the core device provide maximum uptime (minimal disruptions to
service)? Will the core device be capable of peering with many BGP peers? How
can the core push MPLS deeper into the DCN? How can the core device connect
to my carrier’s POS? What products will support my IPv6 requirements, even if I
don’t run IPv6 today?
Assumptions
We make the following assumptions:
The core device will run BGP with many peers.
MPLS will be deployed deeper into the network, rather than just the
connectivity tier.
The core can connect directly to a Packet Over Sonet (POS) circuit.
The core will run L3 routing protocols.
Alternatives
The m-series devices must be placed at the DCN core tier. Table 11-2 on
page 263 shows a short comparison of the two products highlighting some of the
considerations the core tier might have.
Considerations
Observe the following considerations:
The m-series Ethernet products have supported hitless upgrades for a
number of releases; this provides a mature hitless upgrade solution.
If your site is housing multiple functions, or clients, that have a need for
separate routing and forwarding tables, VRF will provide this separation.
If there is a requirement to support IPv6, or perhaps deploy it in the near
future, the m-series supports dual stack (running IPv4 as a separate stack to
IPv6) out of the box.
Decision
The m-series Ethernet Routers provide proven support for Layer 2 and Layer 3
hitless upgrades, removing the need for downtime during upgrades. In addition,
the m-series provides greater support for IP routing protocols, with sufficient
memory to maintain 1 million BGP routes in the BGP RIB as well as 512,000
routes in the IPv4 FIB. As IPv6 becomes a requirement in more data centers,
the m-series provides dual stack support for both IPv6 and IPv4; this avoids
future downtime to upgrade blades to IPv6 capabilities if they are required.
Internet connectivity through one or more ISPs brings up many unique issues,
especially in the security area. Because the IBM Ethernet products are not
security devices (for example, firewalls or IDS/IPS), the network designer must
consider how to provide security for his or her DCN requirements. This section
covers only the Ethernet connectivity considerations.
The WAN and MAN can be considered equal, because in these cases the client
owns the connecting infrastructure at the other end of the carrier circuit. There
might be other DCNs, Enterprise networks or both at the remote side of these
connections.
Type
Technology decision
Assumptions
We make the following assumptions:
Full Internet security will be provided by appropriate security devices such as
firewalls, IDS/IPS.
The DCN has a requirement to provide Internet connectivity.
The DCN can have more than one (1) ISP providing connectivity.
Alternatives
On the connectivity edge, the IBM Ethernet m-series products are ideal
candidates. Table 11-2 shows some of the design considerations for connectivity
to the Internet.
Table 11-2 Comparison of c-series and m-series for connectivity edge devices
Feature                                B04M                        B08M / B16M
BGP route capability                   Supported out of the box    Supported out of the box
ACL support                            Support for ingress and     Support for ingress and
                                       egress ACLs                 egress ACLs
Support for OC12 or OC48 interfaces    Yes; refer to Chapter 2,    Yes; refer to Chapter 2,
                                       “Product introduction” on   “Product introduction” on
                                       page 31 for more details    page 31 for more details
Support for VRF                        Supported out of the box    Supported out of the box
Considerations
Observe the following considerations:
Determine how many BGP routes your DCN needs to see from the Internet.
ACLs are important to ensure that BOGON lists and other basic security
functions can be addressed as close as possible to the source. Remember
that full security, such as firewalls and IDS/IPS devices, must be provided by
your DCN security infrastructure.
Determine how many BGP peers you will have in your WAN or MAN.
Decision
The IBM m-series Ethernet product provides scalable interface connectivity and
allows the use of ACLs to provide security or route filtering, as well as the option
to utilize VRFs for route separation. The m-series supports IPv6 to future-proof
your investment, as well as large BGP route table capabilities.
As previously mentioned in this book, we treat the data center as the main
hardened facility housing the systems and services that require high
availability and performance characteristics. The enterprise accesses the
applications housed in the data center through links such as the WAN and LAN,
as shown in Figure 12-1.
Focusing on the enterprise components and considering the variability within
the enterprise, we find there are a number of different options. Most of them
can be addressed in similar ways or by adding minor components, and they are
discussed in the following sections.
Figure 12-2 depicts an example of an Enterprise Campus with multiple buildings.
The type of work being done at the business might impact network design
decisions. For example, a site with mainly programmers might require a more
high-performance network to support code sharing and development, whereas a
call center might need lots of ports to support a large number of people, but less
bandwidth because most of the traffic might be just Voice over IP telephony.
Some tiers can be shared with the DCN; the core can be shared in a
differentiated distribution model, or for a smaller site with specific server access
needs, the network architect might determine that sharing the distribution device
is acceptable.
Type
Design decision
Assumptions
We make the following assumptions:
Single-tier network architectures are not suited to a corporate
enterprise-style network.
Tiers can be either physical or logical.
Alternatives
Here we discuss the various alternatives:
Three tier design          As discussed in Chapter 4, "Market segments
                           addressed by the IBM Ethernet products" on
                           page 125, this traditional design consists of core,
                           distribution, and access layers. It provides
                           scalability for fast-growing networks, with each
                           tier assigning specific roles to the devices in the
                           network infrastructure.
Collapsed backbone         Also called a two-tier model, this design can be
                           used in a smaller enterprise site. It requires less
                           equipment and fewer connections. The main drawback
                           is that this model is not as scalable as a
                           three-tier design.
Differentiated distribution This is a three-tier design that further delineates
                           areas of responsibility within the distribution
                           layer. The distribution layer devices can be
                           separated into specific functions for user access,
                           perhaps by department; for example, the finance
                           department can be physically separated from the
                           call center staff.
Considerations
Observe the following considerations:
The enterprise site cable design might constrain the hierarchical design.
The enterprise site layout might constrain the hierarchical design.
Virtualization of network components (see Chapter 5, “IBM Ethernet in the
green data center” on page 133) might challenge traditional hierarchical
designs.
The deployment of 10 GbE uplinks from the user access layer might impact
the overall architecture.
The distance between floors, or buildings, might impact the overall
architecture.
Requirements for Power over Ethernet have to be taken into account.
Decision
Regardless of how they are created, most site designs follow a tiered model for
reliability and scalability reasons. While it is possible to collapse some
tiers into a single physical device, considerations for future expansion must
be taken into account. Most importantly, understand what your client's needs
are: what is their business planning for this site, and which design decisions
made today will affect the client's ability to meet their business goals?
Type
Management decision
Assumptions
We make the following assumptions:
The network is being managed and monitored on a per device basis.
The number of users at the site is expected to grow over time.
Alternatives
Here we discuss the various alternatives:
Discrete devices can be managed individually and might show benefits at
certain tiers.
Stacked devices include the IBM B50G product from the g-series, which can
be connected in a stack that allows up to eight (8) units to be managed as a
single device through a single management IP address. This gives businesses
the flexibility to pay as they grow and need more switches, while simplifying
deployment.
Chassis devices provide scalability through the addition of modules, require
fewer power outlets, allow hot swapping of modules, power supplies, and fan
units, and provide the ability for hitless upgrades, all of which allow greater
uptime.
Considerations
Observe the following considerations:
Site availability is a major factor in deciding between stackable and
chassis-based products. Chassis products have the ability for hitless upgrades.
Stackable solutions provide management and monitoring through a single IP
address for the entire stack. This can simplify management and provide
scalability for installations that can accept outages for upgrades and expect
the user base to grow.
It is important to decide whether stacking is an appropriate option before
purchasing a g-series switch. For example, the B48G cannot be upgraded to
the stackable model, whereas the B50G is ready to be stacked or operate
independently.
Decision
Where greater uptime is required, consider the chassis-based products and
understand the hitless upgrade capabilities of each of those devices. A single
B16S chassis can support up to 384 devices while delivering Class 3 PoE power.
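A quick power-budget check clarifies what delivering Class 3 PoE on every one of those ports implies. The per-port figures are the IEEE 802.3af Class 3 values (about 15.4 W sourced by the switch, up to 12.95 W available to the device); verify them, and the chassis PoE power-supply ratings, against the product documentation.

```python
# Back-of-the-envelope PoE budget for 384 ports all delivering Class 3 power.
# Per-port wattages are the IEEE 802.3af Class 3 figures; confirm against the
# chassis PoE power-supply ratings before relying on them.
ports = 384
pse_watts_per_port = 15.4    # sourced at the switch (PSE) per 802.3af Class 3
pd_watts_per_port = 12.95    # maximum available to the powered device

print(f"Total PSE budget:      {ports * pse_watts_per_port / 1000:.1f} kW")
print(f"Delivered to devices:  {ports * pd_watts_per_port / 1000:.1f} kW")
```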
Type
Technology decision
Alternatives
At the access tier, there are two alternatives and each of these has scalability
options. Table 12-1 shows a comparison of the products at a series level.
Table 12-1 Comparison of g-series and s-series for the Enterprise access tier

  Feature                       g-series (B50G)             s-series
  System Power Supplies         1+1 per stack unit          1+1 for the 8-slot chassis
                                16 in a stack of 8 units    2+2 for the 16-slot chassis
  Separate PoE Power Supplies   System power supplies       2 for the 8-slot chassis
                                supply PoE power            4 for the 16-slot chassis
Considerations
Observe the following considerations:
Is PoE required or planned? Many VoIP handsets can utilize PoE, saving the
need to deploy separate power supplies on individual desks. Some wireless
access points can also utilize PoE, allowing APs to be deployed in ideal
wireless locations without concern for power outlets.
Is the location small but capable of growing? A small location might initially
only house up to 40 users, but have the ability to expand to adjoining office
space as the need arises. In this case the B50G can provide stacking
capabilities allowing low cost expansion.
Decision
The s-series provides scalability in a flexible, low cost chassis format. It
allows for separate hot-swappable PoE power supplies for growth and redundancy.
Modules allow growth in 24-port increments for the 10/100/1000 Mbps RJ45 or the
100/1000 SFP options. If your site requires the connections to the distribution
tier to be 10 GbE capable, these modules are available, each with two ports. If
you need Layer 3 functions at the user access layer, the s-series has options
for either IPv4 only or IPv4 and IPv6 support. This is the choice for the
enterprise site that has expansion plans or requires more than 48 user ports
plus 10 GbE ports to connect to the distribution layer.
The g-series is a good choice for the smaller site, or for sites where greater
over-subscription between the user ports and the uplink to the distribution
tier is acceptable. The B50G allows more units to be added to the stack as a
site grows, allowing the stack to be operated as a single unit from a single
management IP. The B48G allows for the addition of a 10 GbE module if
high-speed uplinks to the distribution layer are required and the site does not
have the capability to expand, or the convenience of operating a stack is not
required.
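The over-subscription trade-off mentioned above is easy to quantify. The port counts in this sketch are illustrative assumptions (48 gigabit user ports per unit, 10 GbE uplinks); substitute the figures from your own design.

```python
# Access-to-distribution over-subscription for a few illustrative uplink
# choices; port counts are assumptions, not product limits.
def oversubscription(user_ports: int, user_gbps: float,
                     uplinks: int, uplink_gbps: float) -> float:
    return (user_ports * user_gbps) / (uplinks * uplink_gbps)

print(f"{oversubscription(48, 1, 1, 10):.1f}:1")    # 48 x 1 GbE over one 10 GbE uplink
print(f"{oversubscription(48, 1, 2, 10):.1f}:1")    # 48 x 1 GbE over two 10 GbE uplinks
print(f"{oversubscription(384, 1, 4, 10):.1f}:1")   # 8-unit stack over four 10 GbE uplinks
```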
Type
Design decision
Assumptions
We make the following assumptions:
Connectivity is available between floors and is terminated in a central location
for each building.
Connectivity is available between buildings and is terminated at a central
location on the campus.
Appropriate media is used between floors or buildings for the distances
involved.
Each floor, or building, has a secure closet with sufficient power and cooling.
Alternatives
Here we discuss the various alternatives:
Position the distribution devices in a central location for the campus so that all
floors, or buildings, can connect to the central distribution. This most likely
requires fiber covering the entire distance from the floor to the central
building, a rather inefficient use of fiber connectivity.
Position distribution devices in a central part of the building so that all
floors, or local buildings, can connect to the building distribution device.
This requires cabling between the floors and the central location for the
building, plus further cabling between the building and the location of the
campus core devices.
Collapsed access and distribution tier for a single building, where the
distribution switch is large enough to connect to all end devices and does
routing to the MAN, WAN, or Internet.
Considerations
Observe the following considerations:
Efficient use of expensive cabling, such as connections between buildings.
These connections are typically laid underground, and digging up the trench to
lay more connections adds cost because of the care required to ensure no
existing connections are severed by the backhoe.
The cost difference between Layer 2 and Layer 3 devices has allowed for
collapsing the access and distribution tiers into a single device. This does
restrict the scalability but might be a viable option for buildings that cannot
scale further.
Routing between VLANs provides benefits within the building. For example,
if printers are on a separate VLAN from the workstations, the distribution tier
can route between workstations and printers without the traffic needing to
leave the building.
Peer-to-peer connectivity is also enhanced by deploying the distribution tier
within a building. Consider the following example: each department's devices
can be put on a separate VLAN for security reasons; for example, marketing is
on VLAN 100 while finance is on VLAN 200. If finance needs to access some
documents on marketing's shared drive, the traffic must be routed between the
VLANs. Having a Layer 3 capable distribution switch within the same building
prevents that traffic from traversing back to the core (see the sketch after
this list).
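A minimal sketch of why that flow needs a routed hop: the two VLANs sit in different IP subnets, so without a Layer 3 device in the building the traffic must go to the core to be routed. The subnets below are hypothetical examples.

```python
import ipaddress

# Marketing (VLAN 100) and finance (VLAN 200) in different, hypothetical subnets.
vlan_subnets = {
    100: ipaddress.ip_network("10.10.100.0/24"),   # marketing
    200: ipaddress.ip_network("10.10.200.0/24"),   # finance
}

marketing_share = ipaddress.ip_address("10.10.100.25")
finance_client = ipaddress.ip_address("10.10.200.40")

same_subnet = any(marketing_share in net and finance_client in net
                  for net in vlan_subnets.values())
# True here: a Layer 3 hop is required, and placing it at the building
# distribution switch keeps the flow inside the building.
print("Routed hop required:", not same_subnet)
```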
Decision
Distribution tier devices must be deployed to allow scalability of the
enterprise. These devices must be located at a central location for the
building or site, depending on the layout. Deploying the distribution tier at
the building allows routing between VLANs within the building.
Type
Technology decision
Assumptions
We make the following assumptions:
The enterprise campus is large enough to require a separate device for the
distribution tier.
Alternatives
Here we discuss the various alternatives:
Layer 2: this might be beneficial if the access tier is running Layer 3 and
has therefore already made the initial routing decision.
Layer 3 IPv4: this is the more traditional design, where the access tier
provides Layer 2 functions and the distribution tier provides routing
functions.
Layer 3 IPv6: this might be a requirement in some government networks or
other corporations that have decided to support IPv6 throughout their
network.
Considerations
Observe the following considerations:
In a small site it might be acceptable to collapse the access and distribution
tiers into one device.
If IPv6 support is required, all modules in the s-series must be purchased with
IPv6 capability.
How many end switches will need to be aggregated at this point?
Decision
Unless advanced routing services such as MPLS are required, the decision will
most likely be between the IBM s-series and IBM r-series devices. Both are
fully capable, resilient, modular chassis that support Layer 3 routing. One key
criterion might be how scalable the device needs to be. Table 12-2 is a
snapshot of the speeds and feeds of these devices:
Table 12-2 Speeds and feeds

  Feature                 r-series                   s-series
  System Power Supplies   2+1 for 4-slot chassis     1+1 for the 8-slot chassis
                          3+1 for 8-slot chassis     2+2 for the 16-slot chassis
                          5+3 for 16-slot chassis
Type
Technology decision
Assumptions
We make the following assumptions:
The core device will run BGP.
The core will run Layer 3 routing protocols.
Alternatives
Both the m-series and r-series devices can be placed at the enterprise core tier,
and in smaller environments the s-series as well. Table 12-3 shows a short
comparison of the products highlighting some of the considerations the core tier
might have.
Table 12-3 Comparison of the r-series, s-series, and m-series for the core tier

  Feature         r-series                 s-series                 m-series
  Route Support   400k IPv4 routes (FIB)   256K IPv4 routes (FIB)   512K IPv4 routes (FIB)
                  1M BGP routes (RIB)      1M BGP routes (RIB)      2M BGP routes (RIB)
                  (Both require the full Layer 3 feature set)
  ACL Support     Ingress ACLs only        Ingress ACLs only        Ingress and egress
                                                                    ACLs available
Considerations
Observe the following considerations:
The m-series and r-series devices have supported hitless upgrades for a
number of releases, which provides a mature hitless upgrade solution.
If your enterprise site houses multiple functions, or business units, that
need separate routing and forwarding tables, VRF, available on the m-series,
provides this separation.
If there is a requirement to support IPv6, or perhaps deploy it in the near
future, the m-series and r-series support dual stack IP out of the box. The
s-series has different modules for IPv4 only and IPv4 plus IPv6 support;
therefore, all modules that need to run Layer 3 IPv6 must be upgraded to the
dual IPv4 and IPv6 model.
Decision
The m-series and r-series provide proven support for L2 and L3 hitless
upgrades, removing the need for downtime during upgrades. The m-series provides
greater support for IP routing protocols, with sufficient memory to maintain up
to 2 million BGP routes in the BGP RIB and as many as 512,000 IPv4 routes in
the FIB, as well as support for advanced services such as MPLS and VRF. As IPv6
becomes a requirement in the enterprise, the m-series is ready to provide dual
stack support for both IPv6 and IPv4, which avoids future downtime to upgrade
modules to IPv6 capabilities if IPv6 is required.
For larger Cores, the m-series must be strongly considered. Smaller Enterprise
sites can utilize an r-series or s-series device as appropriate.
While IBM acknowledges the growth in the use of Infiniband for high-speed,
low-latency supercomputing environments, the use of gigabit Ethernet has been
increasing since June 2002, as shown in Figure 13-1.
Note: For the Top 500 Super Computer Project, see the following website:
http://www.top500.org/
Figure 13-1 Connectivity media used by the top 500 supercomputers
Although HPC 2.0 allows for communications separation by function (that is,
storage, HPC communications and network / user access), it also increases the
complexity and cost to deploy the various fabrics, often leading to the creation of
specialized teams within the support organizations.
Furthermore, each system in the cluster must have the appropriate specialized
communications modules to be connected to each fabric, and this might increase
the complexity for system management teams. The combination of these added
support requirements is thought to restrict deployment of HPCs to organizations
that have the financial ability to support various technologies within their data
center.
With HPC 2.0, the architect can consider a combination of IBM Ethernet products
as well as the IBM storage products. The storage products are not covered in this
book; however, many Redbooks publications are available for those products.
Figure 13-2 HPC 2.0 showing separate storage and cluster fabric
13.1.2 HPC 3.0
To reduce the number of specialized teams required to support HPC 2.0, and with
the availability of 10 GbE, many organizations are now investigating HPC 3.0 as
a viable alternative for their business needs. HPC 3.0, shown in Figure 13-3,
utilizes a single fabric for all communications needs (that is, storage,
compute communications, and network communications).
The major benefit expected from HPC 3.0 is that more enterprises will be able
to utilize HPC clusters because the cost of support is reduced. The network
support team can support the cluster fabric and storage fabric with little
knowledge beyond what they already need for their Ethernet networks today.
With HPC 3.0, the architect can choose from the IBM Ethernet product range to
provide the network fabric. Storage devices are not covered in this book;
however, note that Fibre Channel over Ethernet (FCoE) or iSCSI devices can
connect to the IBM Ethernet products, as described in the following sections.
Figure 13-3 HPC 3.0 utilizes a single fabric for compute, storage, and network communications
Type
Design decision
Assumptions
We make the following assumptions:
The HPC is physically within the limits of Ethernet.
Expansion to another HPC site utilizes high-speed, low-latency fiber for
site-to-site connectivity (for example, a carrier's MAN or DWDM).
Alternatives
Here we discuss the various alternatives:
Flat HPC design, with a single device connecting all the systems together.
Hierarchical design, with tiered connectivity allowing scalability and easier
future growth.
Considerations
Observe the following considerations:
Existing DCN design might dictate the HPC design.
Other data center controls might constrain design of the HPC environment.
Other existing infrastructure might dictate the HPC design.
Decision
Just as the DCN maintains a hierarchical design for scalability, a single HPC
with a large number of systems connected to one network switch is not scalable.
Instead, consider a simple two-tier, hierarchical, redundant design. However, a
flat topology might suit the HPC installation if the site is small and does not
have room to expand into a major HPC center.
Type
Technology decision
Assumptions
We make the following assumptions:
For HPC 2.0, we assume that the storage fabric exists, or is being created.
The intended application has been designed with HPC in mind.
Alternatives
Here we discuss the various alternatives:
High speed connectivity with Ethernet: this is considered in Table 13-1
because it represents over 50% of the HPC environments as of June 2009,
according to the Top 500 project.
High speed and low latency with other technology, such as Infiniband: this is
not considered here because many other references cover this topic.
Table 13-1 Maximum 1 GbE ports (r-series and m-series)

  Feature               r-series                     m-series
  Maximum 1 GbE ports   768 on the 16-slot chassis   1536 on the 32-slot chassis
                        384 on the 8-slot chassis    768 on the 16-slot chassis
                        192 on the 4-slot chassis    384 on the 8-slot chassis
                                                     192 on the 4-slot chassis
Considerations
Observe the following considerations:
Although other connectivity technology might be required for the client, the
architect also has to consider the impact of multiple types of connectivity
media within a data center. In certain cases, this is acceptable and the
business case will support this decision.
Jumbo frames or custom packet sizes might be required to support HPC
control traffic. The IBM DCN devices support jumbo frames up to 9,216 bytes
(see the worked example after this list).
High speed data transfer is the primary requirement of many HPC
applications such as geophysical data analysis. In these cases latency
introduced by Ethernet protocols is not a major consideration.
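The worked example below shows why jumbo frames matter for bulk HPC data movement: it compares the frame count and header overhead of a 1 GB transfer at a standard 1,500-byte MTU and at a 9,000-byte jumbo MTU. It assumes plain Ethernet II framing with IPv4 and TCP and no options or VLAN tags, so treat the exact percentages as approximations.

```python
import math

# Frames and header overhead for a 1 GB transfer at two MTU sizes, assuming
# 18 bytes of Ethernet header + FCS and 40 bytes of IPv4 + TCP headers per
# frame (no options, no VLAN tag) -- an approximation, not an exact model.
ETH_OVERHEAD = 18
IP_TCP_HEADERS = 40

def frames_and_overhead(transfer_bytes: int, mtu: int):
    payload_per_frame = mtu - IP_TCP_HEADERS
    frames = math.ceil(transfer_bytes / payload_per_frame)
    overhead_bytes = frames * (ETH_OVERHEAD + IP_TCP_HEADERS)
    return frames, overhead_bytes

for mtu in (1500, 9000):
    frames, overhead = frames_and_overhead(10**9, mtu)
    print(f"MTU {mtu}: {frames:,} frames, {overhead / 10**9:.1%} header overhead")
```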
Decision
For HPC with GbE connectivity, the m-series allows the greatest capacity, with
1,536 1 GbE ports subscribed at 1:1 on the 32-slot chassis.
For HPC with 10 GbE connectivity requirements, the r-series allows the greatest
capacity at 768 10 GbE ports, but at 4:1 oversubscription. If 1:1 subscription
is required, the m-series, with up to 128 ports, can be used.
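The following sketch turns those port counts into aggregate line-rate figures, which makes the 1:1 versus 4:1 comparison concrete; the arithmetic uses only the counts quoted above (one-direction line rates).

```python
# Aggregate offered load versus usable line-rate capacity for the chassis
# options quoted above (single-direction figures, for comparison only).
options = [
    ("m-series, 1 GbE, 1:1",  1536,  1, 1),
    ("r-series, 10 GbE, 4:1",  768, 10, 4),
    ("m-series, 10 GbE, 1:1",  128, 10, 1),
]
for name, ports, gbps, oversub in options:
    offered = ports * gbps
    usable = offered / oversub
    print(f"{name}: {offered:,} Gbps offered, ~{usable:,.0f} Gbps at line rate")
```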
Type
Design decision
Assumptions
We make the following assumptions:
The application has been designed with HPC clusters in mind.
The HPC environment fits the “good enough” segment where both high speed
and low latency are required.
HPC is a requirement for the client application.
Alternatives
Here we discuss the various alternatives:
Tiered HPC cluster fabric; utilizing various tiers to provide the scalability
required.
Retain the flat HPC cluster fabric; purchase an IBM Ethernet product with
more slots to accommodate the current scalability requirements.
Considerations
Observe the following considerations:
Although other connectivity technology might be required for the client, the
architect also has to consider the impact of multiple media within a data
center. In certain cases, this is acceptable and the business case will support
this decision.
High speed data transfer is the primary requirement of many HPC
applications such as geophysical data analysis. In these cases, latency
introduced by Ethernet protocols is not a major consideration.
Decision
It is a best practice for the network architect to always design with
scalability in mind. In the case of HPC 2.0, this might allow an architect to
start with an IBM r-series to create a single-tier model for the initial
deployment. Then, as the client embraces HPC, the design can scale up with the
addition of an m-series, especially given the large 1:1 10 GbE capacity of that
device and its ability to create link aggregation groups of 32 ports for up to
320 Gbps of bandwidth between two m-series devices.
Type
Design decision
Assumptions
We make the following assumption:
The HPC is physically within the limits of Ethernet.
Alternatives
Here we discuss the various alternatives:
Flat HPC design, with a single device connecting all the systems together.
While the HPC might look somewhat flat, the DCN is unlikely to be a single
tier.
Hierarchical design, with tiered connectivity allowing scalability and easier
future growth. This alternative retains the DCN architecture as well.
Considerations
Observe the following considerations:
The design of the current DCN might influence the design of the HPC.
The design of an existing HPC might influence the design of the new HPC.
Decision
An HPC 3.0 design must consider a hierarchical design for scalability and for
interoperability with the existing DCN architecture. It must also consider
connectivity to storage as well as to the network, all through the same fabric.
Current industry best practice has shown that a tiered approach allows for
greater flexibility and scalability, and this is no different in the case of an
HPC 3.0 design.
Type
Design decision
Assumptions
We make the following assumptions:
The data center already utilizes a separate storage fabric. We assume that
the corporate direction is to continue utilizing this storage infrastructure.
Network delivery staff are already familiar with Ethernet technology and have
worked with various Ethernet connectivity options before (RJ45, Cat6, SFP,
XFP, Fiber).
Alternatives
Here we discuss the various alternatives:
HPC 2.0 with 10/100/1000 Mbps RJ45 interfaces for compute node connectivity:
initially flat, but scalable as the HPC grows, maintaining a separate HPC
fabric.
HPC 2.0 with 10/100/1000 Mbps RJ45 interfaces for compute node connectivity,
scalable to 500 compute nodes over time.
Considerations
Observe the following considerations:
These two alternatives do not need to be independent of each other. In fact,
alternative one can be used to create a basic building block for the
scalability required in alternative two.
Distances between the equipment need to be confirmed before the architect
defines a base building block, or at least made into a selectable component
of a building block.
Decision
In this case we will decide to start with the first alternative and define a building
block that can be used to scale to the second alternative.
Alternative 1
For this design, the architect can utilize a modular design where compute nodes
are connected to an IBM r-series B08R with 10/100/1000 Mbps RJ45 modules
installed. This building block assumes each server has three connections:
1 x Storage fabric connection
2 x Network connections (1 x server access connection, 1 x compute cluster
fabric connection)
The compute module might look something like Figure 13-4.
Figure 13-4 Compute module with integration into existing DCN server access and storage infrastructure
Table 13-2 shows the initial compute cluster fabric hardware purchase list.
This allows for up to 288 compute nodes to be connected to the single r-series
switch. The complete design also needs to account for sufficient cabling to
connect the compute nodes to the switch, which is not included in this example.
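The 288-node figure follows from simple slot arithmetic, sketched below under the assumption of 48-port 1 GbE modules (consistent with the per-chassis maximums quoted earlier) and two slots held back for the 10 GbE uplink modules that Alternative 2 adds.

```python
# Where the 288-node capacity of a single B08R comes from, assuming 48-port
# 1 GbE modules and two slots reserved for future 10 GbE uplink modules.
slots = 8
ports_per_module = 48
uplink_slots_reserved = 2

compute_node_ports = (slots - uplink_slots_reserved) * ports_per_module
print(compute_node_ports)   # 288
```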
Alternative 2
To allow this HPC to scale up to 500 systems and beyond, the architect has to
decide on a suitable time to split the infrastructure. For this example, we
assume that the purchase of the next set of compute nodes is the point chosen
to expand the HPC 2.0 compute cluster fabric. For the next twenty systems to
connect, the network architecture looks similar to Figure 13-5.
Figure 13-5 HPC 2.0 with 500 compute nodes connecting to a scalable HPC cluster fabric
In this case, the architect deploys another B08R with the same configuration,
but orders two 4-port 10 GbE modules to populate the two remaining slots on
each of the devices. An 8-port LAG can be configured to allow up to 80 Gbps of
traffic to pass between the two devices.
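A short calculation puts the 80 Gbps LAG in context: it is the aggregate of eight 10 GbE members, and dividing the offered load of 288 gigabit-attached nodes by it gives the worst-case inter-chassis over-subscription. The assumption that every node sends at line rate to the other chassis is deliberately pessimistic.

```python
# Capacity of the 8-port 10 GbE LAG and the worst-case over-subscription if
# all 288 gigabit-attached nodes on one chassis sent line-rate traffic to
# nodes on the other chassis (a deliberately pessimistic assumption).
lag_members, member_gbps = 8, 10
nodes_per_chassis, node_gbps = 288, 1

lag_gbps = lag_members * member_gbps
print(f"LAG capacity: {lag_gbps} Gbps")
print(f"Worst-case inter-chassis over-subscription: "
      f"{nodes_per_chassis * node_gbps / lag_gbps:.1f}:1")
```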
The IBM Ethernet products provide solutions for the network designs of today
and the designs of the future. All of the IBM Ethernet products have 10 GbE
capabilities today, as well as support for IPv6, both depending on the options
installed.
The publications listed in this section are considered particularly suitable for a
more detailed discussion of the topics covered in this book.
Online resources
These websites are also relevant as further information sources:
IBM System Storage hardware, software, and solutions:
http://www.storage.ibm.com
IBM System Networking:
http://www-03.ibm.com/systems/networking/
IBM b-type Ethernet switches and routers:
http://www-03.ibm.com/systems/networking/hardware/ethernet/b-type/index.html
Brocade Resource Center:
http://www.brocade.com/data-center-best-practices/resource-center/index.page
Brocade and IBM Ethernet Resources:
http://www.brocade.com/microsites/ibm_ethernet/resources.html
IBM System Storage, Storage Area Networks:
http://www.storage.ibm.com/snetwork/index.html