Docu48800 Detailed Link Aggregation Configuration
Contents
Introduction
Terminology
References
Link Aggregation Types
Topologies
Direct Connect
Private Network
Local Network
Remote Network
Data Domain Link Aggregation and Failover
Bond Functions Available in Linux Distribution
Hash Methods Used
Link Failures
Other Link Aggregation
Cisco
Sun
Windows
AIX
HPUX
Data Domain Link Aggregation and Failover in the Customers Environment
Normal Link Aggregation
Failover of NICs
Failover Associated with Link Aggregation
Recommended Link Aggregation
Switch Information
Introduction
This document describes the use of link aggregation and failover techniques to maximize throughput and keep an interface up on networks with Data Domain systems installed. The basic topologies are described with notes on the usefulness of different aggregation methods, so the right method can be chosen for a specific site.

Link aggregation has two purposes:
1. Evenly split the network traffic across all the links or ports in the aggregation group
2. Continue to transfer data over existing connections even though a link fails

With link aggregation, failover of a link is provided with degradation. For example, if two 1 Gb links are aggregated to obtain a throughput of 1.8 Gb/s and one link goes down, the data transfer will continue but only at up to 0.9 Gb/s over the single link that is still available. With failover, only one link, referred to as the active link, is used at a time. The other links in the group are idle. The maximum throughput is the speed of a single link. If the active link fails, the traffic continues on another link without losing the connection. In both cases, link aggregation and failover, the failure would be due to loss of carrier, but when using link aggregation with LACP the failure can also be the loss of the heartbeat communication.

Normally the aggregation is between the local system (i.e. DDR) and the network device (e.g. switch or router) or other system (e.g. media server) to which it is directly connected. The goal of purpose number 1 is to achieve the maximum throughput across all the links that are aggregated. Several factors can affect how well the aggregation actually performs:
1. Speed of the switch
2. How much the DDR can process
3. Network overhead
4. Acknowledging and coalescing to recover out-of-order packets
5. Aggregation method may not effectively distribute the data evenly across all the links
6. Number of clients
7. Number of streams (connections) per client
For impact 1, normally the switch can handle the speed of each link that is connected to it, but it may lose some packets coming from several connected ports when sending to one uplink, depending on the uplink speed. Note: this implies that only one switch can be used for port aggregation coming out of a system. For most implementations this is true, but there are some network topologies that allow link aggregation across multiple switches, especially in conjunction with virtual switches or virtual COM ports.

Impact 2 addresses the DDR systems. The processing rate of DDR systems and programs is limited. As the hardware gets faster and the use of parallel processing improves, DDR systems will support a higher network throughput, but as the processing speed increases the network link speed will also increase. For example, with the current systems it makes sense to aggregate 1 GbE links but not 10 GbE links, because one 10 GbE link can provide enough data to saturate the processing power of some of the current and older DDR systems. As the system speed improves it will make sense to aggregate 10 GbE links, but the link speeds will also increase.

Impact 3 addresses the inherent overhead of the network programs. This overhead guarantees that the transfer speed will never reach 100% of the line speed. The throughput will always be reduced by the overhead it takes to queue and send a packet of data through the system until it is put onto the wire, and there is an inherent delay separating the sending of packets on Ethernet. The overhead is expected to cause a 5 - 10% reduction in line speed per link.

Impact 4 deals with the case where packets arrive out of order. If the link aggregation mode allows packets to be sent or received out of order and the protocol requires that they be delivered in the original order, the network program must coalesce the out-of-order packets back into that order. Out-of-order packets may also look like lost packets and trigger recovery techniques. This added overhead can reduce throughput to the point where a link aggregation mode that causes out-of-order packets should not be used.

Impact 5 has the biggest effect on how evenly the data is split across the network links and therefore on whether the full throughput of the multiple links can be reached.

Impact 6: Can a single client drive data fast enough to fully utilize multiple aggregated links? In some older systems, either the physical or OS resources cannot drive data at multiple Gb/s. Also, due to hashing limitations, multiple clients may be required to push data at those speeds. For example, if the MAC address is used for the hash, more than one client is needed to distribute the network traffic across the aggregated interfaces.

Impact 7: The number of streams, which translates to separate connections, can play a significant role in link utilization depending on the hashing that is used. For example, if the hashing is done on TCP port numbers, more connections allow better distribution of the packets.

A final impact deals with the effectiveness of the aggregation method used. If two systems are connected together by direct connect cables, the use of Layer 2 (MAC) hashing would not provide any aggregation at all; all the packets would go over the same link. If Layer 3 (IP) hashing is used, the same thing would result unless IP aliasing or VLAN tagging is used to give one interface multiple IPs. In that case Layer 3 hashing could be used, and in fact the IP addresses could be selected to guarantee the distribution of packets across the interfaces.

In general the number of systems that will be communicating with the Data Domain system will be small, so the aggregation method used will need to work for a limited number of client systems. The number of links that are aggregated will depend on the switch performance, the DDR system and application performance, and the link aggregation mode used. There is no absolute limit in the DDR software, other than the actual number of physical links, as long as the switch or other device it is connected to can handle the number of ports trunked together.
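As a preview of the two approaches discussed above, the following is a minimal sketch using the DDR command forms that appear later in this document; the interface and virtual interface names (eth4a, eth4b, eth4c, eth4d, veth0, veth1) are placeholders and should be replaced with the names on the actual system.

    # Aggregate two links for throughput (degrades to one link if a link fails)
    net aggregate add veth0 mode balanced hash xor-L3L4 interfaces eth4a eth4b

    # Keep a second link as an idle standby instead (no throughput gain, no degradation)
    net failover add veth1 interfaces eth4c eth4d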
Terminology
The following terms are used throughout this document.
DDR
Data Domain appliance, a Linux system used to perform only Data Domain operations.
Bond, Bonding
This is a term used by the Linux community to describe the grouping of interfaces together to act as one interface to the outside world. The DDR uses a subset of the bonding functionality available in Linux.
EtherChannel
This is a term used by Cisco to define the bundling of network links as described under Ethernet Channel. With Cisco there are three ways to form an EtherChannel: manually, automatically using PAgP, and automatically using LACP. If it is done manually, both sides have to be set up by the administrator. If one of the protocols is used, packets specific to that protocol are sent to the other side, where the EtherChannel is set up based on the information in the packets.
Ethernet Channel
This is multiple individual Ethernet links bundled into a single logical link between systems. This provides higher throughput than a single link. The term used by Cisco for this is EtherChannel. The actual throughput depends on the number of links bundled together, the speed of the individual links, and the switch or router that is being used. If a link within the Ethernet Channel fails, the traffic that normally goes over the failed link is sent over the remaining links within the bundle.
LACP
Link Aggregation Control Protocol (LACP) provides dynamic link aggregation as defined in the IEEE 802.3ad standard (IEEE 802.3 clause 43). It is not available in DDOS 4.9 and earlier. In DDOS 5.0, LACP is only available for the 1 Gb interfaces; it is not available for 10 Gb until DDOS 5.1. It is not available for Chelsio NICs.
Link Aggregation
Using multiple Ethernet network cables or ports in parallel, link aggregation increases the link speed beyond the limits of any one single cable or port. Link aggregation is usually limited to links connected to the same switch. Other terms used are EtherChannel (from Cisco), trunking, port trunking, port aggregation, NIC bonding, and load balancing. There are proprietary methods in use, but the main standard method is IEEE 802.3ad. Link aggregation can also be used as a type of failover.
Load Balancing
Aggregation methods used to try to distribute the load across all available links or ports. This term is applied to switches and routers; with Cisco it is generally a global parameter applied across all EtherChannels.
Round Robin
In Linux, round robin sends packets in sequence to each available slave. This provides the best distribution across the bonded interfaces. Normally this would be the best aggregation to use, but the throughput can suffer because of packet ordering.
RSTP
Rapid Spanning Tree Protocol (RSTP), IEEE 802.1w, allows a network topology with bridges to provide redundant paths. This allows for failover of network traffic among systems. It is an extension of the Spanning Tree Protocol (STP), and the two names are used interchangeably.
TOE
TCP Offload Engine. Network cards (NICs) that have the full TCP/IP stack on the card.
Trunking
Trunking is the use of multiple communication links to provide aggregated data transfer among systems. For computers this may be referred to as port trunking to distinguish it from other types of trunking such as frequency sharing. Note: Cisco uses the term trunking to refer to VLAN tagging, not link aggregation, whereas other vendors use the term in reference to link aggregation.
References:
Catalyst 4500 Series Switch Cisco IOS Software Configuration Guide, Release 12.2(44)SG (also used for the 4900 Series Switch), available from the Cisco Documentation site
Cisco Documentation, http://www.cisco.com/univercd/home/home.htm
IEEE 802.3 Standard, http://standards.ieee.org/getieee802/802.3.html; also available under http://iweb.datadomain.com/eweb/technical_library/Vendor/Cisco/. The IEEE 802.3ad standard is Clause 43 under IEEE802_3-sec3.pdf of the standards documents listed.
Linux distribution documentation, http://www.kernel.org/
Linux Ethernet Bonding Driver HOWTO, http://www.cyberciti.biz/howto/question/static/linux-ethernet-bonding-driverhowto.php and http://www.cyberciti.biz/tips/linux-bond-or-team-multiple-network-interfaces-nic-into-single-interface.html
Wikipedia, http://en.wikipedia.org/wiki/Main_Page
Various links on the web as noted within the document by hotlinks
There is also the question of when to use failover, since aggregated interfaces already handle failures. The link aggregation modes include a failover component by allowing data transfer to continue in a degraded state on fewer of the bonded interfaces. For example, if one of the links goes down, the link aggregation can recognize this, drop that link from the aggregation list, and continue with one less link. The customer may feel full failover is more important than link aggregation and would rather have no degradation. Instead of aggregating over multiple links, those links can be configured in full failover mode, where idle spares that carry no data are kept until the active link fails. This way there is no degradation of throughput if the one link fails and data is sent over the other; one or more links are kept in standby mode until needed. A strong reason to use failover instead of link aggregation is the setup of parallel network paths. In this configuration there can be two switches, and if a component in one path goes down, the data traffic is moved to the other port and switch. Failover can fail over across switches, while link aggregation, except for special cases with virtual switches, must be set up with interfaces connected to the same switch. A caveat is that if only a port fails and not the switch, the new switch will need to be able to route the data to the target destination, and the destination will need to send packets back through the new switch. An administration network interface is also needed with DDRs. For direct connections and one-to-one server connections there is a separate Ethernet interface for this, but it could also be part of the link aggregation unless physical separation is needed between the links.
Topologies
The basic types of network topologies are described below, along with their differing suitability for various types of aggregation methods.
Direct Connect
The Data Domain system is directly connected to one or more backup servers. Providing link aggregation within this topology requires multiple links between each backup server and the Data Domain system. Usually link aggregation is not done with this topology, especially with multiple backup servers, because of the limited number of links available on the Data Domain system and because one system usually cannot drive data faster than one link, especially if the link is 10 Gb. In the picture shown, there are two interfaces being aggregated (the blue lines). For backups the important hashing is done on the media server, because the majority of data traffic flows from the media server to the DDR. In this case it may be useful to check whether round robin can be used and evaluate if the performance is adequate; a sketch is shown after the diagram below. Otherwise consider multiple connections using IP aliases.
[Diagram: direct connect topology; labels shown include Business Servers and Tape Library]
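A minimal sketch of the round robin evaluation mentioned above, assuming the DDR mode keyword roundrobin (matching the mode name listed under Mode Options later) and placeholder interface names; the media server side must also be configured for round robin (for example, the Linux bonding driver's balance-rr mode) for the aggregation to match in both directions.

    net aggregate add veth0 mode roundrobin interfaces eth4a eth4b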
Private Network
This topology is the same as the direct connect except that the connections are through a switch rather than being directly connected. This would normally be used to connect multiple media servers to multiple DDRs. The link aggregation would be between a DDR and the switch, or between a media server and the switch; the aggregation gets the data to and from the switch. The aggregation between the DDR and the switch is independent of the aggregation used between the media server and the switch. Note: there is a possible special case where the switch is only a pass-through and is transparent to the aggregation. That is not the norm and is discussed in further detail later. In this case, if there is one media server, the use of multiple TCP connections and load balancing of src-dst-port on the switch may be the best choice; a corresponding switch-side sketch follows the diagram below.
[Diagram: private network topology; labels shown include Data Domain, Network switch, and Backup/media server]
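For the single media server case above, the switch-side setting is a global load-balance command on Cisco platforms that support Layer 4 hashing; this is a sketch only, and the available keywords vary by platform and software release.

    Switch(config)# port-channel load-balance src-dst-port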
Local Network
The Data Domain system is connected to the backup server through a common switch/router which is shared by many other systems. In the previous network topologies shown, the Data Domain system may have a connection through the common switch to handle administration and maintenance tasks, which need not be part of the aggregation. In this example the data is also being sent through the shared network. The setup of the aggregation is the same as with the private network, but it opens the possibility of many more media servers being available to back up to the DDR. This may make layer 2 (MAC address) load balancing more feasible.
[Diagram: local network topology; labels shown include Network switch and Business Servers]
Remote Network
This is similar to the local network except that the connection goes through a router before it gets to the media server, or to other DDRs in the case of replication. There will normally be a switch between the DDR and the router unless the router also provides switch functionality. What is important to note in this diagram is that a gateway function is involved in the network data flow. It is important to maximize the data throughput between the DDR and the media servers, but if a WAN is involved, one 1 Gb interface is normally enough because of bandwidth limitations. Normally, for performance reasons, the DDR will be located on the same LAN and use the same switch as the media server. There may be cases where some of the media servers are on separate LANs; the DDR would need to go through at least one gateway to get to them. It is not expected that the media servers will go across a WAN to get to the DDR, although some WANs are high speed, in which case the backup could go over the WAN. The WAN topology is more likely for a DDR with replication. Normally the data flow in replication is low enough that it does not need aggregation, but MANs and some WANs are getting fast enough that this configuration may be needed; otherwise the WAN would tend to make aggregation ineffective. Yet there are customers that have asked about it. One reason is that it provides redundancy (failover).
[Diagram: remote network topology; labels shown include Data Domain and two Network routers]
Complex bonding
Starting with the DDOS 5.1 release, there is the ability to create failover on top of aggregation groups. In this case the failover slaves are virtual interfaces containing aggregations. The primary use of this is to allow the use of aggregated interfaces while also having a full switch failover. In this diagram there are two parallel paths. If the path being used is through the switch on the left, the failover on the Data Domain system will switch to using the switch on the right. In this case the carrier controls the failure, and the carrier on both of the interfaces to the switch on the left has to go down. Otherwise traffic will keep running over one of the interfaces on the left in a degraded mode.
[Diagram: complex bonding topology with two parallel switch paths; label shown: Aggregated interfaces]
It is important to set the expectations for this setup realistically. Does the media server support this? If not, then it will only have failover across single links, and without multiple media servers it does not make sense to have aggregated links on the DDR. Another point: if one of the aggregated links goes down, the failover will not happen; rather, the other link(s) will handle all the traffic in a degraded mode. If both links go down, the failover will kick in, but since the switch itself did not fail, the media server side will still use the first switch, so the packets will need to be routed between the switches. Otherwise the connections will fail. Also, if LACP is being used and the carrier stays up but the heartbeat fails on both interfaces, the failover (which is based on carrier) will not change to the second switch and communication will stop. In practice this setup is only really effective if the whole switch goes down instead of part of it. A sketch of the layered configuration follows.
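The sketch below assumes DDOS 5.1 command forms similar to those shown just below and placeholder interface names; the exact syntax for making aggregated virtual interfaces the slaves of a failover group should be confirmed against the release documentation.

    # One aggregation group per switch
    net aggregate add veth1 mode lacp hash xor-L3L4 interfaces eth4a eth4b
    net aggregate add veth2 mode lacp hash xor-L3L4 interfaces eth5a eth5b

    # Failover group whose slaves are the two aggregated virtual interfaces
    net failover add veth0 interfaces veth1 veth2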
For DDOS version 5.0 and later with slot based port naming:

    net aggregate add veth0 mode balanced hash xor-L3L4 interfaces eth4a eth4b

The aggregation used with this command is balanced-xor. The packets are distributed across eth2 and eth3 (or eth4a and eth4b for later releases with slot based naming) based on the XOR of the source IP address, destination IP address, source port number, and destination port number. The result gives a number in which the lowest bits are used to determine which link to use to send the packet. For this example an even result will go over one link and an odd result will go over the other. With three links the result is divided by 3, with the remainder determining which interface to use. This aggregation would be used when there are a lot of connections (there is one connection per stream), a lot of media servers, or both. This is the mode of choice for Data Domain, but some switches do not support this type of hashing.

For DDOS version 5.0 and later with slot based port naming:

    net aggregate add veth0 mode lacp hash xor-L3L4 interfaces eth4a eth4b

The aggregation used with this command is lacp-xor. The packets are distributed across eth2 and eth3 (or eth4a and eth4b for later releases with slot based naming) based on the XOR of the source IP address, destination IP address, source port number, and destination port number. The data flow control follows the same mechanism used by balanced mode, except that it adds a control protocol to monitor the interfaces, providing a minimal amount of automated administration of the interfaces, including better sensing of an interface failure. The sensing goes beyond carrier loss to sensing the ability to send and receive data. The heartbeat can be sent out every second or every 30 seconds; the default is every 30 seconds. The interval determines how fast the bonding will sense that the link is no longer communicating and stop using the interface. Once every 30 seconds is less invasive, but it will take longer to mark the link as down, and there may be connection timeouts while it is waiting.

    net failover add veth0 interfaces eth2 eth3

This is not aggregation, but the command will group interfaces eth2 and eth3 (or eth4a and eth4b for later releases with slot based naming) together for failover. There is only one failover type supported. If the active physical link goes away (determined by loss of carrier), the data is sent to the second physical link. The active interface is determined by which link comes up first when it is set up. This is non-deterministic; it depends on several factors such as switch activity, network activity, and which interface is brought up first when they are enabled. The active interface can be fixed by specifying one of the links as primary. The primary interface will always be set as active if it can be UP and RUNNING. A down time and up time can also be specified. The time is set to the multiple of 0.9 seconds that is nearest but not greater than the value specified. For example, if 1.5 seconds is specified, the actual value used is 0.9 seconds. For the command the value is given in milliseconds, so 1.5 seconds would be 1500 and the value used for this is 900. The up and down timer values are also used in the net aggregate commands.

The new hash of L2L3, which was released in DDOS 5.0, uses a combination of the source and destination MAC and IP addresses to determine the interface on which to send the packets. This gives more flexibility in getting the data to be better aggregated.
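A sketch of the corresponding command for the L2L3 hash, assuming the same command form as the examples above; the hash keyword xor-L2L3 follows the naming used in the Recommended Link Aggregation section later in this document, and the interface names are placeholders.

    net aggregate add veth0 mode balanced hash xor-L2L3 interfaces eth4a eth4b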
Mode Options
1. balance-rr or BOND_MODE_ROUNDROBIN (0) - known as the roundrobin mode on the DDR
   - Aggregation using round robin
   - Failover with degradation
   - Normally a good type to use with direct connect or something equivalent
   - To get full matching aggregation, both ends of the link need to be set up to use round robin
2. active-backup or BOND_MODE_ACTIVEBACKUP (1) - known as failover on the DDR
   - Failover method used by Data Domain
   - Works only when one or more standby links are in the group
   - There is one active link and all others in the group are standby
   - The active link is non-deterministic unless a primary is specified
3. balance-xor or BOND_MODE_XOR (2) - known as the balanced mode on the DDR
   - Sends transmissions out a specific NIC based on the specified hash method
   - Default: (source MAC address XOR destination MAC address) modulo the size of the aggregation group
   - Note: this only aggregates transmissions; receive traffic needs to be aggregated on the other end
   - This mode is referred to as static because of the manual setup that is needed
4. 802.3ad or BOND_MODE_LACP (4) - known as the lacp mode on the DDR
   - Sends transmissions out a specific NIC based on the specified hash method
   - Default: (source MAC address XOR destination MAC address) modulo the size of the aggregation group
   - Note: this aggregates transmissions and actively checks the aggregated interfaces; receive traffic needs to be aggregated on the other end
   - This configuration is initially set manually, but is maintained automatically
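On a generic Linux host that uses the bonding driver (for example a Linux media server), the active mode, hash policy, and slave state can be inspected through procfs; this is standard Linux, not a DDR command, and bond0 is a placeholder for the bonded interface name.

    cat /proc/net/bonding/bond0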
Assign a different IP address to each link and set up each media server to send data to a unique one of those IP addresses. That way the throughput will approach 4 times the single link speed, versus around 2.5 times if aggregation is used. This is very dependent on the expected traffic pattern from the media servers.
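A minimal sketch of that approach, assuming the standard DDR per-interface addressing command and placeholder interface names and addresses; each media server is then pointed at a different one of these addresses.

    net config eth4a 192.168.1.1 netmask 255.255.255.0
    net config eth4b 192.168.1.2 netmask 255.255.255.0
    net config eth4c 192.168.1.3 netmask 255.255.255.0
    net config eth4d 192.168.1.4 netmask 255.255.255.0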
Link failures
A link can fail at several places: in the driver, the wire, the switch, the router, or the remote system. For failover to work, the program (the bonding module in the Data Domain case) must be able to determine that a link to the other side is down. This information is normally provided by the hardware driver. For a simple case, consider a direct connect where the wire is disconnected. The driver senses that the carrier is down and reports this back to the bonding module. The bonding module will mark the link as down after the "down" time has expired and switch to a different link. The bonding module continues to monitor the link, and when it has come back up for the "up" time it will mark it as up. If the restored link is marked as the primary, the data will be switched back to that link; otherwise the data flow stays on the current link. Note: the failover method that is currently supported is for directly attached hardware.

The driver can sense when the directly attached link is no longer functioning, but beyond that it gets harder. Consider the case where there is a switch and the failover is between two different switches with no routing. Can the driver determine that the connection to the remote system has failed and that it therefore needs to switch to the backup link going through the other switch? This is possible if the switch provides link fault signaling similar to what is defined in IEEE 802.3ae. This is supported by the Fujitsu 10GbE switch, and something similar is supported by Cisco. This is a rather limited network topology where the systems are directly connected via switches and there are no other routes available; it would be an extension of the direct connect to the media server. Currently the driver and the bonding module do not support link fault signaling because it is not widely available and applies to too limited a set of network topologies.

There are two types of failover. One is failover to a standby interface. The standby interface is not used until a failure happens and the traffic is redirected to the standby link. This is a waste of resources if there is never a failover. This is the method used by Data Domain when the bonding method failover is specified:

    net failover add veth1 interfaces eth3 eth5

The other type is failover with degradation. In this method there is no standby; all the links in the group are being used. If there is a failure, the failed link is removed and the network traffic from that link is redirected to the other links in the group. This is the failover associated with link aggregation, but it can become complex if the bonding driver has to determine that a path to the target system no longer exists so that it will not send data to that link. There is also the question of how bad the failure is. Maybe the carrier stays up but the data still fails to get transferred, or the data has too many CRC or checksum errors to be effective. The lacp mode (available in DDOS 5.0 for 1 Gb and DDOS 5.1 for 10 Gb) is the only mode that can determine this and mark the interface as down. Note that it is not up to the DDR to detect and adjust the network flow for a failure beyond the switch or router; finding an alternative path is a network/switch/router problem. The DDR only senses the network between its local interface and the switch or system it is connected to.
Cisco
Some of the older Cisco switches and routers only support the older proprietary protocol, PAgP. The Data Domain system does not support this type of aggregation. Fortunately, the newer switches and routers support the IEEE 802.3ad standard. When using Cisco switches and routers, IEEE 802.3ad should be used with Layer 3 and 4 hashing. It may be possible in some cases to set the aggregation with PAgP to round robin, but that is not currently supported for the DDR when connected to a switch or a router because of throughput delays from potential packet ordering issues. At high speeds with fast retransmissions, out-of-order packets can generate many more packets, which would decrease the overall performance.
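A sketch of an IEEE 802.3ad (LACP) channel on a Cisco IOS switch with Layer 3+4 load balancing; the interface range, the channel-group number, and the availability of the src-dst-port keyword depend on the platform and software release.

    Switch(config)# port-channel load-balance src-dst-port
    Switch(config)# interface range GigabitEthernet1/0/1 - 2
    Switch(config-if-range)# channel-group 1 mode active

With channel-group mode active, the switch initiates LACP; this matches the DDR's lacp mode, which also runs LACP in active mode.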
Nortel
Nortel supports an aggregation called Split Multi-Link Trunking, which uses LACP_AUTO mode link aggregation.
Sun
The initial version of Solaris 10 and earlier releases supported Sun Trunking. Later releases of Solaris 10 and beyond support the IEEE 802.3ad standard for communicating with switches. Back-to-back link aggregation is supported, in which two systems are directly connected over multiple ports. The balancing of the load can be done with L2 (MAC address), L3 (IP address), L4 (TCP port number), or any combination of these. Note that the DDR currently only supports L2 or L3+L4. Link aggregation can run in either passive mode or active mode; at least one side must be in active mode. The DDR always uses active mode. Sun Trunking supports a round robin type of aggregation. This type of aggregation could be used if the DDR is connected directly to a Sun system. For more information on Sun aggregation refer to: http://docs.sun.com/app/docs/doc/816-4554/fpjvl?l=en&q=%22link+aggregation%22&a=view For more information on Sun Trunking refer to: http://docs.sun.com/source/817-3374-11/preface.html
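For a Solaris 10 host, a sketch of an 802.3ad aggregation with L3+L4 hashing and active LACP might look like the following; this assumes the Solaris 10 dladm syntax, and the device names and aggregation key (1) are placeholders to be confirmed against the Sun documentation referenced above.

    dladm create-aggr -P L3,L4 -l active -d e1000g0 -d e1000g1 1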
Windows
Microsoft's view of link aggregation is that it is a switch problem or a hardware problem, so Microsoft feels that it should be handled by the switch/router and the NIC card. There is nothing in the OS that directly supports it. Rather, if the customer wants it, they should get NIC cards that support it and either have a special driver to initiate it or use the switch to drive it. In the current documentation for Windows Server 2008, Microsoft refers to the support of PAgP, an old proprietary Cisco aggregation protocol: http://blogs.technet.com/winserverperformance/ They also refer to Receive-Side Scaling (RSS): http://www.microsoft.com/whdc/device/network/NDIS_RSS.mspx This refers to a way to distribute packet handling across the CPUs to which NIC cards are normally tied. There are drivers from outside of Microsoft that at least provide passive IEEE 802.3ad support, if not active. Passive support means that the Windows system will respond to the IEEE 802.3ad protocol packets, but it will not generate them. For direct connect this may be the only way to have a directly connected aggregated link. The following link provides Microsoft's view of servers for 2008: http://technet2.microsoft.com/windowsserver2008/en/library/59e1e955-3159-41a1-b8fd-047defcbd3f41033.mspx?mfr=true If the Windows server is not directly connected, then it is not important to the DDR system whether or how link aggregation is provided by Windows; that would be between the Windows server and the switch/router. More specific information on which NIC cards support link aggregation is still TBD.
AIX
According to an AIX and Linux administration guide, AIX supports EtherChannel and IEEE 802.3ad types of link aggregation, as mentioned in the RSCT administration guide: http://publib.boulder.ibm.com/infocenter/clresctr/vxrx/index.jsp?topic=/com.ibm.cluster.rsct.doc/rsct_aix5l53/bl5adm05/bl5adm0559.html
When using a DDR, the round robin available through EtherChannel can be used when directly connected. IEEE 802.3ad can be used if Layer 4 hashing is included. If the DDR is not directly connected, then it depends on the switch or router being used. AIX uses a variant of EtherChannel for backup, referred to as EtherChannel backup. This is similar to the active-backup mode supported by the Linux bonding driver and does not need any handshake from the equipment connected to the links, other than having multiple links available.
HPUX
The link aggregation product is referred to as HP Auto Port Aggregation (APA). As with Linux bonding, this product provides either a full standby failover or a degradation failover by overloading the other links within an aggregation group. The aggregation can use Layer 2, Layer 3, and/or Layer 4 hashing for aggregating across the links. It also supports the IEEE 802.3ad standard. A summary of the product is given here: http://h20392.www2.hp.com/portal/swdepot/displayProductInfo.do?productNumber=J4240AA The administration guide can be found here: http://docs.hp.com/en/J4240-90039/index.html According to the administration guide, direct connect server-to-server is supported, but a round robin type of aggregation does not appear to be. This is further brought out in figure 3-4 of that document, where for direct connect it is recommended to have many connections for load balancing to be effective; with round robin, multiple connections are not required for effective aggregation. With this understanding, HPUX systems would not support round robin with a directly connected system.
aggregation used is lacp and the switches are interconnected to share information. Cisco uses this to provide good data flow along with a failover ability. The packet routing is dependent on the best path, the least used path, and the concern with packet ordering. If one of the switches goes down or becomes unavailable, the packets are directed to the other. Another case is where the switches operate at a physical level and data is passed directly through to the remote system. It acts like a direct connect even though the systems can go over a WAN. The following diagram illustrates this.

[Diagram: pass-through configuration; labels shown include eth2 and en5]

In this case the aggregation is passed through to the remote system, and it provides increased throughput but also failover. Normally lacp is used in this case to provide a heartbeat failover. The challenge with this is which transmit hash to use. Multiple connections are used along with multiple IP addresses, either from VLANs or IP aliases. If one of the WAN links becomes unavailable, both sides become aware of this and send everything over the other link. The main concern here is handling the latency of the WAN when using the heartbeat.
Failover of NICs
With the special aggregation cases the dependency on failover is reduced. Failover is still simpler to set up and easier to maintain, but as networks get more complex, the use of lacp offers better automation in handling failure conditions.
4. failover (if aggregation cannot be used)

Private Network (more than 4 active clients and the network path has no gateways)
1. mode lacp if supported (balanced if not), using hash xor-L2
2. mode lacp if supported (balanced if not), using hash xor-L3L4
3. separate NIC per media server (if there are enough NICs)
4. failover (if aggregation cannot be used)

Local Network (fewer than 4 active clients or route has gateways in the path)
1. separate NIC per media server (if there are enough NICs)
2. mode lacp if supported (balanced if not), using hash xor-L3L4
3. mode lacp if supported (balanced if not), using hash xor-L2 (if there are a suitable number of active clients)
4. failover (if aggregation cannot be used)

Local Network (more than 4 active clients and the network path has no gateways)
1. mode lacp if supported (balanced if not), using hash xor-L2
2. mode lacp if supported (balanced if not), using hash xor-L3L4
3. separate NIC per media server (if there are enough NICs)
4. failover (if aggregation cannot be used)

Remote Network (normally through gateways and routers)
1. separate NIC per media server (if there are enough NICs)
2. mode lacp if supported (balanced if not), using hash xor-L3L4
3. failover (if aggregation cannot be used)

Note: hash xor-L2L3 can be substituted for xor-L2.
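As an illustration, the first choice listed for a private network with more than four active clients would look like the following on the DDR, using the command form shown earlier; the hash keyword xor-L2 follows the naming used in the list above, and the interface names are placeholders.

    net aggregate add veth0 mode lacp hash xor-L2 interfaces eth4a eth4b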
Switch information
Link aggregation is set up on both sides of a link. The link aggregation does not necessarily have to match on both sides of the link. For example, the DDR may be set to xor-L3L4 while the switch is set to src-ip. A good rule of thumb is to keep the aggregations close, such as xor-L3L4 on the DDR and src-dst-port on the switch; if an aggregation is good enough for one direction, it is good enough for the other direction. Aggregation on the switch is used to distribute traffic being received by the DDR. If the main operation being done is backup, the switch aggregation is very important, because backup network traffic is mostly data being received by the DDR. Because of the limited number of clients communicating with the DDR, the recommended aggregation method is balance-xor with Layer 3+4 hashing. To support this, the device directly connected to the DDR, e.g. switch or router (see Normal Link Aggregation), needs to support src-dst-port, or at least src-port, load balancing.

This section uses the vendors' documentation to identify switches that may work with the Layer 3+4 hashing and some that may not. There are no plans to validate or certify these; the final authority on whether a switch supports the desired aggregation is to physically try it. For example, there is at least one case where round robin was desired and tried and it worked satisfactorily even though it is listed as not supported. Note again, even where round robin is supported by a switch, the aggregation performance can be poor, or even worse than not aggregating at all, mostly due to out-of-order packets.

Note: there are few switches that support Layer 3+4 aggregation; the supported aggregation may be for Layer 3 only or Layer 4 only. Matching Layer 4 (port aggregation) with Layer 3+4 (IP address and port aggregation) is not a problem, but be aware that it may cause data to be sent on one link and received on a different link; the concern of out-of-order packets should not occur. Which link the data is sent on is not important as long as all the data associated with a connection is sent on the same link.

Definitions:
Dest := Destination
IP := IP address
L4 := Layer 4 of the network stack, i.e. TCP
MAC := MAC or hardware address
Port := TCP port number
Src := Source
SW := software

[Table: load-balancing hash support by switch model — columns: Switch brand & model, Switch vendor SW release, Src MAC, Dest MAC, Src-Dest MAC, Src IP, Dest IP, Src-Dest IP, Src L4 Port, Dest L4 Port, Src-Dest L4 Port, Round Robin — for Cisco Catalyst 6500 CatOS, Catalyst 6500 IOS, Catalyst 3560, Catalyst 2960, Catalyst 3750, and Catalyst 4500/4948/4924]
For directly connected systems the support for round robin is as follows:
Sun - yes
AIX - yes, it can
HPUX - no
Windows - maybe; it depends on the NIC software, but don't count on it
Cisco Configuration
Set the EtherChannel mode to on: manually set the ports to participate in the channel group.

DDR configuration      Cisco load-balance configuration
xor-l3l4               src-dst-port
xor-l2                 src-dst-mac
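A sketch of the manual pairing described above, with placeholder interface and channel-group numbers; channel-group mode on creates a static EtherChannel with no negotiation protocol, which matches the DDR's balanced (static) mode, while the load-balance command selects src-dst-port to pair with xor-L3L4 on the DDR.

    Switch(config)# port-channel load-balance src-dst-port
    Switch(config)# interface range GigabitEthernet1/0/1 - 2
    Switch(config-if-range)# channel-group 1 mode on

For the xor-l2 pairing, src-dst-mac would be used in the load-balance command instead.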