Docu48800 Detailed Link Aggregation Configuration
Contents
Introduction
Terminology
References
Link Aggregation Types
Topologies
Direct Connect
Private Network
Local Network
Remote Network
Data Domain Link Aggregation and Failover
Bond Functions Available in Linux Distribution
Hash Methods Used
Link Failures
Other Link Aggregation
Cisco
Sun
Windows
AIX
HPUX
Data Domain Link Aggregation and Failover in the Customers Environment
Normal Link Aggregation
Failover of NICs
Failover Associated with Link Aggregation
Recommended Link Aggregation
Switch Information
Introduction
This document describes the use of link aggregation and failover techniques to maximize throughput and keep an interface up on networks with Data Domain systems installed. The basic topologies are described with notes on the usefulness of different aggregation methods, so the right method can be chosen for a specific site.

Link aggregation has two purposes:
1. Evenly split the network traffic across all the links or ports in the aggregation group
2. Continue to transfer data over existing connections even though a link fails

With link aggregation, failover of a link is provided with degradation. For example, if two 1 Gb links are aggregated to obtain a throughput of 1.8 Gb/s and one link goes down, the data transfer will continue but only at up to 0.9 Gb/s over the single link that is still available. With failover, only one link, referred to as the active link, is used at a time. The other links in the group are idle. The maximum throughput is the speed of a single link. If the active link fails, the traffic continues on another link without losing the connection. In both cases, link aggregation and failover, the failure would be due to loss of carrier, but when using link aggregation with LACP the failure can also be the loss of the heartbeat communication.

Normally the aggregation is between the local system (i.e. DDR) and the network device (e.g. switch or router) or other system (e.g. media server) to which it is directly connected. The goal of purpose number 1 is to achieve the maximum throughput across all the links that are aggregated. Several factors can affect how well the aggregation actually performs:
1. Speed of the switch
2. How much the DDR can process
3. Network overhead
4. Acknowledging and coalescing to recover out-of-order packets
5. Aggregation method may not effectively distribute the data evenly across all the links
6. Number of clients
7. Number of streams (connections) per client
For impact 1, normally the switch can handle the speed of each link that is connected to it, but it may lose some packets coming from several connected ports when sending to one uplink, depending on the uplink speed. Note: this implies that only one switch can be used for port aggregation coming out of a system. For most implementations this is true, but there are some network topologies that allow link aggregation across multiple switches, especially in conjunction with virtual switches or virtual COM ports.

Impact 2 addresses the DDR systems. The processing rate of DDR systems and programs is limited. As the hardware gets faster and the use of parallel processing improves, DDR systems will support a higher network throughput, but as the processing speed increases the network link speed will also increase. For example, with the current systems it makes sense to aggregate 1 GbE links but not 10 GbE links, because one 10 GbE link can provide enough data to saturate the processing power of some of the current and older DDR systems. As the system speed improves it will make sense to aggregate 10 GbE links, but the link speeds will also increase.

Impact 3 addresses the inherent overhead of the network programs. This overhead guarantees that the transfer speed will never reach 100% of the line speed. The throughput will always be reduced by the overhead it takes to queue and send a packet of data through the system until it is put onto the wire, and there is an inherent delay separating the sending of packets on Ethernet. The overhead is expected to cause a 5 - 10% reduction in line speed per link.

Impact 4 deals with the case where packets arrive out of order. If the link aggregation mode allows packets to be sent or received out of order and the protocol requires that they be delivered in the original order, the network program must coalesce the out-of-order packets back into that order. Out-of-order packets may also look like lost packets and trigger recovery techniques. This added overhead can reduce throughput to the point where a link aggregation mode that causes out-of-order packets should not be used.

Impact 5 has the biggest effect on how evenly the data is split across the network links and therefore on whether the full throughput of the multiple links can be reached.

Impact 6: Can a single client drive data fast enough to fully utilize multiple aggregated links? In some older systems, either the physical or OS resources cannot drive data at multiple Gb/s. Also, due to hashing limitations, multiple clients may be required to push data at those speeds. For example, if the MAC address is used for the hash, more than one client is needed to distribute the network traffic across the aggregated interfaces.

Impact 7: The number of streams, which translates to separate connections, can play a significant role in link utilization depending on the hashing that is used. For example, if the hashing is done on TCP port numbers, more connections allow better distribution of the packets.

A final impact deals with the effectiveness of the aggregation method used. If two systems are connected together by direct connect cables, the use of Layer 2 (MAC) hashing would not provide any aggregation at all; all the packets would go over the same link. If Layer 3 (IP) hashing is used, the same thing would result unless IP aliasing or VLAN tagging is used to give one interface multiple IPs. In that case Layer 3 hashing could be used, and in fact the IP addresses could be selected to guarantee the distribution of packets across the interfaces.

In general the number of systems that will be communicating with the Data Domain system will be small, so the aggregation method used will need to work for a limited number of client systems. The number of links that are aggregated will depend on the switch performance, the DDR system and application performance, and the link aggregation mode used. There is no absolute limit in the DDR software, other than the actual number of physical links, as long as the switch or other device it is connected to can handle the number of ports trunked together.
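As a preview of the two approaches discussed above, the following is a minimal sketch using the DDR command forms that appear later in this document; the interface and virtual interface names (eth4a, eth4b, eth4c, eth4d, veth0, veth1) are placeholders and should be replaced with the names on the actual system.

    # Aggregate two links for throughput (degrades to one link if a link fails)
    net aggregate add veth0 mode balanced hash xor-L3L4 interfaces eth4a eth4b

    # Keep a second link as an idle standby instead (no throughput gain, no degradation)
    net failover add veth1 interfaces eth4c eth4d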
Terminology
The following terms are used throughout this document.
DDR
Data Domain appliance, a Linux system used to perform only Data Domain operations.
Bond, Bonding
This is a term used by the Linux community to describe the grouping of interfaces together to act as one interface to the outside world. The DDR uses a subset of the bonding functionality available in Linux.
EtherChannel
This is a term used by Cisco to define the bundling of network links as described under Ethernet Channel. With Cisco there are three ways to form an EtherChannel: manually, automatically using PAgP, and automatically using LACP. If it is done manually, both sides have to be set up by the administrator. If one of the protocols is used, packets specific to that protocol are sent to the other side, where the EtherChannel is set up based on the information in the packets.
Ethernet Channel
This is multiple individual Ethernet links bundled into a single logical link between systems. This provides higher throughput than a single link. The term used by Cisco for this is EtherChannel. The actual throughput depends on the number of links bundled together, the speed of the individual links, and the switch or router that is being used. If a link within the Ethernet Channel fails, the traffic that normally goes over the failed link is sent over the remaining links within the bundle.
LACP
Link Aggregation Control Protocol (LACP) provides dynamic link aggregation as defined in the IEEE 802.3ad standard (IEEE 802.3 clause 43). It is not available in DDOS 4.9 and earlier. In DDOS 5.0, LACP is only available for the 1 Gb interfaces; it is not available for 10 Gb until DDOS 5.1. It is not available for Chelsio NICs.
Link Aggregation
Using multiple Ethernet network cables or ports in parallel, link aggregation increases the link speed beyond the limits of any one single cable or port. Link aggregation is usually limited to links connected to the same switch. Other terms used are EtherChannel (from Cisco), trunking, port trunking, port aggregation, NIC bonding, and load balancing. There are proprietary methods in use, but the main standard method is IEEE 802.3ad. Link aggregation can also be used as a type of failover.
Load Balancing
Aggregation methods used to try to distribute the load across all available links or ports. This term is applied to switches and routers; with Cisco it is generally a global parameter applied across all EtherChannels.
Round Robin
In Linux, round robin sends packets in sequence to each available slave. This provides the best distribution across the bonded interfaces. Normally this would be the best aggregation to use, but the throughput can suffer because of packet ordering.
RSTP
Rapid Spanning Tree Protocol (RSTP), IEEE 802.1w, allows a network topology with bridges to provide redundant paths. This allows for failover of network traffic among systems. It is an extension of the Spanning Tree Protocol (STP), and the two names are used interchangeably.
TOE
TCP Offload Engine. Network cards (NICs) that have the full TCP/IP stack on the card.
Trunking
Trunking is the use of multiple communication links to provide aggregated data transfer among systems. For computers this may be referred to as port trunking to distinguish it from other types of trunking such as frequency sharing. Note: Cisco uses the term trunking to refer to VLAN tagging, not link aggregation, whereas other vendors use the term in reference to link aggregation.
References:
Catalyst 4500 Series Switch Cisco IOS Software Configuration Guide, Release 12.2(44)SG (also used for the 4900 Series Switch), available from the Cisco Documentation site
Cisco Documentation, http://www.cisco.com/univercd/home/home.htm
IEEE 802.3 Standard, http://standards.ieee.org/getieee802/802.3.html; also available under http://iweb.datadomain.com/eweb/technical_library/Vendor/Cisco/. The IEEE 802.3ad standard is Clause 43 under IEEE802_3-sec3.pdf of the standards documents listed.
Linux distribution documentation, http://www.kernel.org/
Linux Ethernet Bonding Driver HOWTO, http://www.cyberciti.biz/howto/question/static/linux-ethernet-bonding-driverhowto.php and http://www.cyberciti.biz/tips/linux-bond-or-team-multiple-network-interfaces-nic-into-single-interface.html
Wikipedia, http://en.wikipedia.org/wiki/Main_Page
Various links on the web as noted within the document by hotlinks
There is also the question of when to use failover, since aggregated interfaces already handle failures. The link aggregation modes include a failover component by allowing data transfer to continue in a degraded state on fewer of the bonded interfaces. For example, if one of the links goes down, the link aggregation can recognize this, drop that link from the aggregation list, and continue with one less link. The customer may feel full failover is more important than link aggregation and would rather have no degradation. Instead of aggregating over multiple links, those links can be configured in full failover mode, where idle spares that carry no data are kept until the active link fails. This way there is no degradation of throughput if the one link fails and data is sent over the other; one or more links are kept in standby mode until needed. A strong reason to use failover instead of link aggregation is the setup of parallel network paths. In this configuration there can be two switches, and if a component in one path goes down, the data traffic is moved to the other port and switch. Failover can fail over across switches, while link aggregation, except for special cases with virtual switches, must be set up with interfaces connected to the same switch. A caveat is that if only a port fails and not the switch, the new switch will need to be able to route the data to the target destination, and the destination will need to send packets back through the new switch. An administration network interface is also needed with DDRs. For direct connections and one-to-one server connections there is a separate Ethernet interface for this, but it could also be part of the link aggregation unless physical separation is needed between the links.
Topologies
The basic types of network topologies are described below, along with their differing suitability for various types of aggregation methods.
Direct Connect
The Data Domain system is directly connected to one or more backup servers. Providing link aggregation within this topology requires multiple links between each backup server and the Data Domain system. Usually link aggregation is not done with this topology, especially with multiple backup servers, because of the limited number of links available on the Data Domain system and because one system usually cannot drive data faster than one link, especially if the link is 10 Gb. In the picture shown, there are two interfaces being aggregated (the blue lines). For backups the important hashing is done on the media server, because the majority of data traffic flows from the media server to the DDR. In this case it may be useful to check whether round robin can be used and evaluate if the performance is adequate; a sketch is shown after the diagram below. Otherwise consider multiple connections using IP aliases.
[Diagram: direct connect topology; labels shown include Business Servers and Tape Library]
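A minimal sketch of the round robin evaluation mentioned above, assuming the DDR mode keyword roundrobin (matching the mode name listed under Mode Options later) and placeholder interface names; the media server side must also be configured for round robin (for example, the Linux bonding driver's balance-rr mode) for the aggregation to match in both directions.

    net aggregate add veth0 mode roundrobin interfaces eth4a eth4b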
Private Network
This topology is the same as the direct connect except that the connections are through a switch rather than being directly connected. This would normally be used to connect multiple media servers to multiple DDRs. The link aggregation would be between a DDR and the switch, or between a media server and the switch; the aggregation gets the data to and from the switch. The aggregation between the DDR and the switch is independent of the aggregation used between the media server and the switch. Note: there is a possible special case where the switch is only a pass-through and is transparent to the aggregation. That is not the norm and is discussed in further detail later. In this case, if there is one media server, the use of multiple TCP connections and load balancing of src-dst-port on the switch may be the best choice; a corresponding switch-side sketch follows the diagram below.
[Diagram: private network topology; labels shown include Data Domain, Network switch, and Backup/media server]
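For the single media server case above, the switch-side setting is a global load-balance command on Cisco platforms that support Layer 4 hashing; this is a sketch only, and the available keywords vary by platform and software release.

    Switch(config)# port-channel load-balance src-dst-port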
Local Network
The Data Domain system is connected to the backup server through a common switch/router which is shared by many other systems. In the previous network topologies shown, the Data Domain system may have a connection through the common switch to handle administration and maintenance tasks, which need not be part of the aggregation. In this example the data is also being sent through the shared network. The setup of the aggregation is the same as with the private network, but it opens the possibility of many more media servers being available to back up to the DDR. This may make layer 2 (MAC address) load balancing more feasible.
[Diagram: local network topology; labels shown include Network switch and Business Servers]
Remote Network
This is similar to the local network except that the connection goes through a router before it gets to the media server, or to other DDRs in the case of replication. There will normally be a switch between the DDR and the router unless the router also provides switch functionality. What is important to note in this diagram is that a gateway function is involved in the network data flow. It is important to maximize the data throughput between the DDR and the media servers, but if a WAN is involved, one 1 Gb interface is normally enough because of bandwidth limitations. Normally, for performance reasons, the DDR will be located on the same LAN and use the same switch as the media server. There may be cases where some of the media servers are on separate LANs; the DDR would need to go through at least one gateway to get to them. It is not expected that the media servers will go across a WAN to get to the DDR, although some WANs are high speed, in which case the backup could go over the WAN. The WAN topology is more likely for a DDR with replication. Normally the data flow in replication is low enough that it does not need aggregation, but MANs and some WANs are getting fast enough that this configuration may be needed; otherwise the WAN would tend to make aggregation ineffective. Yet there are customers that have asked about it. One reason is that it provides redundancy (failover).
[Diagram: remote network topology; labels shown include Data Domain and two Network routers]
Complex bonding
Starting with the DDOS 5.1 release, there is the ability to create failover on top of aggregation groups. In this case the failover slaves are virtual interfaces containing aggregations. The primary use of this is to allow the use of aggregated interfaces while also having a full switch failover. In this diagram there are two parallel paths. If the path being used is through the switch on the left, the failover on the Data Domain system will switch to using the switch on the right. In this case the carrier controls the failure, and the carrier on both of the interfaces to the switch on the left has to go down. Otherwise traffic will keep running over one of the interfaces on the left in a degraded mode.
[Diagram: complex bonding topology with two parallel switch paths; label shown: Aggregated interfaces]
It is important to set the expectations for this setup realistically. Does the media server support this? If not, then it will only have failover across single links, and without multiple media servers it does not make sense to have aggregated links on the DDR. Another point: if one of the aggregated links goes down, the failover will not happen; rather, the other link(s) will handle all the traffic in a degraded mode. If both links go down, the failover will kick in, but since the switch itself did not fail, the media server side will still use the first switch, so the packets will need to be routed between the switches. Otherwise the connections will fail. Also, if LACP is being used and the carrier stays up but the heartbeat fails on both interfaces, the failover (which is based on carrier) will not change to the second switch and communication will stop. In practice this setup is only really effective if the whole switch goes down instead of part of it. A sketch of the layered configuration follows.
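The sketch below assumes DDOS 5.1 command forms similar to those shown just below and placeholder interface names; the exact syntax for making aggregated virtual interfaces the slaves of a failover group should be confirmed against the release documentation.

    # One aggregation group per switch
    net aggregate add veth1 mode lacp hash xor-L3L4 interfaces eth4a eth4b
    net aggregate add veth2 mode lacp hash xor-L3L4 interfaces eth5a eth5b

    # Failover group whose slaves are the two aggregated virtual interfaces
    net failover add veth0 interfaces veth1 veth2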
For DDOS version 5.0 and later with slot based port naming:

    net aggregate add veth0 mode balanced hash xor-L3L4 interfaces eth4a eth4b

The aggregation used with this command is balanced-xor. The packets are distributed across eth2 and eth3 (or eth4a and eth4b for later releases with slot based naming) based on the XOR of the source IP address, destination IP address, source port number, and destination port number. The result gives a number in which the lowest bits are used to determine which link to use to send the packet. For this example an even result will go over one link and an odd result will go over the other. With three links the result is divided by 3, with the remainder determining which interface to use. This aggregation would be used when there are a lot of connections (there is one connection per stream), a lot of media servers, or both. This is the mode of choice for Data Domain, but some switches do not support this type of hashing.

For DDOS version 5.0 and later with slot based port naming:

    net aggregate add veth0 mode lacp hash xor-L3L4 interfaces eth4a eth4b

The aggregation used with this command is lacp-xor. The packets are distributed across eth2 and eth3 (or eth4a and eth4b for later releases with slot based naming) based on the XOR of the source IP address, destination IP address, source port number, and destination port number. The data flow control follows the same mechanism used by balanced mode, except that it adds a control protocol to monitor the interfaces, providing a minimal amount of automated administration of the interfaces, including better sensing of an interface failure. The sensing goes beyond carrier loss to sensing the ability to send and receive data. The heartbeat can be sent out every second or every 30 seconds; the default is every 30 seconds. The interval determines how fast the bonding will sense that the link is no longer communicating and stop using the interface. Once every 30 seconds is less invasive, but it will take longer to mark the link as down, and there may be connection timeouts while it is waiting.

    net failover add veth0 interfaces eth2 eth3

This is not aggregation, but the command will group interfaces eth2 and eth3 (or eth4a and eth4b for later releases with slot based naming) together for failover. There is only one failover type supported. If the active physical link goes away (determined by loss of carrier), the data is sent to the second physical link. The active interface is determined by which link comes up first when it is set up. This is non-deterministic; it depends on several factors such as switch activity, network activity, and which interface is brought up first when they are enabled. The active interface can be fixed by specifying one of the links as primary. The primary interface will always be set as active if it can be UP and RUNNING. A down time and up time can also be specified. The time is set to the multiple of 0.9 seconds that is nearest but not greater than the value specified. For example, if 1.5 seconds is specified, the actual value used is 0.9 seconds. For the command the value is given in milliseconds, so 1.5 seconds would be 1500 and the value used for this is 900. The up and down timer values are also used in the net aggregate commands.

The new hash of L2L3, which was released in DDOS 5.0, uses a combination of the source and destination MAC and IP addresses to determine the interface on which to send the packets. This gives more flexibility in getting the data to be better aggregated.
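A sketch of the corresponding command for the L2L3 hash, assuming the same command form as the examples above; the hash keyword xor-L2L3 follows the naming used in the Recommended Link Aggregation section later in this document, and the interface names are placeholders.

    net aggregate add veth0 mode balanced hash xor-L2L3 interfaces eth4a eth4b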
Mode Options
1. balance-rr or BOND_MODE_ROUNDROBIN (0) - known as the roundrobin mode on the DDR
   - Aggregation using round robin
   - Failover with degradation
   - Normally a good type to use with direct connect or something equivalent
   - To get full matching aggregation, both ends of the link need to be set up to use round robin
2. active-backup or BOND_MODE_ACTIVEBACKUP (1) - known as failover on the DDR
   - Failover method used by Data Domain
   - Works only when one or more standby links are in the group
   - There is one active link and all others in the group are standby
   - The active link is non-deterministic unless a primary is specified
3. balance-xor or BOND_MODE_XOR (2) - known as the balanced mode on the DDR
   - Sends transmissions out a specific NIC based on the specified hash method
   - Default: (source MAC address XOR destination MAC address) modulo the size of the aggregation group
   - Note: this only aggregates transmissions; receive traffic needs to be aggregated on the other end
   - This mode is referred to as static because of the manual setup that is needed
4. 802.3ad or BOND_MODE_LACP (4) - known as the lacp mode on the DDR
   - Sends transmissions out a specific NIC based on the specified hash method
   - Default: (source MAC address XOR destination MAC address) modulo the size of the aggregation group
   - Note: this aggregates transmissions and actively checks the aggregated interfaces; receive traffic needs to be aggregated on the other end
   - This configuration is initially set manually, but is maintained automatically
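On a generic Linux host that uses the bonding driver (for example a Linux media server), the active mode, hash policy, and slave state can be inspected through procfs; this is standard Linux, not a DDR command, and bond0 is a placeholder for the bonded interface name.

    cat /proc/net/bonding/bond0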
Assign a different IP address to each link and set up each media server to send data to a unique one of those IP addresses. That way the throughput will approach 4 times the single link speed, versus around 2.5 times if aggregation is used. This is very dependent on the expected traffic pattern from the media servers.
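A minimal sketch of that approach, assuming the standard DDR per-interface addressing command and placeholder interface names and addresses; each media server is then pointed at a different one of these addresses.

    net config eth4a 192.168.1.1 netmask 255.255.255.0
    net config eth4b 192.168.1.2 netmask 255.255.255.0
    net config eth4c 192.168.1.3 netmask 255.255.255.0
    net config eth4d 192.168.1.4 netmask 255.255.255.0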
Link failures
A link can fail at several places: in the driver, the wire, the switch, the router, or the remote system. For failover to work, the program (the bonding module in the Data Domain case) must be able to determine that a link to the other side is down. This information is normally provided by the hardware driver. For a simple case, consider a direct connect where the wire is disconnected. The driver senses that the carrier is down and reports this back to the bonding module. The bonding module will mark the link as down after the "down" time has expired and switch to a different link. The bonding module continues to monitor the link, and when it has come back up for the "up" time it will mark it as up. If the restored link is marked as the primary, the data will be switched back to that link; otherwise the data flow stays on the current link. Note: the failover method that is currently supported is for directly attached hardware.

The driver can sense when the directly attached link is no longer functioning, but beyond that it gets harder. Consider the case where there is a switch and the failover is between two different switches with no routing. Can the driver determine that the connection to the remote system has failed and that it therefore needs to switch to the backup link going through the other switch? This is possible if the switch provides link fault signaling similar to what is defined in IEEE 802.3ae. This is supported by the Fujitsu 10GbE switch, and something similar is supported by Cisco. This is a rather limited network topology where the systems are directly connected via switches and there are no other routes available; it would be an extension of the direct connect to the media server. Currently the driver and the bonding module do not support link fault signaling because it is not widely available and applies to too limited a set of network topologies.

There are two types of failover. One is failover to a standby interface. The standby interface is not used until a failure happens and the traffic is redirected to the standby link. This is a waste of resources if there is never a failover. This is the method used by Data Domain when the bonding method failover is specified:

    net failover add veth1 interfaces eth3 eth5

The other type is failover with degradation. In this method there is no standby; all the links in the group are being used. If there is a failure, the failed link is removed and the network traffic from that link is redirected to the other links in the group. This is the failover associated with link aggregation, but it can become complex if the bonding driver has to determine that a path to the target system no longer exists so that it will not send data to that link. There is also the question of how bad the failure is. Maybe the carrier stays up but the data still fails to get transferred, or the data has too many CRC or checksum errors to be effective. The lacp mode (available in DDOS 5.0 for 1 Gb and DDOS 5.1 for 10 Gb) is the only mode that can determine this and mark the interface as down. Note that it is not up to the DDR to detect and adjust the network flow for a failure beyond the switch or router; finding an alternative path is a network/switch/router problem. The DDR only senses the network between its local interface and the switch or system it is connected to.
Cisco
Some of the older Cisco switches and routers only support the older proprietary protocol, PAgP. The Data Domain system does not support this type of aggregation. Fortunately, the newer switches and routers support the IEEE 802.3ad standard. When using Cisco switches and routers, IEEE 802.3ad should be used with Layer 3 and 4 hashing. It may be possible in some cases to set the aggregation with PAgP to round robin, but that is not currently supported for the DDR when connected to a switch or a router because of throughput delays from potential packet ordering issues. At high speeds with fast retransmissions, out-of-order packets can generate many more packets, which would decrease the overall performance.
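A sketch of an IEEE 802.3ad (LACP) channel on a Cisco IOS switch with Layer 3+4 load balancing; the interface range, the channel-group number, and the availability of the src-dst-port keyword depend on the platform and software release.

    Switch(config)# port-channel load-balance src-dst-port
    Switch(config)# interface range GigabitEthernet1/0/1 - 2
    Switch(config-if-range)# channel-group 1 mode active

With channel-group mode active, the switch initiates LACP; this matches the DDR's lacp mode, which also runs LACP in active mode.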
Nortel
Nortel supports an aggregation called Split Multi-Link Trunking, which uses LACP_AUTO mode link aggregation.
Sun
The initial version of Solaris 10 and earlier releases supported Sun Trunking. Later releases of Solaris 10 and beyond support the IEEE 802.3ad standard for communicating with switches. Back-to-back link aggregation is supported, in which two systems are directly connected over multiple ports. The balancing of the load can be done with L2 (MAC address), L3 (IP address), L4 (TCP port number), or any combination of these. Note that the DDR currently only supports L2 or L3+L4. Link aggregation can run in either passive mode or active mode; at least one side must be in active mode. The DDR always uses active mode. Sun Trunking supports a round robin type of aggregation. This type of aggregation could be used if the DDR is connected directly to a Sun system. For more information on Sun aggregation refer to: http://docs.sun.com/app/docs/doc/816-4554/fpjvl?l=en&q=%22link+aggregation%22&a=view For more information on Sun Trunking refer to: http://docs.sun.com/source/817-3374-11/preface.html
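For a Solaris 10 host, a sketch of an 802.3ad aggregation with L3+L4 hashing and active LACP might look like the following; this assumes the Solaris 10 dladm syntax, and the device names and aggregation key (1) are placeholders to be confirmed against the Sun documentation referenced above.

    dladm create-aggr -P L3,L4 -l active -d e1000g0 -d e1000g1 1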
Windows
Microsoft's view of link aggregation is that it is a switch problem or a hardware problem, so Microsoft feels that it should be handled by the switch/router and the NIC card. There is nothing in the OS that directly supports it. Rather, if the customer wants it, they should get NIC cards that support it and either have a special driver to initiate it or use the switch to drive it. In the current documentation for Windows Server 2008, Microsoft refers to the support of PAgP, an old proprietary Cisco aggregation protocol: http://blogs.technet.com/winserverperformance/ They also refer to Receive-Side Scaling (RSS): http://www.microsoft.com/whdc/device/network/NDIS_RSS.mspx This refers to a way to distribute packet handling across the CPUs to which NIC cards are normally tied. There are drivers from outside of Microsoft that at least provide passive IEEE 802.3ad support, if not active. Passive support means that the Windows system will respond to the IEEE 802.3ad protocol packets, but it will not generate them. For direct connect this may be the only way to have a directly connected aggregated link. The following link provides Microsoft's view of servers for 2008: http://technet2.microsoft.com/windowsserver2008/en/library/59e1e955-3159-41a1-b8fd-047defcbd3f41033.mspx?mfr=true If the Windows server is not directly connected, then it is not important to the DDR system whether or how link aggregation is provided by Windows; that would be between the Windows server and the switch/router. More specific information on which NIC cards support link aggregation is still TBD.
AIX
According to an AIX and Linux administration guide, AIX supports EtherChannel and IEEE 802.3ad types of link aggregation, as mentioned in the RSCT administration guide: http://publib.boulder.ibm.com/infocenter/clresctr/vxrx/index.jsp?topic=/com.ibm.cluster.rsct.doc/rsct_aix5l53/bl5adm05/bl5adm0559.html
When using a DDR, the round robin available through EtherChannel can be used when directly connected. IEEE 802.3ad can be used if Layer 4 hashing is included. If the DDR is not directly connected, then it depends on the switch or router being used. AIX uses a variant of EtherChannel for backup, referred to as EtherChannel backup. This is similar to the active-backup mode supported by the Linux bonding driver and does not need any handshake from the equipment connected to the links, other than having multiple links available.
HPUX
The link aggregation product is referred to as HP Auto Port Aggregation (APA). As with Linux bonding, this product provides either a full standby failover or a degradation failover by overloading the other links within an aggregation group. The aggregation can use Layer 2, Layer 3, and/or Layer 4 hashing for aggregating across the links. It also supports the IEEE 802.3ad standard. A summary of the product is given here: http://h20392.www2.hp.com/portal/swdepot/displayProductInfo.do?productNumber=J4240AA The administration guide can be found here: http://docs.hp.com/en/J4240-90039/index.html According to the administration guide, direct connect server-to-server is supported, but a round robin type of aggregation does not appear to be. This is further brought out in figure 3-4 of that document, where for direct connect it is recommended to have many connections for load balancing to be effective; with round robin, multiple connections are not required for effective aggregation. With this understanding, HPUX systems would not support round robin with a directly connected system.
aggregation used is lacp and the switches are interconnected to share information. Cisco uses this to provide good data flow along with a failover ability. The packet routing is dependent on the best path, the least used path, and the concern with packet ordering. If one of the switches goes down or becomes unavailable, the packets are directed to the other. Another case is where the switches operate at a physical level and data is passed directly through to the remote system. It acts like a direct connect even though the systems can go over a WAN. The following diagram illustrates this.

[Diagram: pass-through configuration; labels shown include eth2 and en5]

In this case the aggregation is passed through to the remote system, and it provides increased throughput but also failover. Normally lacp is used in this case to provide a heartbeat failover. The challenge with this is which transmit hash to use. Multiple connections are used along with multiple IP addresses, either from VLANs or IP aliases. If one of the WAN links becomes unavailable, both sides become aware of this and send everything over the other link. The main concern here is handling the latency of the WAN when using the heartbeat.
Failover of NICs
With the special aggregation cases the dependency on failover is reduced. Failover is still simpler to set up and easier to maintain, but as networks get more complex, the use of lacp offers better automation in handling failure conditions.
4. failover (if aggregation cannot be used)

Private Network (more than 4 active clients and the network path has no gateways)
1. mode lacp if supported (balanced if not), using hash xor-L2
2. mode lacp if supported (balanced if not), using hash xor-L3L4
3. separate NIC per media server (if there are enough NICs)
4. failover (if aggregation cannot be used)

Local Network (fewer than 4 active clients or route has gateways in the path)
1. separate NIC per media server (if there are enough NICs)
2. mode lacp if supported (balanced if not), using hash xor-L3L4
3. mode lacp if supported (balanced if not), using hash xor-L2 (if there are a suitable number of active clients)
4. failover (if aggregation cannot be used)

Local Network (more than 4 active clients and the network path has no gateways)
1. mode lacp if supported (balanced if not), using hash xor-L2
2. mode lacp if supported (balanced if not), using hash xor-L3L4
3. separate NIC per media server (if there are enough NICs)
4. failover (if aggregation cannot be used)

Remote Network (normally through gateways and routers)
1. separate NIC per media server (if there are enough NICs)
2. mode lacp if supported (balanced if not), using hash xor-L3L4
3. failover (if aggregation cannot be used)

Note: hash xor-L2L3 can be substituted for xor-L2.
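As an illustration, the first choice listed for a private network with more than four active clients would look like the following on the DDR, using the command form shown earlier; the hash keyword xor-L2 follows the naming used in the list above, and the interface names are placeholders.

    net aggregate add veth0 mode lacp hash xor-L2 interfaces eth4a eth4b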
Switch information
Link aggregation is set up on both sides of a link. The link aggregation does not necessarily have to match on both sides of the link. For example, the DDR may be set to xor-L3L4 while the switch is set to src-ip. A good rule of thumb is to keep the aggregations close, such as xor-L3L4 on the DDR and src-dst-port on the switch; if an aggregation is good enough for one direction, it is good enough for the other direction. Aggregation on the switch is used to distribute traffic being received by the DDR. If the main operation being done is backup, the switch aggregation is very important, because backup network traffic is mostly data being received by the DDR. Because of the limited number of clients communicating with the DDR, the recommended aggregation method is balance-xor with Layer 3+4 hashing. To support this, the device directly connected to the DDR, e.g. switch or router (see Normal Link Aggregation), needs to support src-dst-port, or at least src-port, load balancing.

This section uses the vendors' documentation to identify switches that may work with the Layer 3+4 hashing and some that may not. There are no plans to validate or certify these; the final authority on whether a switch supports the desired aggregation is to physically try it. For example, there is at least one case where round robin was desired and tried and it worked satisfactorily even though it is listed as not supported. Note again, even where round robin is supported by a switch, the aggregation performance can be poor, or even worse than not aggregating at all, mostly due to out-of-order packets.

Note: there are few switches that support Layer 3+4 aggregation; the supported aggregation may be for Layer 3 only or Layer 4 only. Matching Layer 4 (port aggregation) with Layer 3+4 (IP address and port aggregation) is not a problem, but be aware that it may cause data to be sent on one link and received on a different link; the concern of out-of-order packets should not occur. Which link the data is sent on is not important as long as all the data associated with a connection is sent on the same link.

Definitions:
Dest := Destination
IP := IP address
L4 := Layer 4 of the network stack, i.e. TCP
MAC := MAC or hardware address
Port := TCP port number
Src := Source
SW := software

[Table: load-balancing hash support by switch model — columns: Switch brand & model, Switch vendor SW release, Src MAC, Dest MAC, Src-Dest MAC, Src IP, Dest IP, Src-Dest IP, Src L4 Port, Dest L4 Port, Src-Dest L4 Port, Round Robin — for Cisco Catalyst 6500 CatOS, Catalyst 6500 IOS, Catalyst 3560, Catalyst 2960, Catalyst 3750, and Catalyst 4500/4948/4924]
For directly connected systems the support for round robin is as follows:
Sun - yes
AIX - yes, it can
HPUX - no
Windows - maybe; it depends on the NIC software, but don't count on it
Cisco Configuration
Set the EtherChannel mode to on: manually set the ports to participate in the channel group.

DDR configuration      Cisco load-balance configuration
xor-l3l4               src-dst-port
xor-l2                 src-dst-mac
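A sketch of the manual pairing described above, with placeholder interface and channel-group numbers; channel-group mode on creates a static EtherChannel with no negotiation protocol, which matches the DDR's balanced (static) mode, while the load-balance command selects src-dst-port to pair with xor-L3L4 on the DDR.

    Switch(config)# port-channel load-balance src-dst-port
    Switch(config)# interface range GigabitEthernet1/0/1 - 2
    Switch(config-if-range)# channel-group 1 mode on

For the xor-l2 pairing, src-dst-mac would be used in the load-balance command instead.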