BFD
(Bidirectional Forwarding Detection)
Does it work and is it worth it?
Tom Scholl, AT&T Labs

NANOG 45
What is BFD?
• BFD provides a method to validate the operation of the
forwarding plane between two routers.
• Upon detecting a failure, BFD triggers an action in a routing
protocol (such as tearing down a session or adjacency).
• Operates in two modes:
• Asynchronous
• Demand
• In either mode, BFD provides an Echo function in which one
side can request its neighbor to loop back a series of packets.
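As a rough illustration of asynchronous mode, the sketch below models one direction of a session: control packets are expected periodically, and the session is declared down once a full detection time (interval times multiplier) passes without one. The class and field names are hypothetical, and the real state machine, Demand mode, and the Echo function are deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class AsyncSession:
    """Toy model of one direction of a BFD asynchronous-mode session."""

    rx_interval_ms: int   # agreed interval between received control packets
    detect_mult: int      # number of consecutive missed packets tolerated
    last_rx_ms: int = 0   # timestamp of the most recent control packet

    @property
    def detection_time_ms(self) -> int:
        # Time without any packet after which the session is declared down.
        return self.rx_interval_ms * self.detect_mult

    def on_control_packet(self, now_ms: int) -> None:
        # Every valid control packet restarts the detection timer.
        self.last_rx_ms = now_ms

    def is_up(self, now_ms: int) -> bool:
        return (now_ms - self.last_rx_ms) < self.detection_time_ms


session = AsyncSession(rx_interval_ms=300, detect_mult=3)
session.on_control_packet(now_ms=1000)
print(session.detection_time_ms, session.is_up(1500), session.is_up(1900))
```

A routing protocol registered as a client of such a session would be notified when `is_up` turns false, instead of waiting for its own hello or hold timer.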
Why would an operator use this?
• BFD can rapidly propagate awareness of forwarding plane
failures up to routing/signaling protocols.
• Relying solely upon hellos, KEEPALIVEs, etc. to validate
forwarding behaviors can be a bad idea.
• Routing/signaling protocols tend to be treated differently than forwarded
traffic.
• Most routing/signaling protocol implementations are not designed to
operate with sub-second keepalive intervals.
• Often, BFD runs on the line card, not the route processor, so it
is unaffected by RP CPU utilization.
Understanding the layers
Router architectures and BFD
• An example of BFD in a distributed router architecture
[Figure: the Route Processor runs IGP, BGP, SNMP, Telnet/SSH and the
BFD Master; each of three linecards runs a BFD Agent and a FIB
Downloader.]
What protocols does BFD work with?
• Static routes
• IGPs (OSPF, IS-IS)
• BGP (eBGP, iBGP)
• LDP
• RSVP
Static Routes
• Static routes rely only on next-hop reachability information to
determine whether they are valid.
• BFD provides a nice way to validate the forwarding path
and provide liveness detection for the actual next hop.
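The difference can be sketched as a validity check. This is a minimal illustration, not any vendor's implementation; `StaticRoute`, `route_is_usable`, and the addresses (taken from documentation ranges) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class StaticRoute:
    prefix: str
    next_hop: str


def route_is_usable(
    route: StaticRoute,
    reachable_next_hops: Set[str],
    bfd_up_neighbors: Optional[Set[str]] = None,
) -> bool:
    """Decide whether a static route should be installed.

    Without BFD (bfd_up_neighbors is None), only next-hop reachability counts.
    With BFD, the next hop must also have a BFD session in the Up state.
    """
    if route.next_hop not in reachable_next_hops:
        return False
    if bfd_up_neighbors is None:
        return True
    return route.next_hop in bfd_up_neighbors


route = StaticRoute(prefix="198.51.100.0/24", next_hop="192.0.2.1")
# Next hop still resolves, but its BFD session has gone down.
print(route_is_usable(route, {"192.0.2.1"}))
print(route_is_usable(route, {"192.0.2.1"}, bfd_up_neighbors=set()))
```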
IGP
• Some mechanisms exist within the IGP to determine
a failure rapidly (even at sub-second intervals).
• These capabilities (“fast hellos”) only verify the IGP's own
keepalive mechanism, not the forwarding path.
• IGP packets are generally punted to the route processor
in a distributed system, often bypassing standard packet
forwarding.
• Because IGPs generally run on the route processor, heavy
CPU usage can cause IGP adjacencies to fail.
• BFD can help by severing an IGP adjacency in the
event of a forwarding path failure.
BGP
• Like IGPs, BGP has its own keepalive mechanism.
• BGP tears down a session when it has not received a
KEEPALIVE message from its neighbor before the hold
timer expires.
• BGP generally runs on the route processor, just like an
IGP, so high RP CPU utilization can also cause BGP
failure.
• BFD can shut down the BGP session in under a
second after a forwarding path failure.
iBGP
• BFD can be enabled on an iBGP session between
router loopbacks to verify the forwarding path.
• This can be an alternative to relying on the IGP to
notify you that a router has gone offline.
• You no longer need to rely on event-driven or periodic
next-hop scanning.
• It can improve iBGP convergence by rapidly detecting BGP
neighbor failure.
eBGP
• BGP timers aren’t great for fast failure detection.
• BFD is great for situations where:
• You and your neighbor have an L2 device in the middle (such as
Internet Exchange LANs or MPLS transport).
• Transport between neighbors lacks reliable link state notification
(wavelengths).
• BFD lets each side specify its own minimum transmit and receive
intervals rather than requiring a single fixed timer.
• Neighbors may use different timers due to their own limitations or
preferences.
• Timers are continuously negotiated and can be altered at any time.
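The negotiation can be sketched with the rules from the BFD base specification (RFC 5880): each side transmits no faster than the neighbor is willing to receive, and in asynchronous mode the local detection time is the neighbor's multiplier times the larger of the local required receive interval and the neighbor's desired transmit interval. The function names and example values below are illustrative, and transmit jitter is ignored.

```python
def tx_interval_ms(local_desired_min_tx: int, remote_required_min_rx: int) -> int:
    """Interval at which the local system sends control packets."""
    return max(local_desired_min_tx, remote_required_min_rx)


def detection_time_ms(
    local_required_min_rx: int,
    remote_desired_min_tx: int,
    remote_detect_mult: int,
) -> int:
    """Asynchronous-mode detection time as seen by the local system."""
    return remote_detect_mult * max(local_required_min_rx, remote_desired_min_tx)


# Hypothetical example: we would like 100 ms timers, but the neighbor
# only supports 300 ms in both directions. Both sides use a multiplier of 3.
ours = {"desired_min_tx": 100, "required_min_rx": 100, "detect_mult": 3}
theirs = {"desired_min_tx": 300, "required_min_rx": 300, "detect_mult": 3}

our_tx = tx_interval_ms(ours["desired_min_tx"], theirs["required_min_rx"])
our_detect = detection_time_ms(
    ours["required_min_rx"], theirs["desired_min_tx"], theirs["detect_mult"]
)
print(our_tx, our_detect)
```

In this example the slower side wins in both directions: we send every 300 ms and declare the neighbor down after 900 ms of silence.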
MPLS LDP
• BFD can be enabled to provide OAM on a particular LDP FEC.
• The LSP is bootstrapped with LSP-Ping and BFD can be operated at a
variety of intervals.
• This is useful for informational purposes as LDP really doesn’t
have a mechanism to select an alternate path (it sticks with
what the IGP tells it).
• One benefit is the ability for LDP to “fork” across ECMP paths
in a network, providing validation across the ECMP tree.
MPLS LDP and ECMPs (cont’d)
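[Figure: routers R1 through R6 with two ECMP paths from R1 to R6, one
via R2 and R4 and one via R3 and R5. BFD Session #1 to R6 and BFD
Session #2 to R6 each follow one of the paths.]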
MPLS RSVP
• BFD can be used with RSVP to provide liveness detection on
a path built by RSVP-TE.
• Upon BFD declaring a failure on a particular RSVP-TE path,
the head-end router (the router initiating the BFD session) can
trigger the use of secondary paths.
• This provides an operator with a nice method to verify multiple
forwarding paths as well as an automated method to
select an alternate path.
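The head-end behavior described above can be sketched as a preference-ordered selection. This is only an illustration; `TePath` and `select_active_path` are hypothetical names, and real implementations also handle signaling, revertive behavior, and hold-down timers.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class TePath:
    name: str
    bfd_up: bool  # state of the BFD session running over this path


def select_active_path(paths: Sequence[TePath]) -> Optional[TePath]:
    """Return the first path, in preference order, whose BFD session is up."""
    for path in paths:
        if path.bfd_up:
            return path
    return None


# Primary path has failed according to BFD, so the secondary is selected.
paths = [TePath("primary", bfd_up=False), TePath("secondary", bfd_up=True)]
active = select_active_path(paths)
print(active.name if active else "no usable path")
```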
MPLS RSVP – Point-to-Multipoint LSPs
• BFD can operate in a point-to-multipoint environment, with a
BFD session for each downstream PE router.
• P2MP LSPs are very popular for providing linear broadcast of
media, typically with requirements for rapid convergence
(FRR), bandwidth reservation and explicit routing (SRLG-free
paths).
Pseudowires
• BFD can be used with a pseudowire's VCCV (Virtual Circuit
Connectivity Verification) control channel.
• This provides a rapid method to detect faults between the
endpoints of a pseudowire.
• The fault information could then be translated to other
protocols' native OAM capabilities (ATM, FR, Ethernet).
What are the caveats?
• Two main ones:
1. BFD can have high resource demands depending on your
scale.
2. BFD is not visible to Layer 2 bundling protocols (Ethernet
LAGs or POS bundles).
BFD Resource Demands
• The number of BFD sessions on each linecard or router can
impact how well BFD scales for you.
• Each unique platform has its own limits.
• Bundled interfaces that support a minimum tx/rx of only 250 ms,
or even 2 seconds, have been seen.
• In some cases, BFD instances on a router may need to be operated on
the route processor depending on the implementation (non-adjacency
based BFD sessions).
• Test your platform first before deploying BFD. Attempt to put
load on the RP or LC CPU with your configured settings. This
can be done by:
• Executing CPU-heavy commands
• Flooding packets whose TTL expires on the destination
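Before testing, a back-of-the-envelope estimate of the control-packet rate a linecard or RP must sustain can help pick starting values. The sketch below assumes every session transmits at the same interval and ignores jitter and Echo packets; the function name and the session counts are made up for illustration.

```python
def control_packets_per_second(sessions: int, interval_ms: float) -> float:
    """Approximate transmit rate of BFD control packets for a set of sessions."""
    if sessions < 0 or interval_ms <= 0:
        raise ValueError("sessions must be >= 0 and interval_ms must be > 0")
    return sessions * 1000.0 / interval_ms


# Hypothetical linecard with 100 sessions at two different intervals.
for interval in (50, 300):
    rate = control_packets_per_second(100, interval)
    print(f"{interval} ms interval: {rate:.1f} packets/s sent")
```

The receive side sees a similar rate from the neighbors, so the total processing load is roughly double the figure printed here.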
BFD Resource Demands (cont’d)
• What values are safe to try?
• Based on conversations with several operators, 300 ms with a
multiplier of 3 (900 ms detection) appears to be a safe value
that works fairly well on most equipment.
• This is a significant improvement over some of the
alternatives.
BFD and L2 link-bundling
• BFD is unaware of underlying L2 link bundle members.
• A 4x10GigE L2 bundle (802.3ad) would appear as a single L3
adjacency. BFD packets would be transmitted on a single
member link, rather than out all 4 links.
• A failure of the link with BFD on it would result in the entire L3
adjacency failing.
• However, in some scenarios the failed member link may result in only a
single BFD packet being dropped. Subsequent packets may route over
working member links.
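The reason a single session exercises only one member link is that bundles typically pick the member by hashing flow fields, and a BFD session is a single flow. The sketch below uses CRC32 purely as a stand-in for whatever hash a given platform uses; the member names and addresses are hypothetical. The UDP ports follow single-hop BFD conventions (destination 3784, source from the 49152-65535 range).

```python
import zlib
from typing import Sequence, Tuple

Flow = Tuple[str, str, int, int, int]  # src IP, dst IP, protocol, src port, dst port


def pick_member(flow: Flow, members: Sequence[str]) -> str:
    """Choose a bundle member for a flow by hashing its fields."""
    key = "|".join(str(field) for field in flow).encode()
    return members[zlib.crc32(key) % len(members)]


members = ["te0/0", "te0/1", "te0/2", "te0/3"]
bfd_flow: Flow = ("192.0.2.1", "192.0.2.2", 17, 49152, 3784)

# Every packet of the session hashes to the same member link.
chosen = pick_member(bfd_flow, members)

# If that member fails and is removed from the bundle, the same flow
# re-hashes onto one of the remaining links.
remaining = [m for m in members if m != chosen]
print(chosen, "->", pick_member(bfd_flow, remaining))
```

Whether BFD notices the failure depends on whether the member is pulled from the bundle faster than the detection time expires, which matches the two outcomes described above.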
BFD and L2 link-bundling (cont’d)
[Figure: two L3 interfaces connected by an L2 link bundle.]
BFD and L2 link-bundling (cont’d)
• This can be a showstopper because it’s generally preferable to
build L2 bundles rather than to use L3 ECMP, to avoid
platform-specific scaling issues and polluting your IGP.
• Having BFD “fork” across each individual link would be great,
but it would have its own scaling impact. Each individual
member link would have to have a separate BFD session. No
vendor currently supports this mode of operation, nor is there
a published draft describing it.
Conclusion
• Routers do still have faults in the forwarding plane where IGP
and other control-plane protocols continue to work.
• These events do happen and result in major outages (you’ve seen
some in the press in 2008…)
• The default hello/keepalive intervals of some protocols (BGP,
IGP, RSVP) are still too high to be optimal for failure detection.
• There needs to be a way to support L2 link bundling as
networks continue to grow links (we don’t have 100GE yet, so
scaling Nx10G and Nx40G is going to be important).
• Always remember to stress-test your configurations to make
sure that you and your equipment are comfortable with what
you’ve selected.
Send questions, comments, complaints to:
Tom Scholl, AT&T Labs

[email protected]
