Routed Fast Convergence and High Availability: L3 Design and Architecture BRKRST-3363


October 5-8, 2009, Santiago, Chile


Presentation_ID © 2009 Cisco Systems, Inc. All rights reserved. Cisco Public 1
Abstract
• As IP networks carry a greater variety of real-time
traffic, the convergence speed and availability of the
network become more critical. Many networks carrying
voice and other real-time data must converge in less
than 3 seconds to carry traffic effectively, and
convergence times under one second are highly
desirable or required in some situations. This session
discusses various mechanisms network engineers can
use to improve their network's convergence time and
availability, including nonstop forwarding and tuning for fast
convergence. This session also considers the tradeoffs
involved in adding redundancy in terms of network
convergence times.

Agenda
• High-Availability Overview
• IP Event Dampening
• Graceful Restart
• Fast Convergence
• IP Fast ReRoute
• Operational Features
• Summary

Overview

Availability Definitions
Availability
• The probability that an item (or network, etc.) is operational, and
functional as needed, at any point in time

• Or, the expected or measured fraction of time the defined service,
device or area is operational; annual uptime is the amount (in
days, hrs., min., etc.) the item is operational in a year

[Figure: the end-to-end service spans the user network, provider network, shared network and server network]

Availability Definitions
Availability
• Availability = (MTBF - MTTR) / MTBF
Useful for both theoretical and practical analysis
• MTBF is mean time between failures
What, when, why and how does it fail?
• MTTR is mean time to repair
How long does it take to fix?
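A short worked example of the formula above, as a minimal Python sketch. The MTBF and MTTR figures are hypothetical, and the code follows the slide's convention that MTBF is measured from one failure to the next, so uptime per cycle is MTBF - MTTR.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Availability = (MTBF - MTTR) / MTBF, with MTBF measured failure to failure."""
    if not 0 <= mttr_hours < mtbf_hours:
        raise ValueError("MTTR must be non-negative and smaller than MTBF")
    return (mtbf_hours - mttr_hours) / mtbf_hours


# Hypothetical device: a failure every 10,000 hours on average, 4 hours to repair.
print(f"{availability(10_000, 4):.4%}")  # 99.9600%
# Halving MTTR improves availability as much as doubling MTBF would.
print(f"{availability(10_000, 2):.4%}")  # 99.9800%
```

The second line is the practical takeaway: shortening repair time (faster detection, faster convergence) is often cheaper than making failures rarer.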

What Is High Availability?

Availability   DPM      Downtime Per Year (24x365)
99.000%        10,000   3 Days 15 Hours 36 Minutes
99.500%        5,000    1 Day 19 Hours 48 Minutes
99.900%        1,000    8 Hours 46 Minutes
99.950%        500      4 Hours 23 Minutes
99.990%        100      53 Minutes
99.999%        10       5 Minutes
99.9999%       1        30 Seconds

99.999% and above is considered “High Availability”.
DPM = Defects per Million (Hours of Running Time)
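The table can be reproduced with a few lines of arithmetic. This sketch assumes 24x365 operation (8,760 hours per year), as the table header states; the function names are illustrative.

```python
HOURS_PER_YEAR = 24 * 365  # the table assumes 24x365 operation


def downtime_per_year(availability_pct: float) -> tuple:
    """Return (days, hours, minutes, seconds) of downtime per year, rounded to the second."""
    seconds = round((1 - availability_pct / 100) * HOURS_PER_YEAR * 3600)
    days, rem = divmod(seconds, 86_400)
    hours, rem = divmod(rem, 3_600)
    minutes, secs = divmod(rem, 60)
    return days, hours, minutes, secs


def dpm(availability_pct: float) -> int:
    """Defects per million hours of running time."""
    return round((1 - availability_pct / 100) * 1_000_000)


print(downtime_per_year(99.0))    # (3, 15, 36, 0)
print(downtime_per_year(99.999))  # (0, 0, 5, 15)
print(dpm(99.99))                 # 100
```

The table rounds these results, so 99.9% (8 h 45 min 36 s) appears as 8 Hours 46 Minutes and 99.9999% (about 32 s) appears as 30 Seconds.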


Downtime
Causes of Unscheduled Downtime (% of respondents)
Network Operations Failures: 87%
Physical Link Failures: 87%
Network Hardware Failures: 79%
Network Software Failures: 67%
Customer Premises Equipment Failure: 67%
Physical Environment Failures: 62%
Congestion/Overload: 44%
Unknown: 37%
Acts of Nature: 37%
Malicious Damage: 25%

Source: Sage Research, IP Service Provider Downtime Study: Analysis of Downtime Causes,
Costs and Containment Strategies, August 17, 2001, Prepared for Cisco SPLOB
Hardware Redundancy Options

Highly Available Hardware                  Network Replication of Hardware
- Failover redundant modules only          + All modules are redundant
  Operating system determines failover       Protocols determine failover
+ Typically cost effective                 - Increased cost and complexity
+ Often the only option at the edge        + Load balancing

Highly Available Networks Tend to Have Both

The Culture Of Availability
What’s Your Availability Level?
• Analyze the Gaps: Reactive ~99%
• Few if any identified processes (except maybe to fix
problems as reported by users)
• Significant number of single points of failure (SPFs)
• Low tool utilization
• Low level of consistency (HW, SW, config, design)
• No quality improvement processes

The Culture Of Availability
What’s Your Availability Level?
• Analyze the Gaps: Proactive ~99.9%
• Good change management processes including what-if
analysis and change validation
• Low number of SPFs
• Fault and configuration management tools
• Improved consistency (HW, SW, Config, design)
• Typically no quality improvement process

The Culture Of Availability
What’s Your Availability Level?
• Analyze the Gaps: Predictive ~99.99+%
• Consistent processes for fault, configuration,
performance and security
• No SPFs except at the edge of the network
• Fault, configuration, performance and workflow process
tools
• Excellent consistency (HW, SW, config, design)
• HA culture of quality improvement

IP Event Dampening

IP Event Dampening
• Prevents routing protocol churn caused by constant
interface state changes
• Supports all IP routing protocols
Static Routing, RIP, EIGRP, OSPF, IS-IS, BGP
In addition, it supports HSRP and CLNS routing
Applies to physical interfaces; it cannot be applied to individual
subinterfaces
• Available in 12.0(22)S, 12.2(13)T

IP Event Dampening
Concept
• Takes the concept of BGP route-flap dampening and
applies it at the interface level, so all IP routing
protocols can benefit
• Tracks interface flapping, applying a “penalty”
to a flapping interface
• Puts the interface in “down” state from routing protocol
perspective if the penalty is over a threshold tolerance
• Uses an exponential decay algorithm to decrease the
penalty over time and brings the interface back to the “up”
state once the penalty falls below the reuse threshold
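The penalty mechanics can be illustrated with a small model. This is a sketch, not the IOS implementation: the half-life, thresholds, maximum suppress time and per-flap penalty below are assumed values chosen for illustration, so check the dampening command reference for your release's actual defaults.

```python
class InterfaceDampener:
    """Toy model of interface-level dampening: each flap adds a penalty that
    decays exponentially; the interface is hidden from routing protocols while
    suppressed. All parameter values are illustrative assumptions."""

    def __init__(self, half_life=5.0, reuse=1000, suppress=2000,
                 max_suppress=20.0, flap_penalty=1000):
        self.half_life = half_life
        self.reuse = reuse
        self.suppress = suppress
        self.flap_penalty = flap_penalty
        # Cap the penalty so it decays back to the reuse threshold within max_suppress seconds.
        self.max_penalty = reuse * 2 ** (max_suppress / half_life)
        self.penalty = 0.0
        self.last_update = 0.0
        self.suppressed = False

    def _decay(self, now):
        # Exponential decay: the penalty halves every half_life seconds.
        self.penalty *= 2 ** (-(now - self.last_update) / self.half_life)
        self.last_update = now
        if self.suppressed and self.penalty < self.reuse:
            self.suppressed = False

    def flap(self, now):
        self._decay(now)
        self.penalty = min(self.penalty + self.flap_penalty, self.max_penalty)
        if self.penalty > self.suppress:
            self.suppressed = True

    def perceived_up(self, now, physically_up=True):
        """Interface state as seen by routing protocols."""
        self._decay(now)
        return physically_up and not self.suppressed


d = InterfaceDampener()
for t in (0, 1, 2):  # three flaps in quick succession
    d.flap(t)
print(d.perceived_up(2), d.perceived_up(8), d.perceived_up(10))  # False False True
```

The third flap pushes the penalty over the suppress threshold; the interface stays “down” for the routing protocols until the penalty has decayed below the reuse threshold, even though it is physically up.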

IP Event Dampening
Deployment
[Figure: remote office/enterprise router R3 reaches HQ/ISP over a primary link (via R1) and a backup link (via R2)]
IP Event Dampening Absorbs Link Flapping Effects on Routing Protocols
[Figure: timeline comparing the physical state and the logical (dampened) state of the primary link, R3's resulting path to HQ/ISP (P = primary, B = backup), and the duration of packet loss]

IP Event Dampening
Algorithm
[Figure: actual interface state; accumulated penalty rising toward the maximum penalty and crossing the suppress and reuse thresholds; interface state as perceived by routing protocols]

Graceful Restart

NSF/SSO
• Standby Route Processor (RP) takes control of the router after a
hardware or software fault on the Active RP
• SSO allows the standby RP to take immediate control and maintain
connectivity protocols
• NSF continues to forward packets until route convergence is complete
[Figure: Active RP and Standby RP exchanging state information, above the line cards]

NSF/SSO
Design Goals
• Provide a scalable solution
Architecture must scale with workloads and features and meet network
requirements
• Minimize state that must be synchronized
Minimize impact of HA on service
• Detect and react to failures quickly
Continuously monitor Active components
Continuously verify operation of Standby components

Graceful Restart
OSPF
• OSPF uses an extension to the hello packets called Link-Local
Signaling (LLS).
• The first hello A sends to B has an empty neighbor list; this tells B
that something is wrong with the neighbor relationship.
• A sets the restart bit in its hello, which tells B that A is still
forwarding traffic and would like to resynchronize its database.
[Figure: A (control and data planes) sends B an empty hello with the restart bit set]

Graceful Restart
OSPF
• B moves A into the exchange state and uses out-of-band (OOB)
signaling to resynchronize their databases.
• This process is the same as initial database synchronization,
but it uses different packet types.
[Figure: B sets A to the exchange state; a DBD exchange is followed by an LSA exchange]

OSPF GR/NSF Fundamentals

• When A and B have resynchronized their databases, they place each
other in full state and run SPF.
• After running SPF, the local routing table is updated, and
OSPF notifies CEF.
• CEF then updates the forwarding tables and removes all information
marked as stale.
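The stale-marking step that CEF performs can be sketched as a mark-and-sweep over a toy FIB. The Fib class and its methods are hypothetical; the point is only the ordering: mark everything stale on restart, clear the flag on entries the protocol relearns, then delete whatever is still stale once the protocol has converged.

```python
from dataclasses import dataclass, field


@dataclass
class Fib:
    """Toy FIB: prefix -> {"next_hop": ..., "stale": bool} (hypothetical structure)."""
    entries: dict = field(default_factory=dict)

    def install(self, prefix, next_hop):
        # A freshly installed or relearned entry is never stale.
        self.entries[prefix] = {"next_hop": next_hop, "stale": False}

    def mark_all_stale(self):
        # On restart, keep forwarding with the old entries but flag them.
        for entry in self.entries.values():
            entry["stale"] = True

    def sweep_stale(self):
        # After the protocol reconverges, drop whatever was not relearned.
        removed = sorted(p for p, e in self.entries.items() if e["stale"])
        for prefix in removed:
            del self.entries[prefix]
        return removed


fib = Fib()
fib.install("10.0.0.0/8", "192.0.2.1")
fib.install("198.51.100.0/24", "192.0.2.1")
fib.mark_all_stale()                    # control plane restarts; forwarding continues
fib.install("10.0.0.0/8", "192.0.2.2")  # relearned after resynchronization
print(fib.sweep_stale())                # ['198.51.100.0/24']
```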

Graceful Restart
OSPF
• Use the nsf command in router ospf configuration mode to enable
graceful restart.
• show ip ospf can be used to verify that graceful restart is
operational.

Router A:
router ospf 100
 nsf
 ....

Router B:
router ospf 100
 nsf
 ....

router#sh ip ospf
Routing Process "ospf 100" with ID 10.1.1.1
....
Non-Stop Forwarding enabled, last NSF restart 00:02:06 ago (took 44 secs)
router#show ip ospf neighbor detail
Neighbor 3.3.3.3, interface address 170.10.10.3
....
Options is 0x52
LLS Options is 0x1 (LR), last OOB-Resync 00:02:22 ago

Graceful Restart
BGP
• When the BGP peering session is brought up, the graceful restart
capability is negotiated. If both peers state they are capable of
GR, it’s enabled on the peering session.
• When A restarts, it opens a new TCP session to B, using the
same router ID.
• B interprets this as a restart and closes the old TCP session.
[Figure: A and B exchange the GR capability; after A restarts, a new TCP session is opened and B closes the old session]

Graceful Restart
BGP
• B transmits updates containing its BGP table (its local RIB-Out).
• A goes into read-only mode and does not run the bestpath
calculations until B has finished sending updates.
• When B has finished sending updates, it sends an End-of-RIB
marker: an UPDATE message with no reachable NLRI and an empty
withdrawn-routes field.
[Figure: B sends updates followed by the End-of-RIB marker while A stays in read-only mode]
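For IPv4 unicast, RFC 4724 defines the End-of-RIB marker as an UPDATE of minimum length (23 bytes): the 16-byte marker, the length and type fields, and zero-length withdrawn-routes and path-attribute fields. The sketch below builds and recognizes that message; it does not cover other address families, which use an empty MP_UNREACH_NLRI attribute instead.

```python
import struct

BGP_MARKER = b"\xff" * 16
BGP_UPDATE = 2


def ipv4_end_of_rib():
    """UPDATE with no withdrawn routes, no path attributes and no NLRI."""
    body = struct.pack("!HH", 0, 0)  # withdrawn-routes length, path-attribute length
    length = len(BGP_MARKER) + 2 + 1 + len(body)  # marker + length + type + body
    return BGP_MARKER + struct.pack("!HB", length, BGP_UPDATE) + body


def is_ipv4_end_of_rib(msg):
    if len(msg) != 23 or msg[:16] != BGP_MARKER:
        return False
    length, msg_type = struct.unpack("!HB", msg[16:19])
    withdrawn_len, attr_len = struct.unpack("!HH", msg[19:23])
    return length == 23 and msg_type == BGP_UPDATE and withdrawn_len == 0 and attr_len == 0


eor = ipv4_end_of_rib()
print(len(eor), is_ipv4_end_of_rib(eor))  # 23 True
```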

Graceful Restart
BGP
• When A receives the End-of-RIB marker, it runs bestpath and
installs the best routes in the routing table.
• After the local routing table is updated, BGP notifies CEF.
• CEF then updates the forwarding tables and removes all
information marked as stale.

Graceful Restart
BGP
• Use the bgp graceful-restart command in router bgp configuration
mode to enable graceful restart.
• show ip bgp neighbors can be used to verify that graceful restart
is operational.

Router A:
router bgp 65000
 bgp graceful-restart
 ....

Router B:
router bgp 65501
 bgp graceful-restart
 ....

router#show ip bgp neighbors x.x.x.x
....
Neighbor capabilities:
....
Graceful Restart Capabilty:advertised and received
Remote Restart timer is 120 seconds
Address families preserved by peer:
IPv4 Unicast, IPv4 Multicast

Fast Convergence

Network Convergence
• Network convergence is the time needed for traffic to
be rerouted to the alternative or more optimal path after
the network event
• Network convergence requires all affected routers to
process the event and update the appropriate data
structures used for forwarding

Network Convergence
• Network convergence is the time required to:
Detect that an event has occurred
Propagate the event
Process the event
Update the related forwarding structures
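Because the four stages run one after another, a rough convergence estimate is simply their sum. The sketch below treats flooding as a fixed per-hop cost (33 ms, the OSPF flood pacing default quoted later in this session) and uses made-up values for everything else; real networks overlap and vary these stages, so treat the result as an order-of-magnitude budget.

```python
from dataclasses import dataclass


@dataclass
class ConvergenceBudget:
    """Hypothetical per-stage timings, in milliseconds."""
    detect_ms: float          # detect: e.g. loss of carrier or hello timeout
    lsa_generation_ms: float  # propagate: initial LSA generation delay
    flood_per_hop_ms: float   # propagate: per-hop flooding (pacing) delay
    hops: int                 # hops from the failure to the farthest router
    spf_delay_ms: float       # process: initial SPF wait
    spf_run_ms: float         # process: SPF computation time
    fib_update_ms: float      # update: RIB/FIB installation time

    def total_ms(self) -> float:
        propagate = self.lsa_generation_ms + self.flood_per_hop_ms * self.hops
        process = self.spf_delay_ms + self.spf_run_ms
        return self.detect_ms + propagate + process + self.fib_update_ms


budget = ConvergenceBudget(detect_ms=10, lsa_generation_ms=10, flood_per_hop_ms=33,
                           hops=5, spf_delay_ms=10, spf_run_ms=20, fib_update_ms=100)
print(budget.total_ms())  # 315
```

In this made-up example, flooding across five hops (165 ms) dominates, which is why the per-hop propagation timers discussed below matter as much as SPF tuning.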

Event Propagation
OSPF
• Initial LSA Generation Delay
OSPF_LSA_DELAY_INTERVAL: 500 ms delay
Only Router and Network LSA generation is delayed
• Recurring LSA Origination Delay
MinLSInterval
The minimum time between distinct originations of any particular LSA.
The value of MinLSInterval is set to 5 seconds.
• LSA Arrival Throttling
MinLSArrival
“For any particular LSA, the minimum time that must elapse between
reception of new LSA instances during flooding. LSA instances
received at higher frequencies are discarded. The value of MinLSArrival
is set to 1 second.”

Event Propagation
OSPF Exponential Backoff
• Fast LSA Generation after Initial Event
• Repeated events increase regeneration delay
• Supported: 12.0(25)S, 12.2(18)S, 12.3(2)T
• Configuration:
timers throttle lsa all <lsa-start> <lsa-hold> <lsa-max>
timers lsa arrival <timer>
All values are in ms
NOTE: MinLSArrival must be <= lsa-hold

Event Propagation
OSPF Exponential Backoff

timers throttle lsa all 10 500 5000

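[Figure: events causing LSA generation at t1 and t2, with the previous LSA generated at t0 and (t1 - t0) > 5000 ms. Without back-off, LSA generation intervals are 500, 5000 and 5000 ms. With the back-off algorithm, the first LSA is generated at t1+10 ms and subsequent intervals grow as 500, 1000, 2000, 4000 and 5000 ms (capped at lsa-max); t2+10 is also marked.]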
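The back-off sequence in the figure can be generated directly from the three timer values. This sketch assumes a burst of back-to-back events and does not model when the timers reset after a quiet period; the same start/hold/max pattern applies to timers throttle spf.

```python
def throttle_waits(start_ms, hold_ms, max_ms, events):
    """Wait before each of `events` back-to-back LSA generations:
    start_ms for the first, then hold_ms doubling each time, capped at max_ms."""
    waits = []
    hold = hold_ms
    for i in range(events):
        if i == 0:
            waits.append(start_ms)
        else:
            waits.append(min(hold, max_ms))
            hold *= 2
    return waits


# timers throttle lsa all 10 500 5000
print(throttle_waits(10, 500, 5000, 6))  # [10, 500, 1000, 2000, 4000, 5000]
```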

Event Propagation
OSPF
• With default values and no retransmissions, each node
can add 33 ms of delay to the event propagation
• Supported: 12.2(4)T, 12.2(18)S, 12.0(25)S
• Configuration:
Default values are 33 ms (flood) / 66 ms (retransmission)
timers pacing flood <timer>
timers pacing retransmission <timer>

Event Processing
OSPF Exponential Backoff
• SPF-DELAY and SPF-HOLDTIME protect the router at
the cost of convergence time
• Supported: 12.0(25)S, 12.2(18)S, 12.3(2)T
• Configuration:
timers throttle spf <spf-start> <spf-hold> <spf-max>
All values are in ms

Event Processing
SPF Triggers
• A Router or Network LSA triggers a full SPF
Some changes do not represent a topology change:
Stub network UP/DOWN
IP address change on a link
During a full SPF the whole SPT is rebuilt
A change in the topology may not require rebuilding the whole
SPT
A major part of the tree may stay the same in many cases
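A minimal Dijkstra sketch shows why a stub network change does not need a full SPF: stub prefixes hang off the tree as leaves, so their routes can be recomputed over the existing tree (the idea behind a partial route calculation, PRC). The topology, router names and prefixes are hypothetical.

```python
import heapq


def spf(graph, root):
    """Dijkstra over router-to-router links only.
    Returns {router: (cost from root, parent in the SPT)}."""
    best = {root: (0, None)}
    heap = [(0, root)]
    done = set()
    while heap:
        cost, node = heapq.heappop(heap)
        if node in done:
            continue
        done.add(node)
        for neighbor, metric in graph[node].items():
            new_cost = cost + metric
            if neighbor not in best or new_cost < best[neighbor][0]:
                best[neighbor] = (new_cost, node)
                heapq.heappush(heap, (new_cost, neighbor))
    return best


def routes(tree, stubs):
    """Partial route calculation: hang stub prefixes off the existing SPT.
    stubs: router -> {prefix: metric}."""
    return {prefix: tree[router][0] + metric
            for router, prefixes in stubs.items() if router in tree
            for prefix, metric in prefixes.items()}


graph = {"R1": {"R2": 1, "R3": 5}, "R2": {"R1": 1, "R3": 1}, "R3": {"R1": 5, "R2": 1}}
tree = spf(graph, "R1")
stubs = {"R3": {"10.3.0.0/24": 1}}
print(routes(tree, stubs))  # {'10.3.0.0/24': 3}
stubs["R3"]["10.3.1.0/24"] = 1  # a stub network comes up: no SPF rerun needed
print(routes(tree, stubs))  # {'10.3.0.0/24': 3, '10.3.1.0/24': 3}
```

By contrast, changing a router-to-router link metric (for example raising R2-R3 to 10) changes the tree itself, so SPF has to run again.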

ISIS
• Prefix Prioritization
4 priorities: Critical, High, Medium, Low
/32 IPv4 and /128 IPv6 prefixes are classified as Medium priority by default
All other prefixes are classified as Low priority by default
• Customization
spf prefix-priority
This command accepts a prefix list for each of the first three priorities.
Unmatched prefixes are updated with Low priority.
As soon as the spf prefix-priority command is used, the /32 heuristic no
longer applies. To keep /32s at Medium, configure the Medium prefix list
to match them.

ISIS

• Prefix prioritization is THE key behavior
CRITICAL: IPTV SSM sources
HIGH: Most important PEs
MEDIUM: All other PEs
LOW: All other prefixes
• Prefix prioritization customization is required for
CRITICAL and HIGH

ISIS: Prefix Priority Customization
ipv4 prefix-list isis-critical-acl
 10 permit 0.0.0.0/0 eq 32
ipv4 prefix-list isis-high-acl
 10 permit 0.0.0.0/0 eq 30
ipv4 prefix-list isis-med-acl
 10 permit 0.0.0.0/0 eq 29
router isis 1
 address-family ipv4 unicast
  spf prefix-priority critical isis-critical-acl
  spf prefix-priority high isis-high-acl
  spf prefix-priority medium isis-med-acl
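The classification logic can be sketched in Python. Each priority's prefix list is modeled as a set of exact prefix lengths, which is enough to mimic "permit 0.0.0.0/0 eq N" entries like those in the configuration above; real prefix lists also match on address and address family, which this sketch ignores.

```python
import ipaddress
from typing import Dict, Optional, Set

PRIORITIES = ("critical", "high", "medium")  # Low is the fallback


def classify(prefix: str, lists: Optional[Dict[str, Set[int]]] = None) -> str:
    """Return the SPF prefix priority; `lists` maps a priority to the exact
    prefix lengths its prefix list permits (e.g. {32} for 'eq 32')."""
    net = ipaddress.ip_network(prefix)
    if not lists:
        # No customization: /32 IPv4 and /128 IPv6 host routes are Medium.
        return "medium" if net.prefixlen == net.max_prefixlen else "low"
    # Once customized, the /32 heuristic no longer applies.
    for priority in PRIORITIES:
        if net.prefixlen in lists.get(priority, set()):
            return priority
    return "low"  # unmatched prefixes fall back to Low


custom = {"critical": {32}, "high": {30}, "medium": {29}}
print(classify("10.0.0.1/32"), classify("10.0.0.1/32", custom))         # medium critical
print(classify("10.0.0.4/30", custom), classify("10.1.0.0/24", custom))  # high low
```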

Event Processing
Summary
• Set the link state generation initial wait time to 5 ms.
This dampens some of the faster link flaps in the network.
Consider using IP Event Dampening to quell link flaps as well.
• Set the increment and the maximum wait times to the
same values you’ve set for the SPF and PRC timers.
There is no point in generating LSPs faster than the routers will actually
process them!
• Tune carrier delay down to 0; IP Event Dampening will
handle any instability from a flapping link
• Remember: Exponential Backoff is NOT Dampening

Agenda
• BGP Fast Convergence
BGP Scanner
NHT – Next Hop Tracking
FSD - Fast Session Deactivation
Event Driven Route Origination
MRAI – Min Route Advertisement Interval
TCP PMTU – Path MTU Discovery
Software Improvements

BGP Convergence
• BGP and IGP Convergence tuning have a different focus
IGP Convergence - Rebuild the topology quickly following an event
BGP Convergence - Transfer large amounts of prefix information very
quickly
• The magnitude of time involved is different
IGP - Sub-Second
BGP - Seconds to Minutes
• Fast IGP Convergence plays a role in maintaining availability for
BGP prefixes
Often a topology change results in no BGP change; the IGP simply
updates the next-hop information for BGP prefixes

Faster Convergence
• Increased focus on faster BGP convergence
Critical for voice
VPN customers want IGP-like convergence
• Several factors influence BGP convergence
Detection of Change
Propagation of Information
Network Topology and Complexity
Network Stability

Faster Convergence

• There are typically two scenarios where we need faster convergence:
• Single route convergence
A bestpath change occurs for one prefix
How quickly can BGP propagate the change throughout the network?
How quickly can the entire BGP network converge?
Key for VPNs and voice networks
• Router startup or “clear ip bgp *” convergence
Most stressful scenario for BGP
CPU may be busy for several minutes
Limiting factor in terms of scalability
Key for any router with a full Internet table and many peers

Convergence Basics – BGP Scanner
• BGP Scanner plays a key role in convergence
• Full BGP table scan happens every 60 seconds
bgp scan-time X
Lowering this value is not recommended
• Full scan performs multiple housekeeping tasks
Validate nexthop reachability
Validate bestpath selection
Route redistribution and network statements
Conditional advertisement
Route dampening
BGP Database cleanup
• Import scanner runs once every 15 seconds
Imports VPNv4 routes into VRFs
bgp scan-time import X

Convergence Basics – BGP Nexthops

• Every 60 seconds the BGP scanner recalculates bestpath for all
prefixes
• Changes to the IGP cost of a BGP nexthop will go unnoticed until
the scanner’s next run
IGP may converge in less than a second
BGP may not react for as long as 60 seconds
• Need to change from a polling model to an event-driven model to
improve convergence
Polling model – Check each BGP nexthop’s IGP cost every 60 seconds
Event-driven model – BGP is informed by a third party when the IGP cost
to a BGP nexthop changes

ATF – Address Tracking Filter

• BGP tells ATF to let us know about any changes to 10.1.1.3 and
10.1.1.5 (the BGP nexthops)
• ATF filters out any changes for 10.1.1.1/32, 10.1.1.2/32, and
10.1.1.4/32
• Changes to 10.1.1.3/32 and 10.1.1.5/32 are passed along to BGP
[Figure: BGP nexthops 10.1.1.3 and 10.1.1.5 registered with ATF, which sits between BGP and a RIB containing 10.1.1.1/32 through 10.1.1.5/32]
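The filtering behavior can be sketched as a registry of tracked addresses. The AddressTrackingFilter class is hypothetical, and it simplifies matching: a change to any prefix that covers a tracked address is reported, which may differ from how the real ATF decides what to pass along.

```python
import ipaddress


class AddressTrackingFilter:
    """Hypothetical ATF: clients register addresses of interest; RIB changes
    are passed on only for prefixes that cover a registered address."""

    def __init__(self):
        self._tracked = {}  # ip_address -> list of callbacks

    def register(self, address, callback):
        self._tracked.setdefault(ipaddress.ip_address(address), []).append(callback)

    def rib_changed(self, prefix):
        net = ipaddress.ip_network(prefix)
        for address, callbacks in self._tracked.items():
            if address in net:  # change affects a tracked address: pass it on
                for callback in callbacks:
                    callback(str(address), prefix)
            # otherwise the change is filtered out


notified = []
atf = AddressTrackingFilter()
for nexthop in ("10.1.1.3", "10.1.1.5"):  # BGP nexthops registered by BGP
    atf.register(nexthop, lambda addr, prefix: notified.append(prefix))
for host in range(1, 6):                  # RIB changes for 10.1.1.1/32 .. 10.1.1.5/32
    atf.rib_changed(f"10.1.1.{host}/32")
print(notified)  # ['10.1.1.3/32', '10.1.1.5/32']
```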

NHT – Next Hop Tracking

• BGP Next Hop Tracking
Enabled by default
12.0(29)S, 12.3(14)T
[no] bgp nexthop trigger enable
• BGP registers all nexthops with ATF
Hidden command will let you see a list of nexthops
show ip bgp attr nexthop
• ATF will let BGP know when a route change occurs for a nexthop
• ATF notification will trigger a lightweight “BGP Scanner” run
Bestpaths will be calculated
None of the other “Full Scan” work will happen

NHT – Next Hop Tracking

• Once an ATF notification is received, BGP waits 5 seconds
before triggering an NHT scan
bgp nexthop trigger delay <0-100>
May lower default value as we gain experience
• Event driven model allows BGP to react quickly to IGP
changes
No longer need to wait as long as 60 seconds for BGP to scan the
table and recalculate bestpaths
Tuning your IGP for fast convergence is recommended
• Dampening is used to reduce frequency of triggered scans
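The effect of the trigger delay can be sketched as a simple coalescing timer: the first notification arms a timer (5 seconds by default, as quoted above), and notifications that arrive before it fires are folded into the same scan. The function name is hypothetical, and the scan dampening mentioned above is not modeled.

```python
def scan_times(notifications, trigger_delay=5.0):
    """Return the times (seconds) at which NHT scans run for the given
    ATF notification times, coalescing notifications that arrive while a
    scan is already scheduled."""
    scans = []
    scheduled_at = None
    for t in sorted(notifications):
        if scheduled_at is not None and t <= scheduled_at:
            continue  # absorbed by the scan that is already pending
        scheduled_at = t + trigger_delay
        scans.append(scheduled_at)
    return scans


# Four IGP changes to tracked nexthops: three close together, one later.
print(scan_times([0.0, 1.0, 4.0, 12.0]))       # [5.0, 17.0]
print(scan_times([0.0, 1.0, 4.0, 12.0], 0.0))  # [0.0, 1.0, 4.0, 12.0]
```

Compared with the 60-second scanner, BGP's reaction time is now bounded by the trigger delay; lowering it trades faster reaction for more frequent scans.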

FSD – Fast Session Deactivation
• Register a peer’s addresses with ATF
• ATF will let BGP know if there is a change in the route to reach the
peer
• If we lose our route to the peer, tear down the session
No need to wait for the hold timer to expire!
• Ideal for multihop eBGP peers
• Very dangerous for iBGP peers
IGP may not have a route to a peer for a split second
FSD would tear down the BGP session
Imagine if you lose your IGP route to your RR (Route Reflector) for
just 100 ms
• Off by default
neighbor x.x.x.x fall-over
• Introduced in 12.0(29)S, 12.3(14)T
BGP Convergence
Peer Groups
• Peer groups are not just for simplifying configuration; their
update-generation benefits are what lead to the requirement for a
common outbound policy
• An update is formatted once for the peer group leader and
replicated for additional peers, provided they are in
sync
• Update replication is much faster than update
formatting
• 12.0(24)S provides support for dynamic update groups,
which group peers dynamically to provide the update
replication
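The savings from formatting once and replicating can be illustrated by grouping peers on their outbound policy. In this sketch, peers and policy names are hypothetical and a formatted UPDATE is just a string; the interesting number is how many times formatting happens compared with how many copies are sent.

```python
from collections import defaultdict


def build_update_groups(peers):
    """peers: peer address -> outbound policy name (hypothetical)."""
    groups = defaultdict(list)
    for peer, policy in peers.items():
        groups[policy].append(peer)
    return dict(groups)


def send_update(prefix, peers):
    """Format an UPDATE once per group, then replicate it to every member.
    Returns (number of formatting operations, list of (peer, message))."""
    formats = 0
    outbox = []
    for policy, members in build_update_groups(peers).items():
        message = f"UPDATE {prefix} (policy {policy})"  # formatted once per group
        formats += 1
        outbox.extend((peer, message) for peer in members)  # cheap per-peer copies
    return formats, outbox


peers = {"10.0.0.1": "transit", "10.0.0.2": "transit",
         "10.0.0.3": "transit", "10.0.0.4": "customer"}
formats, outbox = send_update("192.0.2.0/24", peers)
print(formats, len(outbox))  # 2 4
```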

BGP Convergence
Peer Groups: Test
Base Line Convergence
Prefix Count                70,000  80,000  90,000  100,000  110,000  120,000  130,000  140,000
Convergence Time (Seconds)     158     178     195      227      244      254      277      315
Input Drop Count               202     264     257      264      252      464      366      467

BGP Convergence
Peer Groups: Test
Peer Groups Convergence
Prefix Count                70,000  80,000  90,000  100,000  110,000  120,000  130,000  140,000
Convergence Time (Seconds)     123     138     148      170      192      202      216      229
Input Drop Count               705     775     854      904     1076     1144     1263     1343

TCP Path MTU Discovery
• MSS (Maximum Segment Size) – Limit on the largest segment that can
traverse a TCP session
Anything larger must be split into multiple segments by TCP
MSS is 536 bytes by default
• 536 bytes is inefficient for Ethernet (MTU of 1500) or POS (MTU of
4470) networks
TCP is forced to break large segments into 536-byte chunks
Adds overhead
Slows BGP convergence and reduces scalability
• “ip tcp path-mtu-discovery”
MSS = Lowest MTU along the path - IP overhead (20 bytes) - TCP
overhead (20 bytes)
1460 bytes for an Ethernet network
4430 bytes for a POS network
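The MSS arithmetic is easy to check. The sketch assumes 20-byte IP and TCP headers with no options, and uses a hypothetical 100,000 bytes of UPDATE data to show how many segments each MSS needs.

```python
import math

IP_HEADER = 20    # bytes, assuming no IP options
TCP_HEADER = 20   # bytes, assuming no TCP options
DEFAULT_MSS = 536


def path_mss(link_mtus):
    """MSS = lowest MTU along the path - IP header - TCP header."""
    return min(link_mtus) - IP_HEADER - TCP_HEADER


def segments_needed(payload_bytes, mss):
    return math.ceil(payload_bytes / mss)


print(path_mss([1500]), path_mss([4470]), path_mss([4470, 1500]))  # 1460 4430 1460
print(segments_needed(100_000, DEFAULT_MSS), segments_needed(100_000, 1460))  # 187 69
```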

BGP Convergence
Peer Groups and PMTU: Test
Peer Groups & Path MTU Discovery Convergence
Prefix Count                70,000  80,000  90,000  100,000  110,000  120,000  130,000  140,000
Convergence Time (Seconds)     102     112     127      125      157      165      165      185
Input Drop Count               122      93     116      134      139      190      213      431

BGP Convergence
Packet Drops
• The use of peer groups greatly increases the rate at
which the router can send BGP UPDATE messages.
• The returning TCP ACKs can overflow the input hold
queue, resulting in lost ACKs and TCP backoff
• This results in peers losing sync with the peer group leader

Hold Queue Size = (Window Size / (2 * MSS)) * Peer Count
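A quick way to size the input hold queue from the formula above. The window size and peer count are hypothetical; the division by 2 * MSS presumably reflects one ACK for every two full-sized segments (delayed ACK), which is an interpretation rather than something stated in this session.

```python
import math


def input_hold_queue(window_bytes, mss, peer_count):
    """Hold Queue Size = Window Size / (2 * MSS) * Peer Count, rounded up."""
    acks_per_peer = window_bytes / (2 * mss)  # about one ACK per two segments
    return math.ceil(acks_per_peer * peer_count)


print(input_hold_queue(17_520, 1_460, 100))  # 600
print(input_hold_queue(17_520, 536, 100))    # 1635
```

A smaller MSS means more segments per window and therefore more returning ACKs, which is one reason path MTU discovery and queue tuning work well together.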

BGP Convergence
Peer Groups, PMTU, and Queue Tuning: Test
Peer Groups/PMTUD/Queues Comparison: Convergence Time (Seconds)
Prefix Count                70,000  80,000  90,000  100,000  110,000  120,000  130,000  140,000
Peer Groups Only               123     138     148      170      192      202      216      229
Peer Groups & PMTUD            102     112     127      125      157      165      165      185
Peer Groups & Queues           119     121     134      146      163      174      190      202
Peer Groups, PMTUD, Queues      82      90     114      118      136      153      157      170

BGP Convergence
Update Packing
• BGP UPDATES are based on a set of attributes and a
list of prefixes that share that particular set of attributes
• Prior to 12.0(19)S, BGP UPDATE messages were not
packed optimally.
• Waiting until all prefixes are received from a peer prior
to processing them and sending updates can further
increase the ability to pack efficiently
• Fully supported in 12.0(23)S
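Packing can be illustrated by counting UPDATE messages: prefixes that share an attribute set are placed into the same UPDATE until the 4,096-byte BGP maximum message size is reached. The 60-byte attribute size and the route set are assumptions; IPv4 NLRI are encoded as one length byte plus the significant prefix bytes.

```python
import math
from collections import defaultdict

BGP_MAX_MESSAGE = 4096       # RFC 4271 maximum BGP message size (bytes)
FIXED_OVERHEAD = 19 + 2 + 2  # header + withdrawn-routes length + path-attribute length


def nlri_size(prefix_len):
    """IPv4 NLRI encoding: one length byte plus the significant prefix bytes."""
    return 1 + math.ceil(prefix_len / 8)


def count_updates(routes, attr_size=60):
    """routes: list of (prefix_len, attribute_set); attr_size is an assumed
    encoded size of each attribute set. Returns the number of packed UPDATEs."""
    groups = defaultdict(list)
    for prefix_len, attrs in routes:
        groups[attrs].append(prefix_len)
    room = BGP_MAX_MESSAGE - FIXED_OVERHEAD - attr_size
    total = 0
    for prefix_lens in groups.values():
        messages, used = 1, 0
        for prefix_len in prefix_lens:
            size = nlri_size(prefix_len)
            if used + size > room:  # start a new UPDATE when this one is full
                messages, used = messages + 1, 0
            used += size
        total += messages
    return total


routes = [(24, "attrs-A" if i % 2 else "attrs-B") for i in range(1000)]
print(count_updates(routes), len(routes))  # 2 1000
```

Sending one UPDATE per prefix would need 1,000 messages here; packing by attribute set needs two, which is why waiting to collect prefixes before sending helps.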

BGP Convergence
Complete Optimization Test
Final Summary: Convergence Time (Seconds)
Prefix Count                             70,000  80,000  90,000  100,000  110,000  120,000  130,000  140,000
Base Line                                   158     178     195      227      244      256      277      315
Peer Groups                                 123     138     148      170      192      202      216      229
Peer Groups, PMTUD                          102     112     127      125      157      165      165      185
Peer Groups, Queues                         119     121     134      146      163      174      190      202
Peer Groups, PMTUD, Queues                   82      90     114      118      136      153      157      170
Peer Groups, PMTUD, Queues, 12.0(23)S        36      41      54       65       78       93      107      129

EIGRP Fast Convergence
Feasible Successors
• Whether an alternate path is a feasible successor or not makes a
large difference in convergence.
• In this test, switching from the best path to a feasible successor
takes less than 1 second; switching to some other neighbor takes
about 6 seconds.
• It’s important to consider not only the best paths through an
EIGRP network, but also the feasible successors.
[Figure: convergence time in milliseconds versus number of routes, with and without feasible successors]

EIGRP Fast Convergence
Feasible Successors
• In this network, B and C have equal-cost paths to A; neither one will
see the other as a feasible successor, because the reported distance
is equal to (not lower than) the feasible distance.
• If either link fails, at least one query/reply will be required to
converge.
[Figure: A advertises 172.17.1.0/24; the A-B and A-C links each have bandwidth 1544 and delay 20000; B and C are also connected to each other]

router-b#sho ip eigrp topo 172.17.1.0
IP-EIGRP (AS 100): Topology entry for 172.17.1.0/24
State is Passive, Query origin flag is 1, 1 Successor(s), FD is 2172416
Routing Descriptor Blocks:
172.17.2.1 (Serial0/0), from 208.0.8.4, Send flag is 0x0
Composite metric is (2172416/18944), Route is Internal
....
172.17.1.0 (Serial0/3), from 172.17.3.1, Send flag is 0x0
Composite metric is (2684416/2172416), Route is Internal
....

EIGRP Fast Convergence
Feasible Successors
• Increasing the C to A metric, and decreasing the C to B metric by
the same amount, will allow B to become C’s feasible successor.
• But this only works one way; there’s no way to make B and C
feasible successors of each other.
[Figure: same topology with adjusted delays: one link delay raised to 20020, another reduced by 20]
router-b#sho ip eigrp topo 172.17.1.0
IP-EIGRP (AS 100): Topology entry for 172.17.1.0/24
State is Passive, Query origin flag is 1, 1 Successor(s), FD is 2172416
Routing Descriptor Blocks:
172.17.2.1 (Serial0/0), from 208.0.8.4, Send flag is 0x0
Composite metric is (2172416/18944), Route is Internal
....
172.17.1.0 (Serial0/3), from 172.17.3.1, Send flag is 0x0
Composite metric is (2684416/2167296), Route is Internal
[previously (2684416/2172416)]
....

EIGRP Fast Convergence
Feasible Successors
• Whether the next best path is considered loop free by
EIGRP (a feasible successor) or not has a large impact
on convergence times.
• Don’t just consider the best path from every point in
your network, but also the next best path.
• Determine how best to set up your path metrics to
improve convergence performance.
• Always use the delay metric to engineer your routing,
never the bandwidth metric!
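The metrics in the router-b output can be reproduced with the classic EIGRP formula using default K values (K1 = K3 = 1). This sketch assumes the total path delay is the 20,000 microsecond serial link plus 100 microseconds on A's attached segment, a value chosen because it reproduces the FD of 2172416; the feasibility check then shows why a reported distance of 2172416 does not qualify but 2167296 does.

```python
def eigrp_metric(min_bandwidth_kbps, total_delay_usec):
    """Classic composite metric with default K values (K1 = K3 = 1, K2 = K4 = K5 = 0):
    256 * (10^7 / minimum bandwidth in kbps + total delay in tens of microseconds)."""
    return 256 * (10**7 // min_bandwidth_kbps + total_delay_usec // 10)


def is_feasible_successor(reported_distance, feasible_distance):
    """Feasibility condition: the neighbor's reported distance must be
    strictly lower than the local feasible distance."""
    return reported_distance < feasible_distance


fd = eigrp_metric(1544, 20_000 + 100)  # assumed: T1 link plus 100 usec at A
print(fd)                                  # 2172416
print(is_feasible_successor(2172416, fd))  # False: equal is not enough
print(is_feasible_successor(2167296, fd))  # True
```

Because delay enters the formula additively, a small delay change (20 tens of microseconds here, 5120 in metric units) is enough to satisfy the strict inequality, whereas bandwidth changes move the metric in large, coarse steps.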

IP Fast ReRoute

Objective

• Provide fast re-route in pure IP networks and
MPLS/LDP networks without deploying RSVP-TE.
• Restore productive forwarding to all reachable
addresses within 50 ms.
• Control the transition of the network from repair to
normal forwarding without further packet loss or micro-looping.

The Four Stages of IPFRR

1. Pre-computation of repair paths
2. Detection of failure
3. Invocation of appropriate repair
4. Controlled re-convergence of network

Basic Repair

• Uses ECMP and Loop-Free Alternates (LFAs) where available
• LFAs are easily computed in OSPF and IS-IS
• Analogous to feasible successors in EIGRP
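The basic loop-free condition (as in RFC 5286) says neighbor N of S is a loop-free alternate for destination D if dist(N, D) < dist(N, S) + dist(S, D). The sketch below checks it with plain Dijkstra; the graphs are hypothetical, and neighbors that are already primary (including ECMP) next hops are excluded because they are not alternates.

```python
import heapq


def shortest_costs(graph, source):
    """Dijkstra: cost of the shortest path from source to every reachable node."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry
        for nbr, metric in graph[node].items():
            nd = d + metric
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return dist


def loop_free_alternates(graph, s, d):
    """Neighbors of s that satisfy dist(N, D) < dist(N, S) + dist(S, D)."""
    from_s = shortest_costs(graph, s)
    alternates = []
    for n, link_cost in graph[s].items():
        from_n = shortest_costs(graph, n)
        n_to_d = from_n.get(d, float("inf"))
        if link_cost + n_to_d == from_s[d]:
            continue  # n is a primary (or ECMP) next hop, not an alternate
        if n_to_d < from_n[s] + from_s[d]:
            alternates.append(n)
    return alternates


def undirected(*links):
    graph = {}
    for a, b, cost in links:
        graph.setdefault(a, {})[b] = cost
        graph.setdefault(b, {})[a] = cost
    return graph


triangle = undirected(("S", "D", 1), ("S", "N", 1), ("N", "D", 1))
ring = undirected(("S", "D", 1), ("S", "N", 1), ("N", "M", 1), ("M", "X", 1), ("X", "D", 1))
print(loop_free_alternates(triangle, "S", "D"))  # ['N']
print(loop_free_alternates(ring, "S", "D"))      # []
```

In the ring, N's own shortest path to D runs back through S, so sending it traffic after the S-D link fails would loop; this is the kind of topology where basic repair cannot cover every destination.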

Triangle topology - ECMP

[Figure: routers S, N, P, O, A and B; the repair uses an equal-cost (ECMP) path]

Square topology - LFA

[Figure: routers S, N, P, A and B; the repair uses a loop-free alternate]

Basic Repair Properties
• In general topologies, around 80% of failures allow all destinations
to be repaired
• For the remaining 20%, only a subset of destinations can be
repaired
Packets to the unrepaired destinations are dropped until convergence is complete
Loop-free re-convergence mechanisms delay convergence (see later…)
Packet loss may be worse than with conventional convergence

More complex topology – no LFA available

[Figure: routers S, M, N, P, A and B; no LFA is available]

Complex topology

[Figure: routers S, M, N, P, A and B, plus Ap; final solution in process]

Micro-loops

• Fast reroute prevents all packet loss once a failure has
been detected.
• BUT packets can still be lost due to micro-looping when
the network re-converges.

Ordered FIB changes

• For any isolated link or node change, determine a “safe” ordering for FIB installation:
  - Bad news (e.g. a failure): update from the edge toward the failure.
  - Good news (e.g. a link coming up): update from the change toward the edge.
• Each router computes its “rank” with respect to the change.
• Each router then delays its FIB update by a number of worst-case FIB compute/install times proportional to its rank.
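The bad-news rank rule can be sketched as follows. This is a simplified reading of the slide, not the exact rank definition from the ordered-FIB proposal: it assumes each router knows which routers forwarded through it toward the failure, and it uses a hypothetical worst-case FIB compute/install time `T_FIB` of 0.5 s. With the chain from the Ordered Change Example that follows (B's traffic through A, A's through S), it reproduces the order B, A, S.

```python
from typing import Dict, List, Tuple

T_FIB = 0.5   # hypothetical worst-case FIB compute + install time, in seconds


def bad_news_ranks(upstream: Dict[str, List[str]], root: str) -> Dict[str, int]:
    """Rank of each router for a failure next to `root` (simplified model).

    upstream[r] lists the routers whose pre-failure shortest path went through r.
    A router's rank is the height of its subtree: routers at the edge get rank 0
    and update first; the router adjacent to the failure updates last.
    """
    ranks: Dict[str, int] = {}

    def height(r: str) -> int:
        children = upstream.get(r, [])
        ranks[r] = (1 + max(height(c) for c in children)) if children else 0
        return ranks[r]

    height(root)
    return ranks


def schedule(ranks: Dict[str, int]) -> List[Tuple[str, float]]:
    """(router, delay) pairs in FIB-update order, with delay = rank * T_FIB."""
    return sorted(((r, k * T_FIB) for r, k in ranks.items()), key=lambda p: p[1])


# B forwards through A, A forwards through S, and S is adjacent to the failure.
print(schedule(bad_news_ranks({"S": ["A"], "A": ["B"]}, "S")))
# [('B', 0.0), ('A', 0.5), ('S', 1.0)]
```

Because the largest rank grows with the longest chain of routers behind the failure, the total delay scales with network diameter, which is the cost noted under Ordered FIB Properties. For good news the direction reverses (routers nearest the change update first); that variant is not shown.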

Ordered Change Example

• Ensure the changes are in the order B,A,S

[Figure: before and after views of an example topology with routers A, B, S, F, E and D; link metrics 1, 3 and 4]

Ordered FIB Properties

• No forwarding-plane changes are required.
• Complete prevention of loops for isolated node or link changes.
• Requires cooperation from all routers.
• Delay is proportional to network diameter.
  - May delay re-convergence for tens of seconds (unless optional signalling is used).

Operational Features

Graceful Shutdown
• You want to bring B down for maintenance; the routing protocol will reroute around B.
• The packets “in flight” will be lost when B is taken offline, though, and this could be a lot of packets if these are high-speed links.
• It’s better to get A and D to route around B while B can still forward traffic, so B can be taken offline gracefully.
[Figure: routers A, B, C and D; traffic is dropped at B until routing converges; original best path through B, new best path through C]
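How many packets “in flight” can mean is easy to estimate. The figures below are hypothetical and assume the link toward B stays fully loaded for the whole time routing takes to converge.

```python
def packets_dropped(link_bps: float, outage_s: float, avg_packet_bytes: int) -> int:
    """Packets sent toward B while routing converges, assuming a fully loaded link."""
    return int(link_bps * outage_s / (avg_packet_bytes * 8))


# Hypothetical: 10 Gb/s link, 1500-byte packets, 1 s until routing converges.
print(packets_dropped(10e9, 1.0, 1500))   # 833333
```

Even a one-second convergence on a single 10 Gb/s link therefore costs on the order of 800,000 packets, which is the loss graceful shutdown avoids by letting B keep forwarding until its neighbors have moved their traffic.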

Graceful Shutdown
• Graceful shutdown allows the routing protocols to tear their adjacencies down without impacting the forwarding tables for some short period of time.
• Once the protocol has torn its adjacencies down, it then cleans up the forwarding tables.
[Figure: B continues forwarding until routing converges; original best path through B, new best path through C]

Graceful Shutdown

Protocol-specific signaling:
• BGP: CEASE NOTIFICATION
• EIGRP: UPDATE with the INIT bit set
• IS-IS: on LAN interfaces, an IIH with an empty “IS Neighbors” TLV; on point-to-point links, the “p2p Adjacency State” TLV set to “Init”
• OSPF: Hello with an empty neighbor list

Graceful shutdown is currently under development.
IS-IS support: 12.0(27)S

Wait For BGP
• E is learning 10.1.1.0/24 through iBGP from D, with a next hop of A.
• E examines the path to A and finds an IGP route through D to A. It installs this route in the routing table.
• C is now inserted into the circuit; after a few seconds, the IGP has converged, and E now chooses C as the best path to A.
[Figure: A learns 10.1.1.0/24 over eBGP; routers B, C, D and E with IGP link metrics of 10 and 20 and a full iBGP mesh; C starts and provides a better path to A than the original best path through D]

Wait For BGP
• However, BGP takes much longer to converge if C is accepting full routes (about 150,000 routes) from A: at least 5 minutes.
• When E forwards packets to C for 10.1.1.1, C hasn’t finished building its BGP tables, so it doesn’t know how to reach this destination.
• C drops the packets.
[Figure: same topology; C has no path to 10.1.1.0/24]

Wait For BGP
• Instead, once the IGP has converged, C signals its IGP neighbors that they should not route in this direction.
• The IGP remains in this state until BGP notifies the IGP that it has converged.
• E will continue using D as its best path to A, even though a better one is available, until BGP converges on C.
[Figure: same topology; C tells its neighbors “Don’t use me yet!”]
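The wait-for-BGP behavior can be sketched as an SPF run from E's point of view. The topology is a reduced, hypothetical version of the figure: B is left out, and the 10 and 20 metrics are assigned to the E-C-A and E-D-A paths by assumption. C signals “don’t use me yet” by advertising its links with a maximum metric of 0xFFFF, the value OSPF stub-router advertisement (RFC 6987) uses. On Cisco IOS, `max-metric router-lsa on-startup wait-for-bgp` (OSPF) and `set-overload-bit on-startup wait-for-bgp` (IS-IS, which uses the overload bit rather than a high metric) provide this behavior.

```python
import heapq
from typing import Dict, Optional, Tuple

Graph = Dict[str, Dict[str, int]]   # router -> {neighbor: metric that router advertises}

MAX_METRIC = 0xFFFF   # "don't use me for transit" metric (OSPF stub router, RFC 6987)


def spf(graph: Graph, src: str) -> Dict[str, Tuple[int, Optional[str]]]:
    """Dijkstra from src, returning (cost, first hop) for each destination."""
    best: Dict[str, Tuple[int, Optional[str]]] = {src: (0, None)}
    heap = [(0, src, "")]
    while heap:
        cost, node, first = heapq.heappop(heap)
        if cost > best[node][0]:
            continue   # stale heap entry
        for nbr, metric in graph.get(node, {}).items():
            hop = first or nbr   # first hop out of src
            if nbr not in best or cost + metric < best[nbr][0]:
                best[nbr] = (cost + metric, hop)
                heapq.heappush(heap, (cost + metric, nbr, hop))
    return best


def topology(c_waiting_for_bgp: bool) -> Graph:
    """Reduced, hypothetical version of the figure: A, C, D and E only."""
    c_metric = MAX_METRIC if c_waiting_for_bgp else 10
    return {
        "E": {"C": 10, "D": 20},
        "C": {"E": c_metric, "A": c_metric},   # C advertises max metric while BGP converges
        "D": {"E": 20, "A": 20},
        "A": {"C": 10, "D": 20},
    }


print(spf(topology(c_waiting_for_bgp=True), "E")["A"])    # (40, 'D'): E keeps using D
print(spf(topology(c_waiting_for_bgp=False), "E")["A"])   # (20, 'C'): BGP done, E moves to C
```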

Summary

Getting To 4 Nines
Roadblocks
• Single point of failure (edge card, edge router,
single trunk)
• Outage required for hardware and software upgrades
• Long recovery time for reboot or switchover
• No tested hardware spares available on site
• Long repair times due to a lack of
troubleshooting guides and process
• Inappropriate environmental conditions
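These roadblocks matter because the downtime budget is small. As a worked check (assuming a 365-day year), four nines leaves about 52.6 minutes of downtime per year and five nines about 5.3 minutes, so a single long reboot or manual repair can exhaust the budget.

```python
MINUTES_PER_YEAR = 365 * 24 * 60   # 525,600 minutes; leap years ignored


def downtime_minutes_per_year(availability: float) -> float:
    """Annual downtime budget implied by an availability target."""
    return (1 - availability) * MINUTES_PER_YEAR


for nines in (3, 4, 5):
    target = 1 - 10 ** -nines
    print(f"{nines} nines ({target:.3%}): {downtime_minutes_per_year(target):.1f} min/year")
```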

Getting To 5 Nines
Roadblocks
• High probability of redundancy failure (failure not detected, or redundancy not implemented)
• High probability of double failures
• Long convergence time for rerouting traffic around a failed trunk or router in the core
• Reliance on manual operations
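The first roadblock can be quantified with a simple model of a primary/backup pair. It assumes independent failures (so it does not capture correlated double failures) and a hypothetical probability that failover actually works. Two 99.9% components with perfect failover give about 99.9999%, but if failover succeeds only 99% of the time the pair falls to about 99.9989%, below five nines.

```python
def redundant_pair(a_primary: float, a_backup: float, failover_success: float = 1.0) -> float:
    """Availability of a primary/backup pair, assuming independent failures.

    The service is up if the primary is up, or if the primary is down, the
    failover succeeds (failure detected, redundancy actually in place) and the
    backup is up.
    """
    return a_primary + (1 - a_primary) * failover_success * a_backup


for p in (1.0, 0.99, 0.9):
    print(f"failover success {p:.0%}: {redundant_pair(0.999, 0.999, p):.6%}")
```

With `failover_success=1.0` the formula reduces to the familiar 1 - (1 - a1)(1 - a2); undetected failures and missing redundancy pull it back toward the availability of a single component.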

Network Design Conclusion

• Reduce complexity; increase modularity and consistency
• Consider solution manageability
• Minimize failure domain size (reduce single points of failure)
• Consider control plane resource requirements and the effect of busy CPU/memory
• Consider protocol attributes
• Consider budget, requirements, and the areas of the network contributing the most downtime or at the greatest risk
• Test, test, test before deployment
Q&A

Complete Your Session Evaluation
• Please give us your feedback! Complete the evaluation form you were given when you entered the room.
• This is session BRKRST-3363.
• Don’t forget to complete the overall event evaluation form included in your registration kit.
Your feedback is very important to us. Thanks!

Backup Slides / Extra Material
