Slides 12
David Tipper
Graduate Telecommunications and Networking Program
University of Pittsburgh
Telcom 2110 Slides 12
Motivation
• Why do communications networks need to be survivable?
• Communication networks are Critical Infrastructure (CI) (PCCIP 1996): the systems, assets and services upon which society and the economy depend
• Communication infrastructure often considered
most important CI due to reliance on it by other
infrastructures
– banking and finance, government services
– power grid SCADA, etc.
• Increasing Impact and Rate of Failures
– Increased bandwidth of links (WDM technology in
fiber optic network)
– Increased societal dependence
– Multiple network operators and vendor equipment
Causes of Network Outages
• According to Sprint, a link outage occurs in the IP backbone every 30 min on average
• Accidents
– cable cuts, car wreck, etc.
– According to AT&T: 4.39 cable cuts per year per 1000 km
• Human errors
– incorrect maintenance, installation
• Environmental hazards
– fire, flood, etc.
• Sabotage
– physical, electronic
• Operational disruptions
– scheduled upgrades, maintenance, power outage
• Hardware/Software failures
– Line card failure, faulty laser, software crash, etc.
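The AT&T cable-cut rate quoted in this list can be scaled to a route of a given length. The sketch below is illustrative only: the route lengths are hypothetical, and treating cuts as a Poisson process with a constant rate is an assumption that the slides do not make.

```python
import math

# Rate quoted on the slide (AT&T): cable cuts per year per 1000 km.
CUTS_PER_YEAR_PER_1000_KM = 4.39


def expected_cuts_per_year(route_km: float) -> float:
    """Expected number of cable cuts per year on route_km of cable."""
    return CUTS_PER_YEAR_PER_1000_KM * route_km / 1000.0


def prob_at_least_one_cut(route_km: float, years: float = 1.0) -> float:
    """P(at least one cut in `years`), ASSUMING cuts follow a Poisson process."""
    return 1.0 - math.exp(-expected_cuts_per_year(route_km) * years)


if __name__ == "__main__":
    for km in (100, 1000, 5000):  # hypothetical route lengths
        print(km, expected_cuts_per_year(km), prob_at_least_one_cut(km))
```

Under these assumptions the expected number of cuts grows linearly with route length, which is one reason long-haul backbones see frequent link outages.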
Backbone Failures
[Figure: breakdown of backbone failure causes. Surviving labels: router failures (software failures, hardware failures), DoS attacks, other, unknown. Surviving values: 9% and 23%. Source: University of Michigan, 2000]
Network Survivability
• Definition
– Ability of the network to support the committed Quality of Service (QoS) continuously in the presence of various failure scenarios
– Includes performance as well as availability
• Survivability Components
– Analysis: understand failures and system functionality after failures
Survivability – Basic Concepts
• Working path and Backup path (recovery path):
• Working path: carries traffic under normal operation
• Backup path: an alternate path to carry the traffic
in case of failures
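A minimal sketch of how a working path and a link-disjoint backup path might be chosen: route the working path first, then search again with its links banned. The six-node topology `ADJ` and all function names are hypothetical, and this two-step approach is a simplification; on some topologies it fails to find a disjoint pair that a joint method such as Suurballe's algorithm would find.

```python
from collections import deque

# Hypothetical six-node topology as an undirected adjacency list.
ADJ = {
    1: [2, 4],
    2: [1, 3],
    3: [2, 6],
    4: [1, 5],
    5: [4, 6],
    6: [3, 5],
}


def links_of(path):
    """Set of undirected links traversed by a node path."""
    return {frozenset(pair) for pair in zip(path, path[1:])}


def shortest_path(adj, src, dst, banned_links=frozenset()):
    """Breadth-first search for a fewest-hop path that avoids banned links."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            path = []
            while u is not None:
                path.append(u)
                u = prev[u]
            return path[::-1]
        for v in adj[u]:
            if v not in prev and frozenset((u, v)) not in banned_links:
                prev[v] = u
                queue.append(v)
    return None  # no path exists


def working_and_backup(adj, src, dst):
    """Pick a working path, then a backup path sharing no link with it."""
    working = shortest_path(adj, src, dst)
    if working is None:
        return None, None
    backup = shortest_path(adj, src, dst, banned_links=links_of(working))
    return working, backup


if __name__ == "__main__":
    print(working_and_backup(ADJ, 1, 6))
```

Because the two paths share no link, a single link failure on the working path leaves the backup path intact.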
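[Figure: working route and backup route between customers A and B across DCS nodes 1 to 4, with a failure (X) marked between nodes 1 and 2; access points (AP) and backup paths (BP) are labeled]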
Shared Risk Link Group (SRLG)
[Figure: logical intent versus actual routing over physical cables]
Classification of Survivability Techniques
Path-based versus Link-based
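[Figure: two copies of a six-node example network (nodes 1 to 6), each showing a working path and a backup path, one for the path-based case and one for the link-based case]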
Partial Path Scheme
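[Figure: two copies of the six-node example network showing working path and backup path under the partial path scheme]
[Figure: trade-off axis running from path-based to link-based schemes, labeled "bandwidth efficient", "simpler", and "faster recovery speed"]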
What Does Survivability Get You?
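[Figure: working path (WP) and backup path (BP) between Source (S) and Destination (D)]
• A_i is the availability of link i
• Availability of a connection between S-D:
$$A_{\text{no protection}} = \prod_{i \in WP} A_i$$
$$A_{\text{protection}} = \prod_{i \in WP} A_i + \prod_{i \in BP} A_i - \prod_{i \in WP \cup BP} A_i$$
• Given A_i = 0.998297,
- A_no protection = 0.996597, A_protection = 0.999983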
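The two availability figures can be checked numerically. The sketch below assumes independent link failures, link-disjoint paths, a working path of 2 links and a backup path of 3 links. The hop counts are not stated in the extracted text; they are inferred because they reproduce the quoted numbers.

```python
from math import prod

A_LINK = 0.998297          # per-link availability from the slide
WP = [A_LINK] * 2          # ASSUMED: working path has 2 links
BP = [A_LINK] * 3          # ASSUMED: backup path has 3 links


def path_availability(link_availabilities):
    """A path is up only if every link on it is up (independent failures)."""
    return prod(link_availabilities)


def protected_availability(wp_links, bp_links):
    """Connection is up if WP or BP is up (inclusion-exclusion).

    For link-disjoint paths the joint term, the product over all links in
    WP and BP together, equals the product of the two path availabilities.
    """
    a_wp = path_availability(wp_links)
    a_bp = path_availability(bp_links)
    return a_wp + a_bp - a_wp * a_bp


def downtime_minutes_per_year(availability):
    """Expected unavailable minutes in a 365-day year."""
    return (1.0 - availability) * 365 * 24 * 60


if __name__ == "__main__":
    a_none = path_availability(WP)
    a_prot = protected_availability(WP, BP)
    print(round(a_none, 6), round(a_prot, 6))
    print(downtime_minutes_per_year(a_none), downtime_minutes_per_year(a_prot))
```

Under these assumptions, expected downtime falls from roughly 30 hours per year without protection to about 9 minutes per year with it.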
[Figure: working path with a failure-dependent backup]
Protection versus Restoration
• When to establish the backup paths?
• Protection
– Backup paths are fully set up before a failure occurs.
– When a failure occurs, no additional signaling is needed to establish the backup path
– Faster recovery time
• Restoration
– Backup paths are established after a failure occurs
– More flexible with regard to the failure scenarios
• backup paths are set up after the location of the failure is known
– More capacity efficient
• due to its shared-backup nature
• utilizes any spare capacity available in the network
– But cannot guarantee 100% restorability after failures
Protection
• Protection Variants
– 1+1 Protection (dedicated protection)
• Traffic is duplicated and transmitted over both working and backup
paths
– Fastest recovery speed, but not bandwidth efficient
– 1:1 Protection (dedicated protection with extra traffic)
• During normal operation (failure free), traffic is transmitted only over the working path; the backup path can be used to transmit extra traffic (low priority traffic), giving better bandwidth utilization
• When the working path fails, extra traffic is preempted, and traffic
is switched to the backup path
[Figure: working path (WP) and backup path (BP) between Source and Destination]
Protection
– 1:N Protection (shared recovery with extra
traffic)
• One protection entity for N working entities
[Figure: 1:N APS between Node 1 and Node 2; working channels 1 to n share one protection channel]
– M:N Protection (M ≤ N)
• M protection entities for N working entities
– Self Healing Rings are a form of Protection
– Link redundancy: simultaneous physical connections
– Link bundling
Types of Self-healing Rings
[Figure: two self-healing ring types built from ADMs, each showing a working ring and a protection ring]
Dedicated versus Shared - Backup
• Dedicated-Backup Capacity
– Backup resource can be used only by a particular working path
• Shared-Backup Capacity
– Backup resource can be shared among several working paths
– Rule: backup resource can be shared only when corresponding
working paths are not expected to fail at the same time
– More capacity efficient
[Figure: eight-node example network (nodes 1 to 8). WP1 carries 5 units of traffic and WP2 carries 10 units; their backup paths BP1 and BP2 both use link 5-7. On link 5-7: dedicated spare capacity = 15 units, shared spare capacity = 10 units.]
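The two spare-capacity figures for link 5-7 follow from the sharing rule: dedicated backup sums the demands of all backup paths on the link, while shared backup only needs the worst case over failure events. The sketch below is illustrative; the function name and the `failure_groups` input (sets of working paths assumed to fail together) are hypothetical.

```python
def spare_capacity_on_link(backup_demands, failure_groups):
    """Return (dedicated, shared) spare capacity needed on one link.

    backup_demands: {working path name: units its backup path needs here}
    failure_groups: list of sets; each set holds the working paths that are
                    ASSUMED to fail together in one failure event
    """
    dedicated = sum(backup_demands.values())
    shared = max(
        sum(backup_demands.get(wp, 0) for wp in group)
        for group in failure_groups
    )
    return dedicated, shared


# Demands from the slide: WP1 = 5 units, WP2 = 10 units, backups on link 5-7.
DEMANDS = {"WP1": 5, "WP2": 10}

if __name__ == "__main__":
    # WP1 and WP2 never fail at the same time: sharing is allowed.
    print(spare_capacity_on_link(DEMANDS, [{"WP1"}, {"WP2"}]))
    # WP1 and WP2 share a risk: sharing saves nothing.
    print(spare_capacity_on_link(DEMANDS, [{"WP1", "WP2"}]))
```

With independent failures this gives 15 units dedicated versus 10 units shared, matching the slide. If both working paths could fail together, the shared requirement rises to 15 as well.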
P Cycles
Protection (P) Cycle
– Closed cycles are formed in the mesh network.
– Affected traffic is rerouted along these cycles.
– A large network will have a number of p-cycles
[Figure: (a) a pre-configured cycle; (b) a link on the cycle fails; (c) a link not on the cycle fails; (d) another link not on the cycle fails]
P-Cycles: Basics
• For meshed networks
• Pre-reserved protection paths (before failure)
• Based on cycles, like rings
• Also protects straddling failures, unlike rings
• Local protection action, adjacent to the failure (on the order of tens of milliseconds)
• Shared capacity
p-Cycles: Basics
• Protected spans:
• 9 "on-cycle" (1 protection path)
• 8 "straddling" (2 protection paths)
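The on-cycle and straddling distinction can be expressed as a small classifier. This is a sketch on a hypothetical five-node cycle, not the network from the slide's figure. It assumes a simple graph (no parallel spans), so a span whose end nodes are neighbours on the cycle is treated as the cycle's own span.

```python
def classify_spans(cycle, spans):
    """Classify each span relative to one p-cycle.

    cycle: ordered list of at least 3 nodes; the last node connects back
           to the first
    spans: iterable of (u, v) node pairs
    Returns {(u, v): (kind, number_of_protection_paths)}.
    """
    n = len(cycle)
    position = {node: i for i, node in enumerate(cycle)}
    result = {}
    for u, v in spans:
        if u not in position or v not in position:
            # At least one end node is off the cycle: this cycle cannot help.
            result[(u, v)] = ("unprotected", 0)
            continue
        gap = abs(position[u] - position[v])
        if gap in (1, n - 1):
            # Neighbours on the cycle: only the rest of the cycle survives.
            result[(u, v)] = ("on-cycle", 1)
        else:
            # Chord of the cycle: both arcs of the cycle survive.
            result[(u, v)] = ("straddling", 2)
    return result


# Hypothetical example network.
CYCLE = ["A", "B", "C", "D", "E"]
SPANS = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("E", "A"),
         ("A", "C"), ("B", "E"), ("C", "F")]

if __name__ == "__main__":
    for span, info in classify_spans(CYCLE, SPANS).items():
        print(span, info)
```

A straddling span gets two protection paths because the cycle itself is untouched by the failure, so traffic can be sent both ways around it.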
Mesh Survivability Techniques
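[Figure: classification tree of mesh survivability techniques. Top-level branches: Protection and Restoration. Labels that survive extraction: path-based and link-based restoration; shared-backup protection with path-based, link-based, and p-cycle variants.]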
Transport Survivability
• A number of techniques exist
– APS
– Multi-homing (with or without trunk diversity)
– Link restoration
– Path restoration
– Self healing rings
– p-cycles
• See a mixture of techniques in real networks
• Usually little or no survivability at the far edge (CPE – last mile)
• Edges are multi-homed to MAN or WAN
[Figure: access, core, and access network segments]
Dual/Multi-homing Topologies
• Dual-homing
– Customer host is connected to two switched hubs.
– Traffic may be split between primary and secondary paths connecting to the hubs.
– Each path serves as a backup for the other.
• Multi-homing
– Customer host is connected to more than two switched hubs.
– Greater protection against a failure.
[Figure: customer host connected to multiple switches]
Dual-homing in Telephone Network
[Figure: dual-homing in the telephone network. Labels: Customer and ISP; Class 5 local network; Class 4 toll network; switches at diverse locations; multiple routes between offices; transmission network with SDH/SONET facility protection; failures (X) marked with "small radius of damage" and "small radius of service loss".]
Virtual Router Redundancy Protocol
• Redundant default gateways: VRRP (RFC 2338)
– Multiple routers on the subnet negotiate who will be "Master" and own the Virtual Router IP address.
– All other routers are backups. Backup priority is configurable.
– Master sends periodic hellos to communicate alive state.
– Hosts are preconfigured with the Virtual Router IP address as the default gateway for traffic exiting the LAN.
[Figure: Master and Back-up routers attached over GE links to an Ethernet switch serving three hosts; Customer Edge (CE) and Provider Edge (PE) routers are labeled]
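The Master negotiation can be illustrated with a static election function. This is a deliberately simplified sketch, not the protocol's event-driven state machine: the `Router` class, the `alive` flag standing in for missed hellos, and the example addresses are all hypothetical. Preferring the highest priority and breaking ties by the higher primary IP address follows the rule in the VRRP specification for competing advertisements.

```python
from dataclasses import dataclass
from ipaddress import IPv4Address


@dataclass(frozen=True)
class Router:
    name: str
    priority: int             # configurable; a higher value is preferred
    primary_ip: IPv4Address
    alive: bool = True        # stands in for "hellos are still being heard"


def elect_master(routers):
    """Pick the live router with the highest (priority, primary IP)."""
    candidates = [r for r in routers if r.alive]
    if not candidates:
        return None
    return max(candidates, key=lambda r: (r.priority, int(r.primary_ip)))


if __name__ == "__main__":
    r1 = Router("R1", 200, IPv4Address("192.0.2.2"))
    r2 = Router("R2", 100, IPv4Address("192.0.2.3"))
    print(elect_master([r1, r2]).name)    # R1 wins on priority
    r1_down = Router("R1", 200, IPv4Address("192.0.2.2"), alive=False)
    print(elect_master([r1_down, r2]).name)  # R2 takes over
```

Hosts keep using the same Virtual Router IP address throughout, so the takeover is transparent to them.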
Implementation
• Multi-layered:
– Demand Topology
– Logical Transport Topology
– Fiber/Optical Topology
• Can implement survivability
techniques at each layer
• Need to consider
– Failure propagation
– Alarm Setting
– Speed of recovery
– Cost
– Management
– Traffic Grooming
– Etc.
Steps in Traffic Recovery
[Figure: steps in traffic recovery. Detection and Notification are followed by two processes: the reconfiguration process (Path selection, Rerouting) and the repair process (Repair), ending with Normalization.]
IP Survivability Options
IP Dynamic Routing
[Figure: route between San Francisco and New York; after a failure, a new path is computed. A second panel between the same cities labels a primary LSP and a backup LSP.]
MPLS Fast Reroute
• Increasing demand for
“APS-like” redundancy
– MPLS resilience to link/node failures
[Figure label: Detour]
1:1 Protection
• For each LSP, for each node
– Set up one LSP as backup
– Merge into primary LSP further downstream
– Backs up link and downstream node
1:1 LSP Protection
1:N Link Protection
[Figure: link fails]
MPLS Fast Reroute
Multilayer Networks
• WANs have multiple technology layers
• Converging toward IP/MPLS/WDM
• Multiple Layers present several survivability challenges
• Coordination of recovery actions at different layers
– Which layer is responsible for fault recovery?
• Spare Capacity Allocation (SCA)
– How to prevent over-allocation when each layer provides spare resources?
• Failure Propagation
– Lower layer failure can affect multiple higher layer links!
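Failure propagation can be made concrete by mapping each higher-layer link onto the physical route it uses and asking which links a single fiber cut takes down. The mapping `MPLS_OVER_WDM` below is hypothetical and does not reproduce the slide's figure.

```python
# Hypothetical mapping: MPLS link name -> sequence of WDM nodes it is routed over.
MPLS_OVER_WDM = {
    "1-3": [1, 2, 3],
    "1-5": [1, 2, 4, 5],
    "3-5": [3, 5],
}


def affected_logical_links(logical_to_physical, failed_fiber):
    """Names of higher-layer links whose physical route uses the failed fiber."""
    failed = frozenset(failed_fiber)
    affected = []
    for name, route in logical_to_physical.items():
        fibers = {frozenset(hop) for hop in zip(route, route[1:])}
        if failed in fibers:
            affected.append(name)
    return sorted(affected)


if __name__ == "__main__":
    # One cut on fiber 1-2 removes two MPLS links at once.
    print(affected_logical_links(MPLS_OVER_WDM, (1, 2)))
```

Links that fail together in this way form a shared risk group, so a backup path computed at the MPLS layer should avoid every link that rides on the same fiber as the working path.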
[Figure: MPLS connections between numbered nodes mapped onto a WDM physical path]
Spare Capacity Allocation
• Single Layer Spare Capacity Allocation (SCA) Problem
– given working paths and network (or virtual network) topology
– provision spare capacity and find backup routes for fault tolerance
– Goal: minimum spare capacity or cost
• Matrix based formulation*
– P: working path-link incidence matrix; Q: backup path-link incidence matrix
– These determine the spare provision matrix G and the spare capacity reservation vector s
– Assume path restoration with disjoint backup routes
– Shared backup path protection for any single link failure
[Figure: six-node example network (nodes 1 to 6) with a link labeled l]
* Y. Liu, D. Tipper, and P. Siripongwutikorn, "Approximating Optimal Spare Capacity Allocation by Successive Survivable Routing," IEEE/ACM Transactions on Networking, Vol. 13, No. 1, pp. 198-211, Feb. 2005.
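Optimization model for link failures
$$
\begin{aligned}
\min_{Q,\,s}\quad & S = e^{T} s && \text{(total spare capacity)}\\
\text{s.t.}\quad & s \ge G && \text{(enough spare capacity on each link)}\\
& G = Q^{T} M P && \text{(calculation of spare provision matrix)}\\
& P + Q \le 1 && \text{(link-disjoint backup paths)}\\
& Q B^{T} = D \pmod 2 && \text{(flow conservation of backup paths)}\\
& Q \text{ is a binary matrix}
\end{aligned}
$$
Decision variables: Q, s (integer programming)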
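For fixed backup routes Q, the spare provision matrix and the resulting reservation can be computed directly. The sketch below uses a hypothetical four-link, two-flow instance and assumes that M is the diagonal matrix of flow demands, that rows of P and Q index flows, and that columns index links; the slide does not spell these out. It only evaluates a given Q. It does not solve the integer program and does not check the flow conservation constraint, which needs the topology matrices B and D.

```python
def spare_provision(P, Q, m):
    """G = Q^T M P with M = diag(m).

    G[l][k] is the spare capacity needed on link l when link k fails.
    """
    flows, links = len(P), len(P[0])
    return [
        [sum(Q[r][l] * m[r] * P[r][k] for r in range(flows)) for k in range(links)]
        for l in range(links)
    ]


def spare_reservation(G):
    """Smallest s with s >= G: the worst case over single-link failures."""
    return [max(row) for row in G]


def link_disjoint(P, Q):
    """Check P + Q <= 1: no flow's backup reuses a link of its working path."""
    return all(p + q <= 1 for prow, qrow in zip(P, Q) for p, q in zip(prow, qrow))


# Hypothetical instance: 2 flows, 4 links.
P = [[1, 0, 0, 0],   # flow 0 works on link 0
     [0, 1, 0, 0]]   # flow 1 works on link 1
Q = [[0, 0, 1, 1],   # flow 0 backs up over links 2 and 3
     [0, 0, 0, 1]]   # flow 1 backs up over link 3
m = [5, 10]          # demands of flow 0 and flow 1

if __name__ == "__main__":
    G = spare_provision(P, Q, m)
    s = spare_reservation(G)
    print(G, s, sum(s))
```

In this instance link 3 carries both backup paths but only needs 10 units, because the two working paths fail under different link failures. The total reservation is 15 units, against 20 units if every backup were dedicated.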
SSR flowchart of flow r
Numerical comparison
Experiment networks
[Figure: topologies of the experiment networks; only node labels survive in the extracted text]
• 64 random cases with different flow orders
• Range of solutions
• Time is the sum of time to compute all 64 cases
[Figure: redundancy (%) versus time (seconds, log scale from 10^-2 to 10^4) for RAFT, SPI, SR, SSR, SA, LP, and BB. Region annotations: "worse solutions, fast"; "near optimal solutions, fast"; "better solutions, slow, not scalable"; "infeasible".]
State of the Art
• Survivable Network Design
– Important in WAN Backbones
• Basic approach
– Given particular technology (e.g., WDM, MPLS, etc)
assume
• Traffic restoration scheme (e.g., failure independent path restoration)
• Failure scenario (any single link failure)
– Determine least cost survivable network design using
optimization formulations with heuristic solutions
• Many tradeoffs identified and studied
– Protection vs. Restoration
– Reactive vs. Proactive
– Shared vs. Dedicated