ALU - BFD Session Down - RCA - TAC

On March 7, 2016 the customer reported that multiple BFD sessions with Nokia and ZTE RNCs were down and not recovering, though the OSPF protocol was up. The TAC investigation found that resetting the BFD process at the RNC caused the BFD sessions to come up briefly before going down again. OSPF was showing as full, and ping tests between devices were successful, but the BFD sessions remained down. Examining the BFD sessions on the 7750 SR router showed them in the down state. The issue was determined to be related to a mismatch in how the SR and RNC were handling changes to the BFD discriminator values during session renegotiation.

Uploaded by

ravi kant

March 24, 2016

AR 1-6186571 | 1-6053841 | BFD session down - RCA

RCA Requestor TSO, MSM, Customer


Version Final
Report Attributes 7750 TAC/TEC

7750/7210/7705 – Technical Assistance Center 1


Problem Description:

On March 7, 2016, the customer reported that multiple BFD sessions with Nokia and ZTE RNCs were down
and not recovering, although the OSPF protocol was up.

Chassis: 7750 SR-a8


REL: 12.0R10

TAC/TEC Investigation:

NOKIA RNC Analysis

The customer reported that the RNC is directly connected to the ALU router. The IUCS UP service under
the VPRN 2001 context runs BFD with OSPF as the client protocol, and the BFD sessions went down and
did not recover.

The BFD session came up after the BFD process was reset at the RNC end, but it then went down again.
When checked, OSPF was in the Full state, the point-to-point IP addresses answered ping, and ARP
entries were resolved, yet the BFD session remained down.

A:UPE_VAR_TAR_CR01_SRa8_R01# show router 2001 interface


=============================================================================
Interface Table (Service: 2001)
=============================================================================
Interface-Name Adm Opr(v4/v6) Mode Port/SapId
IP-Address PfxState
-----------------------------------------------------------------------------
RNC1_IUCS_UP_1_B1_E0 Up Up/Down VPRN rvpls
10.177.72.14/28 n/a
RNC1_IUCS_UP_3_B2_E1 Up Up/Down VPRN rvpls
10.177.72.46/28 n/a
RNC1_IUR_UP_1_B1_E0 Up Up/Down VPRN rvpls
10.177.72.142/28 n/a
RNC1_IUR_UP_3_B2_E1 Up Up/Down VPRN rvpls
10.177.72.174/28 n/a
To_SR1_IUCS_UP_NB Up Up/Down VPRN 1/1/1:206
10.50.15.73/30 n/a
To_VAR_TAR_CR02_IUCS_UP Up Up/Down VPRN lag-1:201
10.50.15.89/30 n/a
-----------------------------------------------------------------------------
Interfaces : 6
=============================================================================

*A:UPE_VAR_TAR_CR01_SRa8_R01# show router 2001 ospf neighbor

=============================================================================
OSPFv2 (0) all neighbors
=============================================================================
Interface-Name Rtr Id State Pri RetxQ TTL
Area-Id
-----------------------------------------------------------------------------
RNC1_IUCS_UP_1_B1_E0 10.177.72.1 Full 0 0 3
0.0.0.182
RNC1_IUCS_UP_1_B1_E0 10.177.72.2 Full 0 0 3
0.0.0.182
RNC1_IUR_UP_1_B1_E0 10.177.72.129 Full 0 0 3
0.0.0.185
RNC1_IUR_UP_1_B1_E0 10.177.72.130 Full 0 0 3
0.0.0.185
RNC1_IUCS_UP_3_B2_E1 10.177.72.49 Full 0 0 4
0.0.0.182
RNC1_IUCS_UP_3_B2_E1 10.177.72.50 Full 0 0 3
0.0.0.182
RNC1_IUR_UP_3_B2_E1 10.177.72.177 Full 0 0 4
0.0.0.185
RNC1_IUR_UP_3_B2_E1 10.177.72.178 Full 0 0 3
0.0.0.185
To_VAR_TAR_CR02_IUCS_UP 10.34.74.77 Full 1 0 31
0.0.0.0
-----------------------------------------------------------------------------
No. of Neighbors: 9
=============================================================================

*A:UPE_VAR_TAR_CR01_SRa8_R01# show router 2001 bfd session src 10.177.72.14


=============================================================================
Legend: wp = Working path pp = Protecting path
=============================================================================
BFD Session
=============================================================================
If/Lsp Name/Svc-Id                  State       Tx Intvl  Rx Intvl  Multipl
 Rem Addr/Info/SdpId:VcId           Protocols    Tx Pkts   Rx Pkts     Type
 LAG port                                                            LAG ID
-----------------------------------------------------------------------------
RNC1_IUCS_UP_1_B1_E0                Down            1000       150        3
 10.177.72.1                        ospf2            N/A       N/A   cpm-np
RNC1_IUCS_UP_1_B1_E0                Down            1000       150        3
 10.177.72.2                        ospf2            N/A       N/A   cpm-np
-----------------------------------------------------------------------------
No. of BFD sessions: 2

*A:UPE_VAR_TAR_CR01_SRa8_R01# show router 2001 bfd session src 10.177.72.14 detail

=============================================================================
BFD Session
=============================================================================
Remote Address : 10.177.72.1
Admin State : Up Oper State : Down
Protocols : ospf2
Rx Interval : 150 Tx Interval : 150
Multiplier : 3 Echo Interval : 0
Up Time : None Up Transitions : 0
Down Time : 0d 00:23:51 Down Transitions : 0
Version Mismatch : 0

Forwarding Information

Local Discr : 173 Local State : Down


Local Diag : 0 (None) Local Mode : Async
Local Min Tx : 1000 Local Mult : 3
Last Sent (ms) : 0 Local Min Rx : 150
Type : cpm-np
Remote : Unheard
=============================================================================
Remote Address : 10.177.72.2
Admin State : Up Oper State : Down
Protocols : ospf2
Rx Interval : 150 Tx Interval : 150
Multiplier : 3 Echo Interval : 0
Up Time : None Up Transitions : 0
Down Time : 0d 00:24:01 Down Transitions : 0
Version Mismatch : 0

Forwarding Information

Local Discr : 171 Local State : Down


Local Diag : 0 (None) Local Mode : Async
Local Min Tx : 1000 Local Mult : 3
Last Sent (ms) : 0 Local Min Rx : 150
Type : cpm-np
Remote : Unheard
=============================================================================

*A:UPE_VAR_TAR_CR01_SRa8_R01# ping router 2001 10.177.72.1 rapid


PING 10.177.72.1 56 data bytes
!!!!!
---- 10.177.72.1 PING Statistics ----
5 packets transmitted, 5 packets received, 0.00% packet loss
round-trip min = 0.084ms, avg = 0.094ms, max = 0.108ms, stddev = 0.008ms
*A:UPE_VAR_TAR_CR01_SRa8_R01# ping router 2001 10.177.72.2 rapid
PING 10.177.72.2 56 data bytes
!!!!!
---- 10.177.72.2 PING Statistics ----
5 packets transmitted, 5 packets received, 0.00% packet loss
round-trip min = 0.080ms, avg = 0.095ms, max = 0.120ms, stddev = 0.013ms

A:UPE_VAR_TAR_CR01_SRa8_R01# show router 2001 arp

=============================================================================
ARP Table (Service: 2001)
=============================================================================

IP Address MAC Address Expiry Type Interface
-----------------------------------------------------------------------------
10.50.15.73 a4:7b:2c:05:0d:b9 00h00m00s Oth[I] To_SR1_IUCS_UP_NB
10.50.15.74 00:30:88:18:c8:67 03h47m13s Dyn[I] To_SR1_IUCS_UP_NB
10.177.72.1 74:fe:48:0d:a5:da 03h58m07s Dyn[I] RNC1_IUCS_UP_1_B1_E0
10.177.72.2 74:fe:48:0d:95:f9 03h56m14s Dyn[I] RNC1_IUCS_UP_1_B1_E0
10.177.72.14 e4:81:84:8b:99:15 00h00m00s Oth[I] RNC1_IUCS_UP_1_B1_E0
10.177.72.129 74:fe:48:0d:a5:da 03h56m29s Dyn[I] RNC1_IUR_UP_1_B1_E0
10.177.72.130 74:fe:48:0d:95:f9 03h55m18s Dyn[I] RNC1_IUR_UP_1_B1_E0
10.177.72.142 e4:81:84:8b:99:1d 00h00m00s Oth[I] RNC1_IUR_UP_1_B1_E0
10.177.72.33 74:fe:48:0d:b0:1a 03h53m15s Dyn[I] RNC1_IUCS_UP_3_B2_E1
10.177.72.34 74:fe:48:0d:ac:9a 03h57m23s Dyn[I] RNC1_IUCS_UP_3_B2_E1
10.177.72.46 e4:81:84:8b:99:17 00h00m00s Oth[I] RNC1_IUCS_UP_3_B2_E1
10.177.72.161 74:fe:48:0d:b0:1a 03h59m49s Dyn[I] RNC1_IUR_UP_3_B2_E1
10.177.72.162 74:fe:48:0d:ac:9a 03h57m01s Dyn[I] RNC1_IUR_UP_3_B2_E1
10.177.72.174 e4:81:84:8b:99:1f 00h00m00s Oth[I] RNC1_IUR_UP_3_B2_E1
10.50.15.89 e4:81:84:8b:97:27 00h00m00s Oth[I] To_VAR_TAR_CR02_IUCS_UP
10.50.15.90 e4:81:84:8b:db:27 01h23m37s Dyn[I] To_VAR_TAR_CR02_IUCS_UP
_____________________________________________________________________

The shared PCAP covers BFD being disabled and then re-enabled on the SR. When BFD is disabled on the
SR, the SR sends AdminDown with no diagnostic code, which is expected behavior, but the RNC records
the received message as DOWN with the diagnostic "Administratively Down".

Later, when BFD is enabled on the SR, the SR starts a new BFD session by incrementing its own
discriminator value and setting Your Discriminator to 0 in an AdminDown message. The RNC does not
process these AdminDown BFD control packets because it expects a DOWN message, and it keeps sending
the previously negotiated discriminator values. Because the SR does not find its new discriminator in
the BFD control packets it receives, it discards them. The BFD session therefore stays stuck and does
not recover.
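
The discard behavior described above can be illustrated with a small model of how a BFD receiver matches incoming control packets to sessions. This is only a sketch, assuming the demultiplexing rule from RFC 5880 (a nonzero Your Discriminator must select an existing session, otherwise the packet is dropped). The `Session` class and the `demux` function are hypothetical names, and the value 174 for the incremented discriminator is an assumption; none of this is SR OS code or data from the capture.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Session:
    peer_addr: str
    local_discr: int  # value this end sends as My Discriminator


def demux(sessions: Dict[int, Session], src_addr: str, your_discr: int) -> Optional[Session]:
    """Select the session for a received control packet; None means discard."""
    if your_discr != 0:
        # A nonzero Your Discriminator must match a local discriminator exactly.
        return sessions.get(your_discr)
    # Zero Your Discriminator: fall back to matching on the peer address.
    for session in sessions.values():
        if session.peer_addr == src_addr:
            return session
    return None


# After BFD is re-enabled, the SR side holds only the new discriminator.
# 173 is the Local Discr shown in the CLI output; 174 is an assumed new value.
sr_sessions = {174: Session(peer_addr="10.177.72.1", local_discr=174)}

stale = demux(sr_sessions, "10.177.72.1", your_discr=173)  # peer echoes the old value
reset = demux(sr_sessions, "10.177.72.1", your_discr=0)    # peer restarts from zero
fresh = demux(sr_sessions, "10.177.72.1", your_discr=174)  # peer echoes the new value

print("old discriminator accepted:", stale is not None)
print("zero discriminator accepted:", reset is not None)
print("new discriminator accepted:", fresh is not None)
```

In this model the peer has to restart from Your Discriminator 0 or learn the new value before its packets are matched again, which is in line with the BFD reset at the RNC bringing the session up temporarily.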

ZTE RNC Analysis:

Node Location: Mohali SR core routers

TiMOS Used: 12.0R10

The customer reported a BFD down issue on the link connected to the ZTE RNC. Whenever the BFD session
flaps, it gets stuck in the DOWN state, and a manual BFD flap at the RNC is required to restore it.
TAC investigated the packet capture files shared by the customer. Our observations are given below.

During the BFD DOWN state:

1. The SR sends a BFD ADMIN DOWN packet to the ZTE RNC (seq# 13519).
2. The SR does not receive a BFD DOWN packet from the ZTE RNC, which it needs in order to initiate the
BFD session.
3. Hence the BFD session stays stuck in the DOWN state.

During the BFD UP state:

In this scenario, the ZTE RNC sent a BFD DOWN packet in response to the ADMIN DOWN BFD packet sent by
the SR. On receipt of the BFD DOWN packet, the SR initiated the session, and the session came up
without any issue. This shows that the ZTE RNC has a mechanism to send a BFD DOWN packet on receipt
of a BFD ADMIN DOWN packet from the SR.

An additional finding in this scenario is that the exchange between the ZTE RNC and the SR completed
within one second, i.e. from 8234 to 8237 in the capture. Hence the ZTE RNC did not wait for the hold
time to expire, and the faster rate of BFD ADMIN DOWN packets from the SR (1 sec on the SR during a
down event) did not block the transmission of the BFD DOWN packet.
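
The dependence on a DOWN packet can be sketched with a simplified version of the receive-side state transitions in RFC 5880 section 6.8.6. This is a minimal illustration under that assumption, not vendor code: the `State` enum and `next_state` function are hypothetical names, and timers, diagnostics, and discriminator checks are left out.

```python
from enum import Enum


class State(Enum):
    ADMIN_DOWN = 0
    DOWN = 1
    INIT = 2
    UP = 3


def next_state(local: State, received: State) -> State:
    """Simplified RFC 5880 transition on receipt of a control packet."""
    if local is State.ADMIN_DOWN:
        return local          # packets are discarded while administratively down
    if received is State.ADMIN_DOWN:
        return State.DOWN     # peer is held down; the session cannot advance
    if local is State.DOWN:
        if received is State.DOWN:
            return State.INIT
        if received is State.INIT:
            return State.UP
        return State.DOWN
    if local is State.INIT:
        if received in (State.INIT, State.UP):
            return State.UP
        return State.INIT
    # local is UP
    return State.DOWN if received is State.DOWN else State.UP


# A receiver that hears only AdminDown never leaves Down.
state = State.DOWN
for _ in range(5):
    state = next_state(state, State.ADMIN_DOWN)
print("after AdminDown only:", state.name)

# Once a Down packet arrives, the three-way handshake can proceed.
state = next_state(State.DOWN, State.DOWN)
print("after Down:", state.name)
state = next_state(state, State.INIT)
print("after Init:", state.name)
```

Under this model, either end in the Down state needs to hear Down (or Init) from its peer before it moves forward, which is consistent with the session coming up only in the capture where a DOWN packet was exchanged.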

During the joint call with the ZTE vendor, it was agreed that ZTE TAC would check with their R&D on
the findings (sequence# 8229 to 8250) from the packet capture files. We are waiting for an update
from ZTE on why the RNC responded only once to the ADMIN DOWN packet sent by the SR, rather than
consistently. We also received an update that ZTE expects a BFD DOWN packet from the SR to bring up
the BFD session.

Separately, we continued our investigation into why the SR did not send a BFD DOWN packet to the
connected device. Our observations are:

1. When IOM-based BFD is configured on the SR, the SR sends both BFD ADMIN DOWN and DOWN packets.

2. When CPM-based BFD is configured on the SR, the SR sends only BFD ADMIN DOWN packets.

In consultation with R&D, this was confirmed as a limitation of CPM-based BFD, which sends only the
BFD ADMIN DOWN packet. Hence the BFD session did not recover on its own with the ZTE RNC. To overcome
this limitation, the workaround is to configure IOM-based BFD instead of CPM-based BFD.

We compared the configuration of the Mohali and Ludhiana core routers and found that the Ludhiana
core routers are not configured with CPM-based BFD towards the ZTE RNC; hence no BFD issues have been
reported there.

Working Configuration (LUD CR2):

interface "RNC1_IUB_P_3/26/1" create
    address 10.157.68.141/30
    bfd 100 receive 100 multiplier 3
    sap 1/3/5:2044 create
    exit
exit

Non-working Configuration (MOH CR2):

interface "RNC1_IUB_P_3/26/1" create
    address 10.157.64.141/30
    bfd 100 receive 100 multiplier 3 type cpm-np
    sap 1/2/4:2044 create
    exit
exit
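
The only difference between the two configurations is the `type cpm-np` suffix on the `bfd` line. As an aid for spotting affected interfaces, the following sketch scans saved configuration text for that suffix. It assumes the configuration follows the layout shown in this report; `find_cpm_np_interfaces` is a hypothetical helper, not an SR OS tool.

```python
import re
from typing import List


def find_cpm_np_interfaces(config_text: str) -> List[str]:
    """Return names of interfaces whose bfd line ends with 'type cpm-np'."""
    flagged = []
    current = None
    for line in config_text.splitlines():
        stripped = line.strip()
        match = re.match(r'interface "([^"]+)"', stripped)
        if match:
            current = match.group(1)  # remember the enclosing interface
        elif (current and stripped.startswith("bfd ")
              and stripped.split()[-2:] == ["type", "cpm-np"]):
            flagged.append(current)
    return flagged


NON_WORKING = '''
interface "RNC1_IUB_P_3/26/1" create
    address 10.157.64.141/30
    bfd 100 receive 100 multiplier 3 type cpm-np
    sap 1/2/4:2044 create
    exit
exit
'''

WORKING = NON_WORKING.replace(" type cpm-np", "")

print(find_cpm_np_interfaces(NON_WORKING))
print(find_cpm_np_interfaces(WORKING))
```

The first call lists the interface from the non-working sample, and the second returns an empty list because the `bfd` line carries no `type cpm-np`.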

Snapshot of PCAP file for working router (LUD CR2 - refer sequence# 6132 & 6236):

Root Cause Analysis:


As per the discussion with TEC/R&D, this issue is a limitation of CPM-based BFD in the current
release: the SR sends only the AdminDown message and does not send a DOWN message along with it,
whereas with IOM-based BFD the SR sends both AdminDown and DOWN.

Workaround: Use IOM based BFD

Ravi Kant R

From: Hazarika, Partha Pratim (Nokia - IN) <[email protected]>


Sent: Monday, March 28, 2016 3:20 AM
To: Hazarika, Partha Pratim (Nokia - IN); Anuj Kumar Pandey
Cc: Ajit .; Amit Sharma W; Ankit Chaudhary B; Anoop K P K; Anubhav .; Arjun Rana A;
Ashish Kumar W; Azmul Hussain A; pooja kaushal; Balaguru S; Bharat Arora B;
Bhaskar Mishra; H Sindogi, Chandra (Nokia - IN); [email protected];
[email protected]; Chitwan Seedhar; [email protected]; Goutam
Prakash; IP-TAC-INDIA; Veluchamy, Jagankumar (Nokia - IN); Kumar Ankit K;
Manish Kumar NN; Mayank Yadav; MD Iqubal Zafar Quadri; Mohammad Ahmad
Raza; Noc, Mpbn (EXT - IN); Mukesh Kumar C; Mukesh Singh; Mukund Desai; Namit
Pant; Pawan Kumar G; BO RAN NORTH (EGI); Pradeep P P; Raheelullah Hussain;
Amit Sood; Gupta, Rajat 1. (Nokia - IN); Raman Katoch; Ravi Kant Sharma; Ravi Kant
R; Robin Goel; Rohit Saini; Seema Choudhary; Yaligar, Shahajan (Nokia - IN); Shibu
Melodan; [email protected]; Shyamanta Kr Dutta S; Sunil
Pratap Singh; Vinay Sharma A; Vivek Kumar V; Jatinder Singh E; Nagoor,
Thiruvenkatam (Nokia - IN)
Subject: RE: 1-6053841 ::: Bfd down with ZTE RNC (point to point IP is reachable) High
Priority_With Attachment_
Attachments: BFD session down - RCA.pdf

“Resending, as the last mail was not received properly by many recipients.”

Dear Anuj,
Regarding the ongoing BFD issue, the final analysis reveals a BFD interop issue between ALU-ZTE and
ALU-NOKIA devices for CPM-NP based BFD sessions. We have escalated it to a higher level for an
enhancement release.

As a workaround, we have to plan to reconfigure some of the existing BFD sessions from CPM to IOM.

1. Reconfiguration is to be done mainly on the directly connected BFD sessions between ALU-ZTE and
ALU-NOKIA.
2. As the issue is identified only for interop, ALU-ALU BFD sessions won't be impacted. However,
considering the scaling limit (50 BFD sessions per IOM), we may need to reconfigure ALU-ALU BFD
sessions from IOM-based to CPM.

If needed, we will support the planning.

Note: A detailed RCA is attached for this issue.
 
 
Thanking you... 
BR/// 
Partha Hazarika 
CTA 
ION, Nokia 
M: +91 8800361999
 
 
 
