ALU - BFD Session Down - RCA - TAC
On March 7, 2016, the customer reported that multiple BFD sessions with NOKIA and ZTE RNCs were down
and not recovering, although the OSPF protocol was up.
TAC/TEC Investigation:
The customer reported that the RNC is directly connected to the ALU router. The IUCS UP service under
the VPRN 2001 context runs BFD with OSPF as the client, and the BFD sessions went down and did not recover.
The BFD session came up after the BFD process was reset at the RNC end, but it went down again. When
checked, OSPF was in Full state, the point-to-point IPs responded to PING and ARP entries were learned,
yet the BFD session still showed down.
=============================================================================
OSPFv2 (0) all neighbors
=============================================================================
BFD Session
=============================================================================
Remote Address : 10.177.72.1
Admin State : Up Oper State : Down
Protocols : ospf2
Forwarding Information
Forwarding Information
=============================================================================
ARP Table (Service: 2001)
=============================================================================
The shared PCAP covers BFD being disabled and then re-enabled on the SR. When BFD is disabled on the
SR, the SR sends AdminDown with diagnostic "No Diagnostic", which is expected behavior, but the RNC
moves the session to DOWN with diagnostic "Administratively Down" on receipt of that message.
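To make the State and Diag values in such a capture easier to check, the following is a minimal sketch of a decoder for the mandatory 24-byte section of a BFD control packet, using the field layout and code points from RFC 5880. The two sample packets are hypothetical (the discriminator values 100, 101 and 7 and the 1-second intervals are invented for illustration) and are not taken from the customer PCAP.

```python
import struct

# State and diagnostic code points from RFC 5880, section 4.1.
STATES = {0: "AdminDown", 1: "Down", 2: "Init", 3: "Up"}
DIAGS = {
    0: "No Diagnostic",
    3: "Neighbor Signaled Session Down",
    7: "Administratively Down",
}


def parse_bfd_control(payload: bytes) -> dict:
    """Decode the first 12 bytes of the mandatory BFD control section."""
    if len(payload) < 24:
        raise ValueError("BFD control packet shorter than 24 bytes")
    vers_diag, state_flags, detect_mult, length, my_discr, your_discr = (
        struct.unpack("!BBBBII", payload[:12])
    )
    diag = vers_diag & 0x1F          # low 5 bits of byte 0
    return {
        "version": vers_diag >> 5,   # high 3 bits of byte 0
        "diag": DIAGS.get(diag, f"code {diag}"),
        "state": STATES[state_flags >> 6],  # high 2 bits of byte 1
        "detect_mult": detect_mult,
        "my_discr": my_discr,
        "your_discr": your_discr,
    }


# Hypothetical SR packet: AdminDown, No Diagnostic, Your Discriminator = 0.
sr_pkt = struct.pack("!BBBBIIIII", 0x20, 0x00, 3, 24, 101, 0,
                     1_000_000, 1_000_000, 0)
# Hypothetical RNC packet: Down, diag Administratively Down, old discriminator.
rnc_pkt = struct.pack("!BBBBIIIII", 0x27, 0x40, 3, 24, 7, 100,
                      1_000_000, 1_000_000, 0)

print(parse_bfd_control(sr_pkt))
print(parse_bfd_control(rnc_pkt))
```

The payload passed in is the UDP payload of the BFD control packet (single-hop BFD uses UDP destination port 3784).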
Later, when BFD is enabled on the SR, the SR starts a new BFD session: it increments its own
DISCRIMINATOR value and sets Your DISCRIMINATOR to 0 in its AdminDown messages. The RNC does not
process these AdminDown BFD control packets because it is expecting a DOWN message, and it keeps
sending the previously negotiated DISCRIMINATOR values. Because the SR does not find its new
DISCRIMINATOR in the BFD control packets received from the RNC, it discards them. As a result the
BFD session stays stuck and does not recover.
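The discard on the SR side follows from the reception rules in RFC 5880, section 6.8.6. The sketch below is a simplified, single-session version of that check and is not the SR implementation; the function name and the discriminator values 100 (old) and 101 (new) are hypothetical.

```python
def sr_accepts(local_discr: int, pkt_your_discr: int, pkt_state: str) -> bool:
    """Simplified RFC 5880 section 6.8.6 check for one local session.

    local_discr    : discriminator the local end currently uses
    pkt_your_discr : Your Discriminator field of the received packet
    pkt_state      : State field of the received packet
    """
    if pkt_your_discr != 0:
        # A nonzero Your Discriminator must select an existing session;
        # if no session matches, the packet is discarded.
        return pkt_your_discr == local_discr
    # Your Discriminator = 0 is only acceptable while the sender reports
    # Down or AdminDown (session lookup then uses other fields).
    return pkt_state in ("Down", "AdminDown")


OLD_DISCR, NEW_DISCR = 100, 101  # hypothetical values

# RNC still echoes the discriminator negotiated before the flap: discarded.
print(sr_accepts(NEW_DISCR, OLD_DISCR, "Down"))
# RNC has learned the new discriminator: accepted.
print(sr_accepts(NEW_DISCR, NEW_DISCR, "Down"))
# RNC restarts from scratch with Your Discriminator = 0: accepted.
print(sr_accepts(NEW_DISCR, 0, "Down"))
```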
The customer has reported the BFD down issue for the link connected to the ZTE RNC. Whenever the BFD
session flaps, BFD gets stuck in DOWN state and a manual BFD flap at the RNC is required to restore
the session. TAC has investigated the packet capture files shared by the customer. Our observations
are given below:
1. SR sends the BFD ADMIN DOWN packet to the ZTE RNC (seq# 13519)
2. The SR does not receive the BFD DOWN packet from the ZTE RNC that it needs in order to initiate the BFD session.
In this scenario, the ZTE RNC sent the BFD DOWN packet in response to the BFD ADMIN DOWN packet
sent by the SR. On receipt of the BFD DOWN packet, the SR initiated the session and the session
came up without any issue. This shows that the ZTE RNC does have a mechanism to send a BFD DOWN
packet on receipt of a BFD ADMIN DOWN packet from the SR.
An additional finding in this scenario is that the exchange between the ZTE RNC and the SR completed
within a second, i.e. from 8234 to 8237. Hence the ZTE RNC did not wait for the hold time to expire,
and the faster rate of BFD ADMIN DOWN packets from the SR (1 sec on the SR during a down event) did
not block transmission of the BFD DOWN packet.
Separately, we continued our investigation to find out why the SR did not send a BFD DOWN packet to
the connected device. Our observations are:
1. When IOM-based BFD is configured on the SR, the SR sends both BFD ADMIN DOWN and BFD DOWN packets.
2. When CPM-based BFD is configured on the SR, the SR sends only the BFD ADMIN DOWN packet.
On consultation with R&D, this has been confirmed as a limitation of CPM-based BFD, which sends
only the BFD ADMIN DOWN packet. Hence the BFD session did not recover on its own with the ZTE RNC.
To overcome this limitation, the workaround is to configure IOM-based BFD instead of CPM-based
BFD.
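The difference between the two BFD types can be illustrated with a toy model. This is a sketch under stated assumptions, not vendor code: the RNC behaviour modelled here (ignore AdminDown, re-learn the SR discriminator only from a DOWN packet) is inferred from the observations in this report, and the function name and discriminator values are hypothetical.

```python
def session_recovers(sr_states_sent, old_discr=100, new_discr=101):
    """Return True if the SR ever receives a reply it can accept.

    sr_states_sent : State values of the packets the SR sends after BFD is
                     re-enabled, in order (e.g. ["AdminDown", "Down"]).
    old_discr      : SR discriminator negotiated before the flap (assumed).
    new_discr      : SR discriminator used for the new session (assumed).
    """
    rnc_remote_discr = old_discr  # what the RNC still believes the SR uses
    for state in sr_states_sent:
        # Assumed RNC behaviour: AdminDown packets are not processed, so the
        # stored discriminator is refreshed only when a DOWN packet arrives.
        if state == "Down":
            rnc_remote_discr = new_discr
        # The RNC reply carries Your Discriminator = rnc_remote_discr; the SR
        # accepts it only if it matches the SR's current discriminator.
        if rnc_remote_discr == new_discr:
            return True
    return False


# CPM-based BFD: only AdminDown is sent, so the session stays stuck.
print(session_recovers(["AdminDown"] * 5))
# IOM-based BFD: AdminDown followed by Down, so the RNC re-learns the value.
print(session_recovers(["AdminDown", "Down"]))
```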
We have compared the configurations of the Mohali and Ludhiana core routers and found that the Ludhiana
core routers do not use CPM-based BFD towards the ZTE RNC, hence no BFD issues have been reported there.
address 10.157.68.141/30
exit
exit
address 10.157.64.141/30
exit
exit
Snapshot of PCAP file for working router (LUD CR2 - refer sequence# 6132 & 6236):
“Resending as last sent mail not received properly by many recipient”
Dear Anuj,
With regard to the ongoing BFD issue, the final analysis reveals a BFD interop issue between
ALU‐ZTE and ALU‐NOKIA devices for CPM‐NP based BFD sessions. We have escalated it to a higher level for an
enhancement release to address this issue.
As a workaround for this issue, we have to plan the reconfiguration of some of the existing BFD sessions from CPM to
IOM.
1. Reconfiguration is to be done mainly on the directly connected BFD sessions between ALU‐ZTE and ALU‐NOKIA.
2. As the issue has been identified only for interop, ALU‐ALU BFD sessions won’t be impacted. However, considering the
scaling limit (which is 50 BFD sessions per IOM), we may need to reconfigure some ALU‐ALU BFD sessions from
IOM based to CPM.
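As a rough planning aid for the two points above, the sketch below assigns a BFD type to each session while respecting the limit of 50 sessions per IOM. The session record fields (`name`, `iom`, `peer_vendor`, `type`) and the sample inventory are hypothetical; real data would come from the router configurations.

```python
from collections import defaultdict

IOM_BFD_LIMIT = 50  # per-IOM scaling limit quoted above


def plan_bfd_types(sessions, limit=IOM_BFD_LIMIT):
    """Return (plan, oversubscribed).

    plan           : session name -> planned type, "iom" or "cpm"
    oversubscribed : IOM -> count, where interop sessions alone exceed limit
    """
    plan = {}
    used = defaultdict(int)
    # Pass 1: interop sessions (ZTE / NOKIA peers) must be IOM based.
    for s in sessions:
        if s["peer_vendor"] != "ALU":
            plan[s["name"]] = "iom"
            used[s["iom"]] += 1
    # Pass 2: ALU-ALU sessions stay IOM based only while capacity remains;
    # otherwise they move to (or stay on) CPM.
    for s in sessions:
        if s["peer_vendor"] == "ALU":
            if s["type"] == "iom" and used[s["iom"]] < limit:
                plan[s["name"]] = "iom"
                used[s["iom"]] += 1
            else:
                plan[s["name"]] = "cpm"
    oversubscribed = {iom: n for iom, n in used.items() if n > limit}
    return plan, oversubscribed


# Hypothetical inventory: 49 ZTE sessions and 2 ALU-ALU sessions on IOM 1.
inventory = [
    {"name": f"zte-{i}", "iom": 1, "peer_vendor": "ZTE", "type": "cpm"}
    for i in range(49)
] + [
    {"name": "alu-a", "iom": 1, "peer_vendor": "ALU", "type": "iom"},
    {"name": "alu-b", "iom": 1, "peer_vendor": "ALU", "type": "iom"},
]
plan, oversubscribed = plan_bfd_types(inventory)
print(plan["alu-a"], plan["alu-b"], oversubscribed)
```

A non-empty `oversubscribed` result means the interop sessions on that IOM cannot all be moved to IOM-based BFD and the design needs to be revisited.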
If required, we will provide support for the planning.
Note:‐ A detailed RCA is attached for this issue.
Thanking you...
BR///
Partha Hazarika
CTA
ION, Nokia
M: +91 8800361999