Troubleshoot SD-WAN Control Connections - Cisco
Contents
Introduction
Background Information
Scenarios
DTLS Connection Failure (DCONFAIL)
TLOC Disabled (DISTLOC)
Board-ID Not Initialized (BIDNTPR)
BDSGVERFL - Board ID Signature Failure
Stuck in 'Connect': Routing Issues
Socket Errors (LISFD)
Peer Timeout Issue (VM_TMO)
Serial Number(s) Not Present (CRTREJSER, BIDNTVRFD)
For Issues with vEdge/vSmart
For Issues with Controllers
Wrong Chassis-Num/Unique-Id
Organization Mismatch (CTORGNMMIS)
vEdge/vSmart Certificate Revoked/Invalidated (VSCRTREV/CRTVERFL)
vEdge Template Not Attached in vManage
Transient Conditions (DISCVBD, SYSIPCHNG)
Introduction
This document describes probable causes of problems with control connections and how to
troubleshoot them.
Background Information
Refer to Troubleshoot Control Connections on Viptela Site for more information.
Note: Most of the command outputs presented in this document are from vEdge routers. However,
the troubleshooting approach is the same for routers that run Cisco IOS® XE SD-WAN software.
Enter the sdwan keyword in order to get the same outputs on Cisco IOS XE SD-WAN software. For
example, show sdwan control connections instead of show control connections.
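The keyword substitution in the note can be sketched in Python. The helper name `to_ios_xe_sdwan` is hypothetical, and the sketch deliberately limits itself to `show control ...` commands, the only family the note gives an example for; other commands are returned unchanged.

```python
def to_ios_xe_sdwan(command: str) -> str:
    """Insert the 'sdwan' keyword into a 'show control ...' command.

    Hypothetical helper: it only rewrites commands that start with
    'show control', and returns everything else unchanged.
    """
    words = command.split()
    if words[:2] == ["show", "control"]:
        return " ".join(["show", "sdwan"] + words[1:])
    return " ".join(words)


print(to_ios_xe_sdwan("show control connections"))
print(to_ios_xe_sdwan("show control local-properties"))
```

Commands that already contain the keyword, and commands outside the `show control` family, pass through as they are.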
Before you start to troubleshoot, ensure that the vEdge in question is configured properly.
This includes:
https://www.cisco.com/c/en/us/support/docs/routers/sd-wan/214509-troubleshoot-control-connections.html 1/14
8/15/22, 1:36 AM Troubleshoot SD-WAN Control Connections - Cisco
Organization-Name
vBond address
A VPN 0 transport interface that is configured with the tunnel option and an IP address.
A system clock that is set correctly on the vEdge and matches the time on the other
devices/controllers:
The show clock command confirms the current time set.
Enter the clock set command in order to set the correct time on the device.
For all the cases mentioned earlier, ensure that the Transport Locator (TLOC) is up. Check this with the
show control local-properties CLI command.
An example of valid output is shown here:
certificate-validity Valid
certificate-not-valid-before Sep 06 22:39:01 2018 GMT
certificate-not-valid-after Sep 06 22:39:01 2019 GMT
dns-name trainingvbond.viptela.com
site-id 10
domain-id 1
protocol dtls
tls-port 0
system-ip 10.1.10.1
chassis-num/unique-id 66cb2a8b-2eeb-479b-83d0-0682b64d8190
serial-num 12345718
vsmart-list-version 0
keygen-interval 1:00:00:00
retry-interval 0:00:00:17
no-activity-exp-interval 0:00:00:12
dns-cache-ttl 0:00:02:00
port-hopped TRUE
time-since-last-port-hop 20:16:24:43
number-vbond-peers 2
INDEX IP PORT
-------------------------------
0 10.3.25.25 12346
1 10.4.30.30 12346
number-active-wan-interfaces 2
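As a rough illustration of the clock prerequisite, the following Python sketch parses key/value lines like the ones in the output above and checks whether a given time falls inside the certificate validity window. The function names and the trimmed sample text are assumptions made for this example; they are not part of any Cisco tool.

```python
from datetime import datetime, timezone

# Trimmed copy of the example output above (assumed input format).
LOCAL_PROPERTIES_SAMPLE = """\
certificate-validity Valid
certificate-not-valid-before Sep 06 22:39:01 2018 GMT
certificate-not-valid-after Sep 06 22:39:01 2019 GMT
system-ip 10.1.10.1
number-vbond-peers 2
INDEX IP PORT
-------------------------------
0 10.3.25.25 12346
"""


def parse_local_properties(text):
    """Return a dict of 'key value' lines; table rows and rulers are skipped."""
    props = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        # Keys in this output start with a lowercase letter.
        if len(parts) == 2 and parts[0][0].islower():
            props[parts[0]] = parts[1].strip()
    return props


def parse_cert_time(value):
    """Parse a timestamp such as 'Sep 06 22:39:01 2018 GMT' as UTC."""
    parsed = datetime.strptime(value, "%b %d %H:%M:%S %Y GMT")
    return parsed.replace(tzinfo=timezone.utc)


def clock_within_validity(props, now):
    """True if 'now' lies between the not-valid-before and not-valid-after times."""
    start = parse_cert_time(props["certificate-not-valid-before"])
    end = parse_cert_time(props["certificate-not-valid-after"])
    return start <= now <= end


props = parse_local_properties(LOCAL_PROPERTIES_SAMPLE)
print(clock_within_validity(props, datetime(2019, 1, 1, tzinfo=timezone.utc)))
```

A device clock outside this window is one reason certificate checks can fail, which is why the clock is listed among the prerequisites.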
In vEdge software Version 16.3 and later, the output has a few additional fields:
number-vbond-peers 1
number-active-wan-interfaces 1
--------------------------------------------------------------------------------
ge0/4 172.16.0.20 12386 192.168.0.20 2601:647:4380:ca75::c2 12386 2/1 pu
Scenarios
DTLS Connection Failure (DCONFAIL)
When you have a DTLS connection failure, you might see it in the show control connections-history
command output.
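When you read connections-history output, it can help to map an error code back to the scenario that covers it. The lookup below is a small Python sketch built only from the scenario titles in the Contents of this document; the names `SCENARIO_BY_CODE` and `scenario_for` are hypothetical.

```python
# Error codes and scenario titles as listed in the Contents of this document.
SCENARIO_BY_CODE = {
    "DCONFAIL": "DTLS Connection Failure",
    "DISTLOC": "TLOC Disabled",
    "BIDNTPR": "Board-ID Not Initialized",
    "BDSGVERFL": "Board ID Signature Failure",
    "LISFD": "Socket Errors",
    "VM_TMO": "Peer Timeout Issue",
    "CRTREJSER": "Serial Number(s) Not Present",
    "BIDNTVRFD": "Serial Number(s) Not Present",
    "CTORGNMMIS": "Organization Mismatch",
    "VSCRTREV": "vEdge/vSmart Certificate Revoked/Invalidated",
    "CRTVERFL": "vEdge/vSmart Certificate Revoked/Invalidated",
    "DISCVBD": "Transient Conditions",
    "SYSIPCHNG": "Transient Conditions",
}


def scenario_for(code):
    """Return the scenario title for an error code, or None if it is not listed."""
    return SCENARIO_BY_CODE.get(code.strip().upper())


print(scenario_for("DCONFAIL"))
```

Codes that are not covered by a scenario in this document return None rather than a guess.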
When large packets do not reach the vEdge, a tcpdump capture, for example on the vSmart side,
shows this:
Note: On Cisco IOS XE SD-WAN software, you can use Embedded Packet Capture (EPC) instead of
tcpdump.
You can also use the traceroute or nping utilities to generate traffic with different packet sizes and
Differentiated Services Code Point (DSCP) markings in order to check connectivity. Your service
provider might have problems with the delivery of larger UDP packets, fragmented UDP packets
(especially small UDP fragments), or DSCP-marked packets. Here is an example with nping when
connectivity is successful.
From vSmart:
vSmart# tools nping vpn 0 198.51.100.162 options "--udp -p 12406 -g 12846 --sour
Nping in VPN 0
Starting Nping 0.6.47 ( http://nmap.org/nping ) at 2019-05-17 23:28 UTC
SENT (0.0220s) UDP 172.18.10.130:12846 > 198.51.100.162:12406 ttl=64 id=16578 ip
SENT (1.0240s) UDP 172.18.10.130:12846 > 198.51.100.162:12406 ttl=64 id=16578 ip
vEdge# tcpdump vpn 0 interface ge0/1 options "-n host 203.0.113.147 and udp"
tcpdump -i ge0_1 -s 128 -n host 203.0.113.147 and udp in VPN 0
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ge0_1, link-type EN10MB (Ethernet), capture size 128 bytes
18:29:43.492632 IP 203.0.113.147.12846 > 198.51.100.162.12406: UDP, length 555
18:29:44.494591 IP 203.0.113.147.12846 > 198.51.100.162.12406: UDP, length 555
And here is an example of unsuccessful connectivity with the traceroute command (that runs from
vShell) on vSmart:
vEdge does not receive packets sent from vSmart (only some other traffic or fragments):
vEdge# tcpdump vpn 0 interface ge0/1 options "-n host 203.0.113.147 and udp"
tcpdump -i ge0_1 -s 128 -n host 203.0.113.147 and udp in VPN 0
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ge0_1, link-type EN10MB (Ethernet), capture size 128 bytes
18:16:30.232959 IP 203.0.113.147.12846 > 198.51.100.162.12386: UDP, length 65
18:16:30.232969 IP 203.0.113.147.12846 > 198.51.100.162.12386: UDP, length 25
18:16:33.399412 IP 203.0.113.147.12846 > 198.51.100.162.12386: UDP, length 16
18:16:34.225796 IP 198.51.100.162.12386 > 203.0.113.147.12846: UDP, length 140
18:16:38.406256 IP 203.0.113.147.12846 > 198.51.100.162.12386: UDP, length 16
18:16:43.413314 IP 203.0.113.147.12846 > 198.51.100.162.12386: UDP, length 16
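To compare captures such as the two above, you can summarize the largest UDP length seen from each source address. The Python sketch below assumes the tcpdump line format shown in this document; the function name `max_udp_length_by_source` is hypothetical.

```python
import re

# Matches lines such as:
# 18:16:30.232959 IP 203.0.113.147.12846 > 198.51.100.162.12386: UDP, length 65
TCPDUMP_UDP_RE = re.compile(
    r"^\S+ IP (\d+\.\d+\.\d+\.\d+)\.(\d+) > "
    r"(\d+\.\d+\.\d+\.\d+)\.(\d+): UDP, length (\d+)$"
)

# Packet lines copied from the unsuccessful capture above.
FAILED_CAPTURE = """\
18:16:30.232959 IP 203.0.113.147.12846 > 198.51.100.162.12386: UDP, length 65
18:16:30.232969 IP 203.0.113.147.12846 > 198.51.100.162.12386: UDP, length 25
18:16:33.399412 IP 203.0.113.147.12846 > 198.51.100.162.12386: UDP, length 16
18:16:34.225796 IP 198.51.100.162.12386 > 203.0.113.147.12846: UDP, length 140
18:16:38.406256 IP 203.0.113.147.12846 > 198.51.100.162.12386: UDP, length 16
"""


def max_udp_length_by_source(capture):
    """Return {source IP: largest UDP length seen}; non-matching lines are ignored."""
    largest = {}
    for line in capture.splitlines():
        match = TCPDUMP_UDP_RE.match(line.strip())
        if match is None:
            continue
        source = match.group(1)
        length = int(match.group(5))
        largest[source] = max(largest.get(source, 0), length)
    return largest


print(max_udp_length_by_source(FAILED_CAPTURE))
```

In the successful capture, 555-byte packets from 203.0.113.147 arrive; in the unsuccessful one, the largest packet from that address is only 65 bytes, which is consistent with larger packets being lost on the path.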
PEER
PEER PEER PEER SITE DOMAIN PEER PRIVATE
TYPE PROTOCOL SYSTEM IP ID ID PRIVATE IP PORT
--------------------------------------------------------------------------------
vmanage dtls 192.168.30.101 1 0 192.168.20.101 12346
vsmart dtls 192.168.30.103 1 1 192.168.20.103 12346
vbond dtls 0.0.0.0 0 0 192.168.20.102 12346
PEER
PEER PEER PEER SITE DOMAIN PEER PRIV
TYPE PROTOCOL SYSTEM IP ID ID PRIVATE IP PORT
--------------------------------------------------------------------------------
vbond dtls - 0 0 203.0.113.109 1234
vbond dtls - 0 0 203.0.113.56 1234
If it does exist, check for duplicate entries in the valid-vEdge table and engage the Cisco Technical
Assistance Center (TAC) to troubleshoot this further.
Look for the distance value and the protocol for the IP-Prefix.
vEdge tries to establish a control connection with no success or connections to controllers keep flapping.
Verify with the show control connections and/or the show control connections-history commands.
PEER
PEER PEER PEER SITE DOMAIN PEER PRIV
TYPE PROTOCOL SYSTEM IP ID ID PRIVATE IP PORT
--------------------------------------------------------------------------------
vbond dtls - 0 0 203.0.113.21 1234
As part of troubleshooting, ensure that you have connectivity to the controller. Use Internet Control
Message Protocol (ICMP) ping and/or traceroute to the IP address in question. In cases where many
packets are dropped (loss is high), run a rapid ping and confirm that the results are good.
PEER
PEER PEER PEER SITE DOMAIN PEER PRIV
TYPE PROTOCOL SYSTEM IP ID ID PRIVATE IP PORT
--------------------------------------------------------------------------------
vmanage tls 10.0.1.3 3 0 10.0.2.42 234
In addition, check the show control connections-history detail command output and look at the
TX/RX control statistics to see if there is any significant discrepancy in the counters. Notice the
difference between the RX and TX hello packet counts in the output.
--------------------------------------------------------------------------------
LOCAL-COLOR- biz-internet SYSTEM-IP- 192.168.30.103 PEER-PERSONALITY- vsmart
--------------------------------------------------------------------------------
site-id 1
domain-id 1
protocol dtls
private-ip 192.168.20.103
private-port 12346
public-ip 192.168.20.103
public-port 12346
UUID/chassis-number 4fc4bf2c-f170-46ac-b217-16fb150fef1d
state tear_down [Local Err: ERR_DISABLE_TLOC] [Remote Err: NO_ERRO
downtime 2019-06-01T14:52:49+0200
repeat count 5
previous downtime 2019-06-01T14:43:11+0200
Tx Statistics-
--------------
hello 597
connects 0
registers 0
register-replies 0
challenge 0
challenge-response 1
challenge-ack 0
teardown 1
teardown-all 0
vmanage-to-peer 0
register-to-vmanage 0
Rx Statistics-
--------------
hello 553
connects 0
registers 0
register-replies 0
challenge 1
challenge-response 0
challenge-ack 1
teardown 0
vmanage-to-peer 0
register-to-vmanage 0
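The counter comparison can be sketched in Python as follows. The parser assumes the "Tx Statistics-" and "Rx Statistics-" layout shown above, and the names `parse_control_stats` and `hello_gap` are hypothetical. What counts as a significant gap is not defined in this document, so the sketch only reports the difference.

```python
# Trimmed copy of the statistics section shown above.
HISTORY_DETAIL_SAMPLE = """\
Tx Statistics-
--------------
hello 597
connects 0
challenge-response 1
teardown 1
Rx Statistics-
--------------
hello 553
connects 0
challenge 1
challenge-ack 1
"""


def parse_control_stats(text):
    """Return {'tx': {...}, 'rx': {...}} with integer counters per direction."""
    stats = {"tx": {}, "rx": {}}
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Tx Statistics"):
            section = "tx"
        elif line.startswith("Rx Statistics"):
            section = "rx"
        elif section is not None:
            parts = line.split()
            # Counter lines look like '<name> <number>'; rulers are skipped.
            if len(parts) == 2 and parts[1].isdigit():
                stats[section][parts[0]] = int(parts[1])
    return stats


def hello_gap(stats):
    """Return transmitted hellos minus received hellos."""
    return stats["tx"].get("hello", 0) - stats["rx"].get("hello", 0)


stats = parse_control_stats(HISTORY_DETAIL_SAMPLE)
print(hello_gap(stats))
```

For the output above, the device sent 597 hello packets and received 553, a gap of 44.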
When you troubleshoot such a problem, ensure that the correct serial number and device model were
configured and provisioned on the PnP portal (software.cisco.com) and vManage.
In order to check the chassis number and the certificate serial number, this command can be used on
vEdge routers:
On a router that runs Cisco IOS XE SD-WAN software, enter this command:
or this command:
Validity Date:
start date: 15:33:46 UTC Sep 27 2018
end date: 20:58:26 UTC Aug 9 2099
Associated Trustpoints: CISCO_IDEVID_SUDI
Here is how the error looks on vEdge/vSmart in the show control connections-history command output:
PEER
PEER PEER PEER SITE DOMAIN PEER PRIVATE
TYPE PROTOCOL SYSTEM IP ID ID PRIVATE IP PORT
--------------------------------------------------------------------------------
vbond dtls 0.0.0.0 0 0 192.168.0.231 12346
Also, the device serial number on vBond is not in the list of valid vEdges:
SERIAL
NUMBER ORG
-----------------------
0A SAMPLE - ORGNAME
0B SAMPLE - ORGNAME
0C SAMPLE - ORGNAME
0D SAMPLE - ORGNAME
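Checking a serial number against a long list by eye is error-prone. The Python sketch below parses a table in the layout shown above and tests membership; the function names are hypothetical, and the serial number used in the example is the sample serial-num value from the local-properties output earlier in this document.

```python
# Copy of the serial number table shown above (assumed input layout).
VALID_VEDGES_SAMPLE = """\
SERIAL
NUMBER ORG
-----------------------
0A SAMPLE - ORGNAME
0B SAMPLE - ORGNAME
0C SAMPLE - ORGNAME
0D SAMPLE - ORGNAME
"""


def listed_serials(table_text):
    """Return the set of serial numbers found below the dashed ruler line."""
    serials = set()
    in_rows = False
    for raw in table_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if set(line) == {"-"}:
            # Everything after the ruler is treated as a data row.
            in_rows = True
            continue
        if in_rows:
            serials.add(line.split()[0].upper())
    return serials


def is_listed(serial, table_text):
    """True if the serial number appears in the table (case-insensitive)."""
    return serial.strip().upper() in listed_serials(table_text)


print(is_listed("12345718", VALID_VEDGES_SAMPLE))
```

A serial number that is missing from the list matches the condition described in this scenario.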
On affected vSmart/vManage:
Also, you might see ORPTMO messages on the affected vSmart with regard to the vEdge:
On the vEdge, the show control connections-history output shows the "SERNTPRES" error for the
affected vSmart:
PEER
PEER PEER PEER SITE DOMAIN PEER PRIVATE
TYPE PROTOCOL SYSTEM IP ID ID PRIVATE IP PORT
--------------------------------------------------------------------------------
vsmart tls 10.10.10.229 1 1 192.168.0.229 23456
vsmart tls 10.10.10.229 1 1 192.168.0.229 23456
Wrong Chassis-Num/Unique-Id
Another example of the same error, "CRTREJSER/NOERR", can be seen if the wrong Product ID (model) is
used on the PnP portal. For example:
However, the real device model is different (note that the "DNA" suffix is not in the name):
PEER
PEER PEER PEER SITE DOMAIN PEER PRIVATE
TYPE PROTOCOL SYSTEM IP ID ID PRIVATE IP PORT
--------------------------------------------------------------------------------
vbond dtls - 0 0 203.0.113.197 12346
vbond dtls - 0 0 198.51.100.137 12346
PEER
PEER PEER PEER SITE DOMAIN PEER
INSTANCE TYPE PROTOCOL SYSTEM IP ID ID PRIVATE IP
--------------------------------------------------------------------------------
0 vbond dtls 0.0.0.0 0 0 192.168.0.231
1 vbond dtls 0.0.0.0 0 0 192.168.0.231
Likewise, on another vSmart in the same overlay, this is how it sees the vSmart whose certificate is
revoked:
PEER
PEER PEER PEER SITE DOMAIN PEER
INSTANCE TYPE PROTOCOL SYSTEM IP ID ID PRIVATE IP
--------------------------------------------------------------------------------
0 vsmart tls 10.10.10.229 1 1 192.168.0.229
Certificate verification failure occurs when the certificate cannot be verified with the installed root certificate:
1. Check the time with the show clock command. At a minimum, it must fall within the vBond certificate
validity range (check with the show orchestrator local-properties command).
2. This can be caused by root certificate corruption on the vEdge.
In that case, the show control connections-history command on the vEdge router might show similar output:
PEER
PEER PEER PEER SITE DOMAIN PEER PRIV
TYPE PROTOCOL SYSTEM IP ID ID PRIVATE IP PORT
--------------------------------------------------------------------------------
In this case, the vEdge cannot validate the controller certificate either. In order to fix this issue, you can
reinstall the root certificate chain. If the Symantec Certificate Authority is used, you can copy the
root certificate chain from the read-only filesystem:
vEdge1# vshell
vEdge1:~$ cp /rootfs.ro/usr/share/viptela/root-ca-sha1-sha2.crt /home/admin/
vEdge1:~$ exit
exit
vEdge1# request root-cert-chain install /home/admin/root-ca-sha1-sha2.crt
Uploading root-ca-cert-chain via VPN 0
Copying ... /home/admin/root-ca-sha1-sha2.crt via VPN 0
Installing the new root certificate chain
Successfully installed the root certificate chain
PEER
PEER PEER PEER SITE DOMAIN PEER PRIVATE
TYPE PROTOCOL SYSTEM IP ID ID PRIVATE IP PORT
--------------------------------------------------------------------------------
vmanage dtls 10.0.1.1 1 0 10.0.2.80 12546
PEER
PEER PEER PEER SITE DOMAIN PEER PRIVATE
TYPE PROTOCOL SYSTEM IP ID ID PRIVATE IP PORT
--------------------------------------------------------------------------------
vmanage dtls 10.0.0.1 1 0 198.51.100.92 12646