A Quick Guide For PANOS Troubleshooting - v2
Daemons:
masterd
Manages all other daemons. Use the CLI command “show system software status” to show the status of all daemons.
sysd
manages inter-daemon communications
mgmtsrvr
Management backend; takes care of configuration management, commit, reporting, etc.
devsrvr
Takes care of pushing configuration to the DP, plus miscellaneous communication with the DP, such as URL
filtering requests and responses.
useridd
User-ID features such as communicating with User-ID agents; can also act as an agent to other
firewalls.
authd
All user authentication, account lockout, etc.
ha-agent
Manages HA status, configuration sync, etc.
logrcvr
Records traffic and threat logs sent by the DP, compresses log blocks and generates an index
on the blocks on the fly.
varrcvr
Receives pcaps sent by the DP, receives files sent from the DP and forwards files to the WildFire cloud.
l3svc
Serves web pages for captive portal, NTLM auth, URL admin override page, URL block page.
websrvr
Serves web pages for the admin UI.
sslvpn
Serves web pages for the GlobalProtect feature.
rasmgr
Backend logic for GlobalProtect feature.
sslmgr
Fulfills OCSP and CRL query requests from daemons and the DP; manages the OCSP and CRL repository.
routed
Routing daemon and dynamic routing state machine.
cryptod
Encrypts/decrypts passwords, private keys, etc. so that they can be included in the config file.
ikemgr/keymgr
ISAKMP daemon and IPSec key repository management.
Cron jobs:
2 MP Typical Symptoms
When the system needs troubleshooting, start by identifying the main symptom.
There are special types of commit, such as “auto-commit”, “HA-Sync” and “commit-all”, which are
triggered by special events such as data plane boot-up, HA peer commit and Panorama commit
respectively, and different actions are involved depending on the type.
A configuration commit involves config preprocessing, validation, phase-1 and phase-2. Other daemons
participate in phase-1 and phase-2 as “management clients”. So to pinpoint the root cause of a
commit failure, the first step is to identify which step or party rejected or failed the commit.
CLI examples:
show job all
show job id <#>
show management-clients
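The triage order described above (find the earliest stage, and the party within it, that failed) can be sketched in Python. This is only an illustration: the stage names come from the text, while the StageResult record, the party assignments and the sample data are hypothetical and do not reflect the actual output of “show job”.

```python
from dataclasses import dataclass
from typing import List, Optional

# Commit stages in the order described in the text.
COMMIT_STAGES = ["preprocessing", "validation", "phase-1", "phase-2"]


@dataclass
class StageResult:
    """Hypothetical record: one party's outcome for one commit stage."""
    stage: str   # one of COMMIT_STAGES
    party: str   # the daemon or management client that reported the result
    ok: bool
    detail: str = ""


def first_failure(results: List[StageResult]) -> Optional[StageResult]:
    """Return the earliest failed result in stage order, or None."""
    for stage in COMMIT_STAGES:
        for result in results:
            if result.stage == stage and not result.ok:
                return result
    return None


# Hypothetical sample data, for illustration only.
sample = [
    StageResult("preprocessing", "mgmtsrvr", True),
    StageResult("validation", "mgmtsrvr", True),
    StageResult("phase-1", "routed", False, "hypothetical error text"),
    StageResult("phase-1", "devsrvr", True),
    StageResult("phase-2", "devsrvr", False, "later failure"),
]
failure = first_failure(sample)
print(failure.stage, failure.party)
```

Later failures are often a consequence of the first one, which is why the sketch reports only the earliest.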
2.2 MP daemon crash
To find out which daemon crashed, check for the backtrace files. Usually in this case gathering the tech support
tarball plus the core file is good enough. Knowing how to recreate the crash is usually the key
to further troubleshooting.
If the crash is associated with a config change/commit, getting the “candidate” config is also important.
CLI examples:
show system files
scp export core-file control-plane from …
If the daemon enters an infinite loop, you should see constant high CPU usage, possibly
accompanied by repeated log messages in the debug log. If the daemon is in deadlock or waiting
indefinitely, its CPU usage should not be high, and debug logging might appear to have stopped or to be
looping.
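As a rough illustration of the rule of thumb above, the following Python sketch classifies an already unresponsive daemon from a series of CPU samples. The function name and the 90% threshold are assumptions made for this example; the debug log still has to be checked by hand.

```python
from typing import Sequence


def classify_hang(cpu_percent: Sequence[float], high_cpu: float = 90.0) -> str:
    """Guess the hang type of an unresponsive daemon from CPU samples.

    cpu_percent -- the daemon's CPU usage, sampled at regular intervals
    high_cpu    -- assumed threshold for "constant high CPU usage"
    """
    if not cpu_percent:
        raise ValueError("need at least one CPU sample")
    if min(cpu_percent) >= high_cpu:
        # Every sample is high: consistent with an infinite loop.
        return "possible infinite loop"
    if max(cpu_percent) < high_cpu:
        # No sample is high: consistent with a deadlock or indefinite wait.
        return "possible deadlock or indefinite wait"
    return "inconclusive"


print(classify_hang([99.0, 98.5, 100.0]))
print(classify_hang([0.3, 1.0, 0.0]))
```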
Since most daemons are multi-threaded, it is possible that only part of the functionality is lost.
By taking and analyzing multiple backtraces of the daemon gathered at short intervals, it is possible to
tell where the code got stuck. The next step is generating a coredump file for that daemon for further
analysis.
CLI examples:
show system software status
show system resource follow
debug software trace <daemon>
debug software core <daemon>
debug software restart <daemon>
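The idea of comparing several backtraces taken at short intervals can be sketched as follows. The data layout (thread id mapped to a list of frame names) and the frame names themselves are hypothetical; real backtrace files would first have to be parsed into this form.

```python
from typing import Dict, List

# One sample maps a thread id to its call stack, innermost frame first.
Sample = Dict[int, List[str]]


def stuck_threads(samples: List[Sample]) -> Dict[int, List[str]]:
    """Return the threads whose stack is identical in every sample."""
    if len(samples) < 2:
        raise ValueError("need at least two samples to compare")
    first = samples[0]
    return {
        tid: stack
        for tid, stack in first.items()
        if all(other.get(tid) == stack for other in samples[1:])
    }


# Hypothetical frame names, for illustration only.
samples = [
    {1: ["lock_wait", "handle_request", "main"], 2: ["read_msg", "worker"]},
    {1: ["lock_wait", "handle_request", "main"], 2: ["parse_msg", "worker"]},
    {1: ["lock_wait", "handle_request", "main"], 2: ["send_reply", "worker"]},
]
print(stuck_threads(samples))
```

A thread that is legitimately idle also shows the same stack in every sample, so the result is a list of candidates to inspect, not a verdict.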
There is no good way to troubleshoot a memory leak in the field, so if there are no other issues that need
attention, the only thing that might help is to get as much information as possible about the box’s past
activity. In addition, acquiring a coredump file for the daemon in question is also helpful.
By specifying a limit on a daemon’s virtual memory usage, the admin can make the system restart a daemon that
is leaking memory.
CLI examples:
debug software virt-limit service <daemon name> limit <size_in_KB>
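Because the limit is given in KB, a small helper can avoid unit mistakes when preparing the command. The sketch below only formats the command line shown above; the helper name is made up, 1 MB is assumed to be 1024 KB, and the daemon and the 2048 MB figure are arbitrary examples, not recommendations.

```python
def virt_limit_command(daemon: str, limit_mb: int) -> str:
    """Format the virt-limit CLI line, converting MB to the KB it expects."""
    if limit_mb <= 0:
        raise ValueError("limit must be positive")
    limit_kb = limit_mb * 1024  # assumption: 1 MB = 1024 KB
    return f"debug software virt-limit service {daemon} limit {limit_kb}"


print(virt_limit_command("mgmtsrvr", 2048))
# prints: debug software virt-limit service mgmtsrvr limit 2097152
```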
2.6 MP lockup
If the MP stops responding, for example it does not respond to ping or cannot be logged in to through the serial console, it is
possibly due to kernel issues. In this case, ask the customer to monitor the messages printed on the serial
console.
3 DP quick walk-through
In a nutshell, PANOS DP software runs on a Linux PC with multiple CPUs, plus various hardware
engines to offload/accelerate networking, security and content processing.
Daemons:
Supervisor
Initializes DP engines and memory pools
Sysdagent
Communicates with sysd on the MP
Brdagent
Configures, manages and monitors peripheral chips
comm/pan_comm
Communicates with devsrvr; participates in commit and other config changes.
dha/pan_dha
Implements link/path monitoring, implements status changes on interfaces, etc.
mprelay
Communicates with routed, keymgr, etc.; implements VPN and PBF monitoring
pan_tasks
The packet forwarding daemons; they run on dedicated CPU cores.
Core 0:
Generic daemons other than pan_tasks
Core 1:
flow_mgmt, the pan_task dedicated to session management
Core 2+:
Regular pan_tasks that can process network traffic
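The core layout above can be written down as a small lookup, which is handy when reading per-core CPU output. This is a sketch of the layout exactly as listed here; the function name is made up, and the number of cores varies by platform.

```python
def core_role(core: int) -> str:
    """Describe what runs on a DP CPU core, per the layout listed above."""
    if core < 0:
        raise ValueError("core index must be non-negative")
    if core == 0:
        return "generic daemons other than pan_tasks"
    if core == 1:
        return "flow_mgmt (pan_task dedicated to session management)"
    return "regular pan_task (processes network traffic)"


for core in range(4):
    print(core, core_role(core))
```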
4 DP Typical Symptoms
A DP daemon can crash for various reasons, and a crash can be confirmed by checking for the existence of backtrace
files. However, due to compiler optimization, the backtrace file usually does not contain sufficient
information to determine the root cause, and due to the use of shared memory, the coredump files might not
be able to give us complete information either. But it is still good practice to collect the backtrace file and
coredump file (also pcap files if available).
Usually the crash is the result of a particular traffic pattern combined with a particular configuration, so if
possible, we can try to tweak the configuration to prevent the crash from happening too frequently.
It is possible for the DP to restart by itself due to severe memory corruption. In this case, the only clue
left might be some error messages in the “dataplane-consoleoutput.log”.
Sometimes the DP might appear unresponsive from the perspective of the MP monitoring software; in this case the DP
will be rebooted automatically in order to recover. This can be the result of a real
hardware failure but more often is caused by either the MP or the DP being overloaded, or by some other bug.
It is important to determine which side (DP vs MP) contains the root cause of the issue, which might not be
easy to tell. Gathering the techsupport tarball is the best way to start the investigation.
CLI examples:
debug dataplane pool statistics
show system setting ctd state
show system setting ssl-decrypt memory
There can be many reasons for packet loss, so finding out what is causing it is the key. There
are many tools on the box that can help us figure this out. The most relevant tools are packet-diag pcap and
global counters. It is also important to validate that the packet loss is introduced by the box, not by other
devices on the network. Keep in mind that packets can be dropped by any other network
device along the path, even by a cable. Packets can also be forwarded incorrectly.
When the QoS feature is enabled, it might introduce latency and packet drops as well.
Some traffic patterns are known to cause high CPU usage, such as zip decompression, SSL decryption,
VPN, software-based content scanning, DNS traffic scanning, etc.
Generic steps to troubleshoot this type of issue usually start with applying “app-override” to some traffic. This is
an important step as it isolates session setup problems from layer-7 scanning problems. By reducing CPU load,
it can also provide insight into whether other problems are involved at the same time.
“show counter global” and “show counter global filter delta yes”
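Conceptually, a delta view reports only the counters that changed between two reads, which makes drops that occur during a test stand out. The Python sketch below illustrates that idea on two snapshots; it is not how the firewall implements it, and the counter names and values are hypothetical.

```python
from typing import Dict


def counter_delta(before: Dict[str, int], after: Dict[str, int]) -> Dict[str, int]:
    """Return only the counters that changed between two snapshots."""
    delta = {}
    for name, value in after.items():
        change = value - before.get(name, 0)  # a new counter starts from 0
        if change != 0:
            delta[name] = change
    return delta


# Hypothetical counter names and values, for illustration only.
before = {"pkt_recv": 1000, "pkt_sent": 990, "drop_example": 10}
after = {"pkt_recv": 1500, "pkt_sent": 1480, "drop_example": 10, "drop_other": 3}
print(counter_delta(before, after))
# prints: {'pkt_recv': 500, 'pkt_sent': 490, 'drop_other': 3}
```

Taking the first snapshot immediately before reproducing the problem keeps unrelated background traffic out of the delta as far as possible.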
Phy/Mac chips:
PA-4060 uses “Puma FPGA” as the MAC chip; other PA-4000 models use Vitesse
PA-500, PA-2000, PA-3000 and PA-5000 use Marvell as the Phy/MAC chip
PA-5000 also uses Petra as the 10G Phy/MAC
PA-200 uses the GMX interface on Octeon
The main features of the NP are to support basic packet ingress (parsing, logical interface matching and
classification), flow cut-through (flow match, packet header rewrite such as NAT, TTL) and packet
forwarding (route, ARP, MAC lookup).
Troubleshooting specific chips usually requires specific knowledge of the platform and components
involved. However, in most cases it is still possible to narrow the issue down to a specific chip, or a specific
component/engine of a specific chip.