A Quick Guide For PANOS Troubleshooting - v2

This document provides a quick guide to troubleshooting issues with PANOS firewalls. It describes the key daemons that run on the management plane (MP) and data plane (DP) of PANOS version 5.0. It outlines common symptoms that may occur like configuration commit failures, daemon crashes, lockups, and resource leaks. It provides examples of CLI commands that can help diagnose specific problems on the MP and DP.

Uploaded by

Arun Somashekar

A quick guide for PANOS troubleshooting

PANOS version 5.0


Sept 2013
Yonghui Cheng

1 MP quick walk-through


In a nutshell, the PANOS MP software runs on a Linux PC.

Daemons:

• masterd
Manages all other daemons. Use the CLI command “show system software status” to show the status of
all daemons.
• sysd
Manages inter-daemon communication.
• mgmtsrvr
Management backend; takes care of configuration management, commit, reporting, etc.
• devsrvr
Pushes configuration to the DP and handles miscellaneous communication with the DP, such as URL
filtering requests and responses.
• useridd
User-ID features, such as communicating with User-ID agents; can also act as an agent to other
firewalls.
• authd
All user authentication, account lockout, etc.
• ha-agent
Manages HA status, configuration sync, etc.
• logrcvr
Records traffic and threat logs sent by the DP, compresses log blocks and generates indexes
on the blocks on the fly.
• varrcvr
Receives pcaps and files sent by the DP and forwards files to the WildFire cloud.
• l3svc
Serves web pages for captive portal, NTLM auth, the URL admin override page and the URL block page.
• Websrvr
Serves web pages for the admin UI.
• Sslvpn
Serves web pages for the GlobalProtect feature.
• rasmgr
Backend logic for the GlobalProtect feature.
• sslmgr
Fulfills OCSP and CRL query requests from daemons and the DP; manages the OCSP and CRL repository.
• routed
Routing daemon and dynamic routing state machine.
• cryptod
Encrypts/decrypts passwords, private keys, etc. so they can be included in the config file.
• ikemgr/keymgr
ISAKMP daemon and IPSec key repository management.

Cron jobs:

• Log indexing, summary-gen
Runs every 15 minutes
• Report-gen
Once per day, configurable
• Content/AV/URL update
Once per day, configurable

2 MP Typical Symptoms
When the system needs troubleshooting, start by identifying the main symptom.

2.1 Configuration commit failure


A configuration commit is a collective process that involves almost all MP and DP daemons, so it can
fail for various reasons. It is, however, managed by “mgmtsrvr” as “jobs”, and the job status can be
shown through both the CLI and the WebUI.

There are special types of commit, such as “auto-commit”, “HA-Sync” and “commit-all”, which are
triggered by special events (data plane boot-up, an HA peer commit and a Panorama commit,
respectively), and the actions involved depend on the type.

A configuration commit involves config preprocessing, validation, phase-1 and phase-2. Other daemons
participate in phase-1 and phase-2 as “management clients”. So, to pinpoint the root cause of a
commit failure, the first step is to identify which step or party rejected or failed the commit.

CLI examples:
show job all
show job id <#>
show management-clients
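
The triage order described above can be sketched in Python. This is a minimal illustration, not
PANOS code: the stage list follows the text, but the `results` structure (a list of stage, party, ok
tuples transcribed by hand from the job details) and the party names in the example are assumptions
made up for the sketch.

```python
# Commit stages in the order given in the text above.
STAGES = ["preprocessing", "validation", "phase-1", "phase-2"]


def first_failure(results):
    """Return (stage, party) of the earliest failed step, or None.

    `results` is a hypothetical, hand-transcribed list of
    (stage, party, ok) tuples; it is not a PANOS output format.
    """
    failed = [(STAGES.index(stage), stage, party)
              for stage, party, ok in results if not ok]
    if not failed:
        return None
    # The tuple with the smallest stage index is the earliest failure.
    _, stage, party = min(failed)
    return stage, party


# Made-up example: two parties failed, in different phases.
example = [
    ("preprocessing", "mgmtsrvr", True),
    ("validation", "mgmtsrvr", True),
    ("phase-1", "routed", False),
    ("phase-1", "devsrvr", True),
    ("phase-2", "devsrvr", False),
]
print(first_failure(example))
```

Looking at the earliest failed stage first matters because a later failure may only be a consequence
of the earlier one.
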
2.2 MP daemon crash
To find out which daemon crashed, check for backtrace files. In this case, gathering the tech support
tarball plus the core file is usually good enough. Knowing how to recreate the crash is usually the
key to further troubleshooting.

If the crash is associated with a config change or commit, getting the “candidate” config is also
important.

CLI examples:
show system files
scp export core-file control-plane from …

2.3 MP daemon not responsive


When this happens, it may first be noticed as timeouts when loading WebUI pages or executing CLI
commands. Further validation is required before declaring that a daemon has stopped responding.
Sometimes the daemon keeps printing the same debug log over and over.

If the daemon enters an infinite loop, you should see constantly high CPU usage, possibly
accompanied by repeated messages in the debug log. If the daemon is deadlocked or waiting
indefinitely, its CPU usage should not be high, and debug logging may appear to have stopped or to
be looping.

Since most daemons are multi-threaded, it is possible that only part of the functionality is lost.

By taking multiple backtraces of the daemon at short intervals and analyzing them, it is possible to
tell where the code got stuck. The next step is to generate a coredump file for that daemon for
further analysis.

To restore functionality, try restarting the daemon manually.

CLI examples:
show system software status
show system resource follow
debug software trace <daemon>
debug software core <daemon>
debug software restart <daemon>
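
The two checks described in this section (CPU pattern, then comparison of several backtraces) can be
sketched in Python. This is only an illustration under stated assumptions: the CPU thresholds are
arbitrary, the backtraces are assumed to have been reduced by hand to lists of function names
(innermost frame first), and the function names in the example are invented.

```python
def classify_hang(cpu_samples, high=90.0, low=5.0):
    """Classify an unresponsive daemon from CPU usage samples (percent).

    The thresholds are illustrative assumptions, not PANOS values.
    """
    if not cpu_samples:
        return "inconclusive"
    if min(cpu_samples) >= high:
        return "possible infinite loop"
    if max(cpu_samples) <= low:
        return "possible deadlock or indefinite wait"
    return "inconclusive"


def deepest_common_frame(backtraces):
    """Return the innermost frame shared by all backtraces, or None.

    Each backtrace is a list of function names, innermost frame first.
    """
    if not backtraces:
        return None
    common = None
    # Walk from the outermost frame inward while all samples agree.
    for frames in zip(*(reversed(bt) for bt in backtraces)):
        if len(set(frames)) != 1:
            break
        common = frames[0]
    return common


# Invented samples taken a few seconds apart.
samples = [
    ["read", "recv_msg", "worker_loop", "main"],
    ["parse", "recv_msg", "worker_loop", "main"],
    ["read", "recv_msg", "worker_loop", "main"],
]
print(classify_hang([99.0, 98.5, 99.7]))
print(deepest_common_frame(samples))
```

If every sample is identical down to the innermost frame, the thread is most likely blocked at that
frame; if only the outer frames agree, the deepest shared frame is a reasonable place to look for
the loop.
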

2.4 MP daemon memory leak


A memory leak may be noticed through a general system slowdown or an inability to perform certain
operations. To check current memory usage, run the “show system resource follow” command and press
“M” to sort by memory usage. Some fluctuation is expected under normal usage, so it is necessary to
establish a baseline by checking the log file “mp-monitor.log”.

There is no good way to troubleshoot a memory leak in the field, so if no other issues need
attention, the only thing that might help is to get as much information as possible about the box’s
past activity. Acquiring a coredump file for the daemon in question is also helpful.

By specifying a limit on a daemon’s virtual memory size, the admin can make the system restart a
daemon that is leaking memory.

CLI examples:
debug software virt-limit service <daemon name> limit <size_in_KB>
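
One way to separate normal fluctuation from a steady leak is to fit a trend line to memory samples
collected over time. The Python sketch below does this with a plain least-squares slope. It assumes
the samples have already been extracted as (hours since start, memory in KB) pairs; the extraction
step is left out because the format of “mp-monitor.log” is not described here, and the numbers in
the example are invented.

```python
def growth_per_hour(samples):
    """Least-squares slope of memory usage, in KB per hour.

    `samples` is a list of (hours_since_start, memory_kb) pairs that
    the reader has already extracted by hand or by script.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("samples must cover more than one point in time")
    cov = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    return cov / var_t


# Invented data: one daemon grows steadily, the other only fluctuates.
steady = [(0, 100000), (1, 101000), (2, 102000), (3, 103000)]
noisy = [(0, 100000), (1, 100400), (2, 99800), (3, 100100)]
print(growth_per_hour(steady))
print(growth_per_hour(noisy))
```

A slope that stays clearly positive over a long window is a hint worth following up; what counts as
"clearly positive" depends on the daemon and the baseline, so no threshold is suggested here.
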

2.5 MP daemon resource leak (other than memory)


Other resources a daemon might leak include sockets, file descriptors etc. In this case, the most
important thing is to find out how to reproduce the leak.

2.6 MP lockup
If the MP stops responding, for example it does not respond to ping or cannot be logged in to
through the serial console, the cause is possibly a kernel issue. In this case, ask the customer to
monitor the messages printed on the serial console.

2.7 Maintenance mode

3 DP quick walk-through
In a nutshell, the PANOS DP software runs on a Linux PC with multiple CPUs, plus various hardware
engines to offload/accelerate networking, security and content processing.

Daemons:

• Supervisor
Initializes DP engines and memory pools.
• Sysdagent
Communicates with sysd on the MP.
• Brdagent
Configures, manages and monitors peripheral chips.
• comm/pan_comm
Communicates with devsrvr; participates in commit and other config changes.
• dha/pan_dha
Implements link/path monitoring, interface status changes, etc.
• mprelay
Communicates with routed, keymgr, etc.; implements VPN and PBF monitoring.
• pan_tasks
The packet forwarding daemons; they run on dedicated CPU cores.

Typical CPU core assignment

• Core 0:
Generic daemons other than pan_tasks
• Core 1:
flow_mgmt, the pan_task dedicated to session management
• Core 2+:
Regular pan_tasks that can process network traffic

4 DP Typical Symptoms

4.1 DP daemon crash


The dataplane (DP) has two types of daemons: packet processing (pan_task) and other daemons. Each
packet processing task exclusively occupies one CPU core, from core 1 onward. CPU core 0 is reserved
for the “other” type of processes and runs no packet processing tasks.

A DP daemon can crash for various reasons; a crash can be confirmed by checking for the existence of
backtrace files. However, due to compiler optimization, the backtrace file usually does not contain
sufficient information to determine the root cause, and due to the use of shared memory, the
coredump files might not give complete information either. It is still good practice to collect the
backtrace file and coredump file (and pcap files, if available).

Usually the crash is the result of a particular traffic pattern combined with a particular
configuration, so if possible, we can try to tweak the configuration to keep the crash from
happening too frequently.

4.2 DP restart (data-plane down or lost heartbeat)


The dataplane can restart altogether for various reasons. For example, repeated daemon crashes can
trigger the monitoring software to take escalated action and restart the DP; such cases should be
treated as a DP daemon crash issue.

It is possible for the DP to restart by itself due to severe memory corruption. In this case, the
only clue left might be some error messages in “dataplane-consoleoutput.log”.

Sometimes the DP might appear unresponsive from the perspective of the MP monitoring software; in
this case the DP is rebooted automatically to recover. This can be the result of a real hardware
failure, but more often it is caused by either the MP or the DP being overloaded, or by some other
bug.

It is important to determine which side (DP or MP) contains the root cause of the issue, which might
not be easy to tell. Gathering a techsupport tarball is the best way to start the investigation.

4.3 DP resource leak or resource shortage


Dataplane packet, memory and buffer usage can be checked with the following CLI commands. A resource
shortage can cause the corresponding operation to fail or even malfunction. A resource leak will
eventually cause a permanent resource shortage.

CLI examples:
debug dataplane pool statistics
show system setting ctd state
show system setting ssl-decrypt memory

4.4 Network performance issue


In this case, no obvious bottleneck is observed on the box, yet network throughput is well below
normal or latency is well above normal. This is usually due to packet loss or excessive latency.

There can be many reasons for packet loss, so finding out what is causing it is the key. Many tools
on the box can help with this; the most relevant are packet-diag pcap and global counters. It is
also important to validate that the packet loss is introduced by the box and not by other devices on
the network: packets can be dropped by any network device along the path, or even by a cable.
Packets can also be forwarded incorrectly.

When the QoS feature is enabled, it might introduce latency and packet drops as well.
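
When working from saved counter output rather than on the box, two snapshots taken before and after
a test can be compared to see which counters moved. The Python sketch below assumes the snapshots
have already been turned into name-to-value dictionaries; the counter names are placeholders
invented for the example, not real PANOS counter names.

```python
def counter_deltas(before, after):
    """Return {name: increase} for counters that grew between snapshots.

    Counters missing from `before` are treated as starting at zero.
    """
    deltas = {}
    for name, value in after.items():
        diff = value - before.get(name, 0)
        if diff > 0:
            deltas[name] = diff
    return deltas


# Placeholder counter names and values, for illustration only.
snapshot_1 = {"example_pkt_received": 1000, "example_drop_a": 4}
snapshot_2 = {"example_pkt_received": 1800, "example_drop_a": 4,
              "example_drop_b": 25}
print(counter_deltas(snapshot_1, snapshot_2))
```

Counters that did not change are left out, so the drop counters that moved during the test window
stand out.
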

4.5 DP performance issue


In this case there are clear signs of a bottleneck on the box, such as DP CPU usage that is too
high, a saturated hardware engine, excessive queue lengths, etc.

Some traffic patterns are known to cause high CPU usage, such as zip decompression, SSL decryption,
VPN, software-based content scanning, DNS traffic scanning, etc.

Generic steps to troubleshoot this type of issue usually start with applying “app-override” to some
traffic. This is an important step because it isolates session setup problems from layer-7 scanning
problems. By reducing CPU load, it can also show whether other problems are involved at the same
time.

4.6 Recoverable/intermittent network issue


If the customer uses a script to monitor their network and reports an intermittent packet drop
problem, the ideal solution is to ask them to integrate some CLI commands into their script so that
the state at the time of failure can be captured. Relevant CLI commands are:

“show counter global” and “show counter global filter delta yes”

“show session info” and “show session all”

More sophisticated scripts can be developed to start/stop packet-diag debugging.
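
A minimal Python sketch of such a monitoring hook is shown below. It is an assumption-laden outline,
not a ready-made tool: `probe` (the customer's existing network check) and `run_cli` (whatever
mechanism they use to run a CLI command on the firewall and return its output) are hypothetical
callables supplied by the caller, and only two of the commands listed above are included.

```python
import time

# CLI commands taken from the list above.
COMMANDS = [
    "show counter global filter delta yes",
    "show session info",
]


def capture_on_failure(probe, run_cli, commands=COMMANDS, clock=time.time):
    """Run `probe`; if it fails, run each CLI command and keep the output.

    `probe` returns True when the network check passes. `run_cli` takes
    a command string and returns its output. Both are supplied by the
    caller and are placeholders in this sketch.
    """
    if probe():
        return None
    return {
        "timestamp": clock(),
        "outputs": {cmd: run_cli(cmd) for cmd in commands},
    }


# Stand-ins so the sketch can be exercised without a firewall.
def failing_probe():
    return False


def fake_cli(command):
    return "output of: " + command


record = capture_on_failure(failing_probe, fake_cli, clock=lambda: 0.0)
print(sorted(record["outputs"]))
```

Passing the probe and the CLI runner in as parameters keeps the capture logic independent of how the
customer reaches the firewall, and makes it easy to test with stand-ins as above.
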

4.7 Datapath HW component issues


Some platforms have specialized hardware chips to help accelerate packet parsing, flow cut-through
and/or special operations such as DFA/AHO pattern matching.

Phy/Mac chips:

• PA-4060 uses the “Puma FPGA” as MAC chip; other PA-4000 models use Vitesse
• PA-500, PA-2000, PA-3000 and PA-5000 use Marvell as Phy/MAC chip
• PA-5000 also uses Petra as 10G Phy/MAC
• PA-200 uses the GMX interface on Octeon

NP (network processor) chip:

The main features of the NP are to support basic packet ingress (parsing, logical interface matching
and classification), flow cut-through (flow match, packet header rewrite such as NAT, TTL) and
packet forwarding (route, ARP, MAC lookup).

• PA-4000 uses the “EZ chip”, which supports traffic management
• PA-2000 uses the “Lion” FPGA
• PA-5000 uses the “Tiger” FPGA, and Petra supports traffic management
• PA-3000 uses the “Liger” FPGA
• Other platforms use software for these tasks

Troubleshooting specific chips usually requires specific knowledge of the platform and components
involved. However, in most cases it is still possible to narrow the issue down to a specific chip,
or to a specific component/engine of that chip.
