Troubleshooting NetScaler - Sample Chapter
Gain essential knowledge and keep your NetScaler environment
in top form
10 years, with a good part of this time at Citrix working with NetScaler in various
support roles. As someone who thoroughly enjoys packet analysis, he finds that
NetScaler lends itself perfectly to troubleshooting if you just know where to look,
and he would like to share some of the techniques he's picked up over the years.
Preface
NetScaler is a high-performance Application Delivery Controller (ADC). Making the
most of it requires knowledge that straddles the application and networking worlds.
As an ADC owner, you will also likely be the first person called on when your
business applications fail. You will need to be quick in identifying whether the
problem is with the Application, the Server, the network, or NetScaler itself.
This book provides you with the vital troubleshooting knowledge needed to act fast
when issues happen. It gives you a thorough understanding of the NetScaler layout,
how it integrates with the network and what issues to expect when working with the
Traffic Management, Authentication, NetScaler Gateway and Application Firewall
features. We will also look at what information to seek out in the logs, how to
use tracing and explore utilities that exist on the NetScaler to help you root
cause your issues.
NetScaler Concepts at a Glance
The first chapter in this book naturally is a review of concepts that are key to the rest
of the book. In this chapter, we will look at:
How the NetScaler file system is laid out and which folders we are likely to
visit often when troubleshooting
While the base layout will be familiar to anyone who has worked with UNIX-based systems,
the files that we would look at when troubleshooting are specific to the NetScaler.
Start by using df -ah. This is also a great way to see how you are doing in terms of
disk space:
The command df stands for disk free, a Unix command that shows disk usage statistics.
By using the -ah options, we are asking for all the mounted file systems to be displayed
in a human-readable format, with percentages, for easy comprehension.
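As a rough illustration of putting this output to use, the following Python sketch parses df -h style text and flags partitions at or above a usage threshold. The sample text, the device names, the 80% threshold, and the function names are all assumptions made for the example; they are not real NetScaler output.

```python
def parse_df(text):
    """Map each mount point to its used-capacity percentage."""
    usage = {}
    for line in text.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 6 or not fields[4].endswith("%"):
            continue  # ignore rows that do not look like df entries
        usage[fields[5]] = int(fields[4].rstrip("%"))
    return usage


def nearly_full(usage, threshold=80):
    """Return the mount points at or above the threshold, sorted."""
    return sorted(mount for mount, pct in usage.items() if pct >= threshold)


# Made-up sample in the usual df column layout (illustrative values only).
SAMPLE = """\
Filesystem     Size  Used  Avail Capacity  Mounted on
/dev/example0a 1.6G  200M   1.3G    13%    /flash
/dev/example0e  14G   12G   1.0G    92%    /var
"""

if __name__ == "__main__":
    print(nearly_full(parse_df(SAMPLE)))
```

In practice you would feed the function the text captured from df -ah on the unit itself and pay particular attention to /flash and /var.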
Let's take a look at the two important ones here for troubleshooting: /flash and
/var.
/flash, as you've probably guessed, maps directly to the Flash drive/SSD
installed in the NetScaler. This is the most important partition on the NetScaler as it
contains the operating system along with the configuration, license, and essentially
everything needed to boot the NetScaler.
/var, which is the largest partition and corresponds to the hard disk on the NetScaler,
contains logs, crashes, traces, and other items that have to do with the maintenance
and monitoring of the NetScaler.
In the case of a VPX, which is a virtual appliance with no physical drives, these
folders become references to virtual partitions on the drive. Let's have a brief
look at the important subfolders among these.
Folders on /flash
/flash contains the following folders:
Each time you make a configuration change, it is applied immediately but is not
committed to the disk. To commit changes, you need to click on Save
config. Five such files, each resulting from a "save config", are kept in
the /nsconfig/ folder. So, you can get back to a last known good
configuration if you are in trouble after saving configuration changes.
The best practice, of course, is to not leave it to chance and to use well-named
backup files. The current versions offer a handy way to do this: navigate
to System | Backup/Restore, choose a file name, and select either Basic
backup (configuration, location database) or Full (basic backup along with
certificates). You can then download the backup.
A copy of these backups is stored in the /var/ns_sys_backup/ folder.
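To see which saved configurations are available to fall back on, a short Python sketch can list them newest first. It assumes the saved files match the pattern ns.conf* inside /nsconfig/ (the text above does not spell out the names), so adjust the pattern to whatever you actually find in the folder.

```python
import glob
import os


def saved_configs(folder="/nsconfig", pattern="ns.conf*"):
    """Return matching config files, most recently modified first."""
    paths = glob.glob(os.path.join(folder, pattern))
    return sorted(paths, key=os.path.getmtime, reverse=True)


if __name__ == "__main__":
    for path in saved_configs():
        print(path)
```

Sorting by modification time rather than by name avoids relying on any particular numbering scheme for the older copies.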
The /nsconfig/ folder is also home to other configuration files, most notably
that of the routing engine ZebOS:
ns-root.* and ns-server.*: These files come by default; the ns-root.* files are used for signing, while the ns-server.* files are
Monitors provided as Perl files are used when creating a monitor of the
type USER. Going by the list in the following screenshot, you can guess
that these are usually monitors that provide application knowledge
beyond basic port or protocol response checks. In newer versions, the
home for these files is /netscaler/monitors/; only monitors that you upload
with modifications are stored in /nsconfig/monitors.
Notice that all this while we've been referring to this all-important folder
as /nsconfig/ and not /flash/nsconfig/. That's because /nsconfig/ is a
link to /flash/nsconfig/ and they represent the same folder.
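If you ever want to confirm that two paths really are the same folder, resolving the links is enough. The sketch below is a generic Python check; the helper name is hypothetical, and running it on the unit assumes a Python interpreter is available there.

```python
import os


def same_folder(link_path, target_path):
    """True when both paths resolve to the same location on disk."""
    return os.path.realpath(link_path) == os.path.realpath(target_path)


if __name__ == "__main__":
    # The two paths named in the text; the result depends on where this runs.
    print(same_folder("/nsconfig", "/flash/nsconfig"))
```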
Folders on /var
/var/log contains text-based logs. Let's look at some of the important ones:
license.log: This is the log to look at if the licenses fail to apply, such as
when the hostID used during allocation is incorrect.
/var/nslog: This folder contains the binary newnslog files. While the
ns.log files we just discussed are very easy to read by text and are our
/var/core: This contains any crashes related to the NetScaler software, and
you will almost always find them labeled in the format NSPPE-0x-xxxx,
where NSPPE stands for the NetScaler packet engine, the first x for the
packet engine number, and the rest for the PID. Recall that the
Packet Engines run as User processes, which is why they have PIDs.
/var/crash: This is where any core dumps by the Kernel will go.
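The NSPPE-0x-xxxx naming used in /var/core is easy to pick apart in a script, which helps when you want to tie a crash to a specific packet engine. The following Python sketch assumes exactly the format described above; the sample names in the usage lines are invented for illustration.

```python
import re

# Pattern for the NSPPE-0x-xxxx naming: one digit after the leading 0 for
# the packet engine number, then the PID. Any suffix after the PID is ignored.
CORE_NAME = re.compile(r"^NSPPE-0(\d)-(\d+)")


def parse_core_name(name):
    """Return (packet_engine, pid), or None if the name does not match."""
    match = CORE_NAME.match(name)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    for sample in ("NSPPE-01-1234", "not-a-core-file"):  # invented names
        print(sample, parse_core_name(sample))
```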
[Figure: IP addresses used in a regular NetScaler deployment. Labels: NetScaler Admin, NSIP, Internet, MEP, Application Users, MIP/SNIP, VIP, Servers, GSLB Site IP.]
NetScaler IP
NetScaler IP (NSIP) is the Management IP address, unique to each unit. The
following are some of the features:
Virtual IP
Virtual IP (VIP) is the IP that users land on and is usually added as part of
configuring a feature.
Mapped IP
Mapped IP (MIP) is an IP that the NetScaler can also use to talk to the Server. Its
features are as follows:
You can add as many MIPs as you like but only if they are from the same
subnet as the NSIP.
MIP only exists these days for legacy reasons; everything you can do with a
MIP you can do with a SNIP, which we look at next.
Subnet IP
Subnet IP (SNIP) is the de facto IP for NetScaler to Server communication. This IP is
everything the MIP is, but without the limitation of having to be in the same subnet
as the NSIP.
As a bonus, adding a SNIP will also add a direct route on the NetScaler to facilitate
communication with the Servers. Check out the illustration with a routing table,
as follows:
Here, 192.168.1.150 is the NSIP that evidently sits in a different subnet from
172.16.1.151, which is the SNIP. In this case, the NetScaler will add a direct
route to 172.16.1.0 with itself as the gateway.
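The arithmetic behind that direct route can be sketched with Python's ipaddress module. The 255.255.255.0 netmask is an assumption (the text only gives the addresses), and the function names are made up for the example.

```python
import ipaddress


def direct_route(snip, netmask):
    """Return the connected network that a SNIP with this netmask implies."""
    return str(ipaddress.ip_interface(f"{snip}/{netmask}").network)


def same_subnet(ip, network):
    """True if ip falls inside network (given in CIDR form)."""
    return ipaddress.ip_address(ip) in ipaddress.ip_network(network)


if __name__ == "__main__":
    route = direct_route("172.16.1.151", "255.255.255.0")  # netmask assumed
    print(route, same_subnet("192.168.1.150", route))
```

The same two helpers are handy when checking whether a Server is directly reachable from a SNIP or needs an explicit route.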
You can also use the SNIP to manage the NetScaler (among other IPs) by enabling
management access. This especially helps in an HA environment by ensuring you
always arrive at the primary when logging in to make any changes:
> set ns ip 172.16.1.151 -mgmtAccess ENABLED
GSLB Site IP
A GSLB site, in general terms, is a Data Center, and the GSLB Site IP (GSLBsiteIP)
is the IP associated with it. This IP only comes into play if you use GSLB.
It exists to enable communication between different sites, allowing them to
exchange operational information via a custom protocol called Metric Exchange
Protocol (MEP).
You would also enable the mgmtAccess option on the
GSLBsiteIP for one specific use case: allowing the GSLB
configuration to be synchronized between sites. Failure of the GSLB
config sync functionality has very often come down to just this.
[Figure: NetScaler-to-Server connections without Connection Multiplexing, with Connection Multiplexing, and with Request Switching.]
GUI
This is the easiest of the lot to use, and comprehensive. Its benefits also include the
ability to more easily spot DOWN entities such as services/VIPs. You can also navigate
to System | Diagnostics | Command Line Interface to invoke the CLI/shell, though I
would personally prefer the ease and speed of an SSH client if that access is needed.
The ability to view reports is huge when you are looking at performance issues.
Apart from the standard ports (80, or 443 for SSL), you also need the Java ports 3008/3010.
Release 11.0 is fully HTML5 and thus no longer needs Java.
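When the GUI refuses to load, a quick reachability check of these ports from your workstation can rule out a firewall in the path. This is a minimal Python sketch; the function names and the two-second timeout are assumptions, and the port list simply mirrors the ports named above (3008/3010 only matter for the Java-based GUI).

```python
import socket


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def gui_port_report(host, ports=(80, 443, 3008, 3010)):
    """Map each GUI-related port to whether it accepted a connection."""
    return {port: port_open(host, port) for port in ports}
```

Calling gui_port_report with the NSIP returns a dictionary you can scan at a glance. A False entry only means the TCP connection did not complete from where you ran the check; it does not say which device in the path is responsible.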
CLI
Administrators coming from a Unix background might prefer CLI. This provides
you an easy means to access the shell, which we use a lot for troubleshooting.
Console
It is highly recommended that you have this access when you are making
network-related changes to the NetScaler; many damage control operations have
been possible when all network access to the unit was lost following a change,
purely because console access was available.
Console access is also handy when recovering from a corrupt kernel or changing a
lost password. Another way of accessing the console on some physical NetScalers is
via the Lights Out Management (LOM) Interface. This is a dedicated module on the
NetScaler that has its own network and SSL settings. Through it, you can use the
NetScaler CLI to revert any recent changes you have made, or even remotely reboot
the NetScaler.
Shell
Accessed from the CLI, shell commands are the preferred way to grep log outputs
as well as to look at counters, so you will spend a lot of your troubleshooting
time using the shell.
Another use case (though not often) is when you need to manipulate files such as the
rc.netscaler or the nsbefore.sh/nsafter.sh files.
Of course, shell access is mighty, so you might want to restrict who you provide
access to using command policies.
Nitro
Nitro is a move away from the original APIs that the older releases supported
and has the inherent benefit of being lightweight and fast, and as with any API, it
allows you to manage the NetScaler programmatically. It's a great way to automate
configuration.
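To give a feel for what managing the NetScaler programmatically looks like, here is a hedged Python sketch that builds (but does not send) a Nitro REST request. The URL layout and the X-NITRO-USER / X-NITRO-PASS headers follow the Nitro documentation as I recall it, so verify them against your release; the address, resource name, and credentials are placeholders.

```python
import urllib.request


def nitro_get_request(nsip, resource, user, password):
    """Build (but do not send) a Nitro GET request for a config resource."""
    url = f"http://{nsip}/nitro/v1/config/{resource}"
    req = urllib.request.Request(url, method="GET")
    # Credentials passed as request headers; a login call that returns a
    # session token is the other common approach.
    req.add_header("X-NITRO-USER", user)
    req.add_header("X-NITRO-PASS", password)
    return req


if __name__ == "__main__":
    # Placeholder address and credentials; nothing is sent over the network.
    req = nitro_get_request("192.0.2.10", "lbvserver", "nsroot", "example")
    print(req.get_method(), req.full_url)
```

Passing the request to urllib.request.urlopen would actually send it. In a real deployment you would use HTTPS and avoid hard-coding credentials.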
The Citrix documentation describes it in more detail. Source:
http://support.citrix.com/proddocs/topic/netscaler-main-api-10-map/ns-nitro-wrapper-con.html
SFTP
Finally, you can use SFTP, which is based on SSH, for the purposes of browsing
through the file system and copying files in and out. My favorite SFTP client is
WinSCP, which is free and has a very easy-to-use interface.
NetScaler modes
Let's take a look at the various modes that the NetScaler operates in. First, we'll look
at two different ways in which the NetScaler behavior is influenced, based on how
your Virtual IPs are configured.
Fast Ramp
Fast Ramp is a performance friend. Traditional (read RFC-based) TCP follows a very
conservative approach to increasing window sizes; while this made perfect sense
in the days of unreliable pipes, it keeps the TCP connection from quickly reaching
its top speed. In the context of the NetScaler, which will sit close to or at
least have very solid connections to the Server, Fast Ramp works great and is one of
those features that rarely has to be touched.
Edge Configuration
Even though it's enabled by default, the Edge Configuration mode only impacts very
specific use cases, notably Link Load Balancing and Cache Redirection. It's called
edge mode because the NetScaler sits literally at the edge of the network, learning services
that are not even part of your infrastructure, purely with the goal of load balancing.
There are two desired behaviors for such deployments:
To turn off binary performance logging for such services, thereby increasing
performance and at the same time reducing the impact on log size
Remember though, this is only when cache redirection or link load balancing are in
use, not system wide.
Using Subnet IP
As we discussed in the IP review section, SNIPs are the recommended way to
configure IPs for the purpose of NetScaler to Server conversations. This mode,
abbreviated as USNIP, simply enables your SNIPs to be used.
"Each device can retrieve and program the hardware and software tables of the
other (for example, the forwarding tables, routing tables, and access control lists
[ACLs])."
The two RISE modes represent two of the fundamental use cases of this integration:
Let's now get to the modes that we are most concerned with while troubleshooting:
Layer 2 Mode: As you can tell from the name, this turns the NetScaler into
a switch, forwarding any packets that are not meant for its MAC addresses.
So yes, this does induce a very real risk of a loop if enabled without proper
network evaluation. This is why it's turned off by default. Luckily, most
deployments do not require this option (a couple of such exceptions are the
AppFw transparent mode and CloudBridge Connector).
Bridge BPDU: It's important to note first that the NetScaler doesn't
participate in Spanning Tree Protocol (STP). By default, it drops
BPDUs, and this is perfectly okay for most deployments because the L2 mode
is disabled by default. The best practice in fact is to not have STP enabled
at all on any of the switch ports that the NetScaler (with L2 mode off) is
plugged into, so that the ports come up instantly without cycling through
the intermediate states. If you are, however, enabling the L2 mode, consider
bridging BPDUs so that the switches can detect loops and turn off redundant
interfaces if they need to.
Use Source IP (USIP): When enabled, NetScaler preserves the original client
IP as visible to it while forwarding traffic to the Servers. As simple as that
is, there are network implications to consider in order to avoid dropped
packets. When USIP is enabled, the Server can see the Client's IP address,
and unless it's set to route traffic back to the NetScaler, might attempt
to talk directly to the client. This, of course, will be rejected by the client,
because the client never opened a connection to the Server's own IP.
To get around this, you will need to either set the NetScaler as the default
gateway for the Servers, route traffic back to the NetScaler using PBR,
set up a non-ARPing loopback address, or alternatively use NAT for the
reverse traffic.
If it's purely for Client IP logging purposes that you are turning on USIP,
consider Client IP Insertion or Web logging instead. The latter is especially
designed for high performance logging. Another point to bear in mind while
enabling USIP is that it reduces the reusability of a connection on the Server
side. Why is that? Because when the NetScaler looks for a connection
in its reuse pool, it looks for one that matches, among other things,
the source IP. By default, there are a lot of matches because the SNIP
remains more or less constant; with USIP, the pool gets chopped up into several
small pools of connections.
A common question is what happens if both USNIP (which we discussed
earlier) and USIP are enabled? USIP always overrides USNIP. Also, USIP can
be enabled either at the Global level or at the service level. The service level
setting takes precedence over the Global level setting.
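The precedence rule and the effect on connection reuse can both be modeled in a few lines of Python. This is a deliberately simplified sketch, not how the packet engine is implemented: it assumes requests arrive one after another, that the source IP is the only matching criterion, and that a service with no USIP setting of its own follows the global one. All names are hypothetical.

```python
def server_side_source_ip(client_ip, snip, global_usip, service_usip=None):
    """Pick the source IP for the server-side connection.

    service_usip is None when the service does not set USIP itself, in
    which case the global setting applies; otherwise the service wins.
    """
    usip = global_usip if service_usip is None else service_usip
    return client_ip if usip else snip


def connections_opened(client_ips, snip, usip):
    """Count server-side connections opened for sequential requests.

    Simplified model: each request finishes before the next starts, and an
    idle connection is reusable only when its source IP matches.
    """
    idle_sources = set()
    opened = 0
    for client_ip in client_ips:
        source = server_side_source_ip(client_ip, snip, usip)
        if source not in idle_sources:
            opened += 1
            idle_sources.add(source)
    return opened
```

Feeding the same list of client IPs through connections_opened with usip set to False and then True shows the fragmentation directly: with the SNIP as the source, every request after the first finds a match, while with USIP each distinct client needs its own server-side connection.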
Summary
Let's quickly recap what we've covered in this opening chapter. We have looked
at the NetScaler folder structure and the files and folders that are most interesting
from a troubleshooting perspective. Key to note was the /var/log/ns.log file,
which, when followed with tail -f, spews out a lot of useful information while an
issue is being reproduced.
We reviewed the different IP addresses that you, as an Administrator, assign to the
NetScaler and their purpose. We looked at Request Switching and Connection Multiplexing,
which give the NetScaler its high performance. We looked at the different ways to interact
with the NetScaler and what each UI is best for.
We looked at the various modes of operation that the NetScaler functions in and
those that can be configured. Among the modes that are configurable, we noted
that there are very important considerations, especially for those features that are
disabled by default. As we move to the next chapter, we will look at troubleshooting
areas in the features that form the bulk of NetScaler deployments.
www.PacktPub.com