author: Greg Sabino Mullane, 2008-04-05 21:13:15 +0000
committer: Greg Sabino Mullane, 2008-04-05 21:13:15 +0000
commit: 94d5524f83c2d390f2ece40f984bd902d9cecb84
tree: f63c920683c42ee28bb06184ada2f4b5a9ac9ef9 /check_postgres.pl.html
parent: 4d26767f11065d6270dcdbdd985a5056286c8924
Update documentation file.
Diffstat (limited to 'check_postgres.pl.html')
-rw-r--r-- check_postgres.pl.html 879
1 files changed, 879 insertions, 0 deletions
diff --git a/check_postgres.pl.html b/check_postgres.pl.html
new file mode 100644
index 000000000..6b7cc9e1c
--- /dev/null
+++ b/check_postgres.pl.html
@@ -0,0 +1,879 @@
+<?xml version="1.0" ?>
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
+<html xmlns="http://www.w3.org/1999/xhtml">
+<head>
+<title>check_postgres.pl - Postgres monitoring script for Nagios</title>
+<meta http-equiv="content-type" content="text/html; charset=utf-8" />
+</head>
+
+<body style="background-color: white">
+
+
+<!-- INDEX BEGIN -->
+<div name="index">
+<p><a name="__index__"></a></p>
+
+<ul>
+
+ <li><a href="#name">NAME</a></li>
+ <li><a href="#version">VERSION</a></li>
+ <li><a href="#synopsis">SYNOPSIS</a></li>
+ <li><a href="#website">WEBSITE</a></li>
+ <li><a href="#description">DESCRIPTION</a></li>
+ <li><a href="#database_connection_options">DATABASE CONNECTION OPTIONS</a></li>
+ <li><a href="#other_options">OTHER OPTIONS</a></li>
+ <li><a href="#actions">ACTIONS</a></li>
+ <li><a href="#inclusion_and_exclusion">INCLUSION AND EXCLUSION</a></li>
+ <li><a href="#test_mode">TEST MODE</a></li>
+ <li><a href="#dependencies">DEPENDENCIES</a></li>
+ <li><a href="#development">DEVELOPMENT</a></li>
+ <li><a href="#history">HISTORY</a></li>
+ <li><a href="#bugs_and_limitations">BUGS AND LIMITATIONS</a></li>
+ <li><a href="#author">AUTHOR</a></li>
+ <li><a href="#license_and_copyright">LICENSE AND COPYRIGHT</a></li>
+</ul>
+
+<hr name="index" />
+</div>
+<!-- INDEX END -->
+
+<p>
+</p>
+<hr />
+<h1><a name="name">NAME</a></h1>
+<p>check_postgres.pl - Postgres monitoring script for Nagios</p>
+<p>
+</p>
+<hr />
+<h1><a name="version">VERSION</a></h1>
+<p>This document describes check_postgres.pl version 1.4.2</p>
+<p>
+</p>
+<hr />
+<h1><a name="synopsis">SYNOPSIS</a></h1>
+<pre>
+ ## Create all symlinks
+ check_postgres.pl --action=build_symlinks</pre>
+<pre>
+ ## Check connection to Postgres database 'pluto':
+ check_postgres.pl --action=connection --db=pluto</pre>
+<pre>
+ ## Same thing, but using the symlink
+ check_postgres_connection --db=pluto</pre>
+<pre>
+ ## Warn if &gt; 100 locks, critical if &gt; 200, or &gt; 20 exclusive
+ check_postgres_locks --warning=100 --critical=&quot;total=200;exclusive=20&quot;</pre>
+<pre>
+ ## There are many other actions and options, please keep reading.</pre>
+<p>
+</p>
+<hr />
+<h1><a name="website">WEBSITE</a></h1>
+<p>The latest news and documentation can always be found at:</p>
+<p><a href="http://bucardo.org/nagios_postgres/">http://bucardo.org/nagios_postgres/</a></p>
+<p>
+</p>
+<hr />
+<h1><a name="description">DESCRIPTION</a></h1>
+<p>check_postgres.pl is a Perl script that runs many different tests against
+one or more Postgres databases. It uses the psql program to gather the
+information, and returns one of four exit codes used by Nagios, as well
+as a short description of the results. The exit codes are:</p>
+<ol>
+<li><strong><a name="ok" class="item">(OK)</a></strong>: exit code 0</li>
+
+<li><strong><a name="warning" class="item">(WARNING)</a></strong>: exit code 1</li>
+
+<li><strong><a name="critical" class="item">(CRITICAL)</a></strong>: exit code 2</li>
+
+<li><strong><a name="unknown" class="item">(UNKNOWN)</a></strong>: exit code 3</li>
+
+</ol>
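+<p>As a rough illustration, the conventional Nagios plugin exit codes that these four states correspond to can be written as a small lookup table. The sketch below is in Python rather than the script's own Perl, and the names <code>NAGIOS_STATES</code> and <code>describe_exit</code> are hypothetical; they are not part of check_postgres.pl.</p>

```python
# Conventional Nagios plugin exit codes. The names below are illustrative
# only and do not come from check_postgres.pl.
NAGIOS_STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def describe_exit(code):
    """Map a plugin exit code to its Nagios state name.

    Treating codes outside 0-3 as UNKNOWN is an assumption of this sketch.
    """
    return NAGIOS_STATES.get(code, "UNKNOWN")


if __name__ == "__main__":
    for code in range(4):
        print(code, describe_exit(code))
```

+<p>A wrapper that runs the check as a subprocess could pass the child's return code to a function like this to label the result.</p>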
+<p>
+</p>
+<hr />
+<h1><a name="database_connection_options">DATABASE CONNECTION OPTIONS</a></h1>
+<p>Almost all actions accept a common set of options, most dealing with connecting to the databases.</p>
+<dl>
+<dt><strong><a name="h_name_or_host_name" class="item"><strong>-H NAME</strong> or <strong>--host=NAME</strong></a></strong>
+
+<dd>
+<p>Connect to the host indicated by NAME. Can be a comma-separated list of names. Multiple host arguments
+are allowed. If no host is given, defaults to a local Unix socket.</p>
+</dd>
+</li>
+<dt><strong><a name="p_port_or_port_port" class="item"><strong>-p PORT</strong> or <strong>--port=PORT</strong></a></strong>
+
+<dd>
+<p>Connects using the specified PORT number. Can be a comma-separated list of port numbers, and multiple
+port arguments are allowed. If no port number is given, we default to port 5432.</p>
+</dd>
+</li>
+<dt><strong><a name="db_name_or_dbname_name" class="item"><strong>-db NAME</strong> or <strong>--dbname=NAME</strong></a></strong>
+
+<dd>
+<p>Specifies which database to connect to. Can be a comma-separated list of names, and multiple dbname
+arguments are allowed. If no dbname option is provided, defaults to 'postgres' if the psql
+version is 8 or greater, and 'template1' otherwise.</p>
+</dd>
+</li>
+<dt><strong><a name="u_username_or_dbuser_username" class="item"><strong>-u USERNAME</strong> or <strong>--dbuser=USERNAME</strong></a></strong>
+
+<dd>
+<p>The name of the database user to connect as. Can be a comma-separated list of usernames, and multiple
+dbuser arguments are allowed. If this is not provided, defaults to 'postgres'.</p>
+</dd>
+</li>
+<dt><strong><a name="dbpass_password" class="item"><strong>--dbpass=PASSWORD</strong></a></strong>
+
+<dd>
+<p>Provides the password to connect to the database with. Use of this option is highly discouraged.
+Instead, one should use a .pgpass file.</p>
+</dd>
+</li>
+</dl>
+<p>Connection options can be grouped: --host=a,b --host=c --port=1234 --port=3344
+would connect to a-1234, b-1234, and c-3344. Note that once set, an option
+carries over until it is changed again.</p>
+<p>Examples:</p>
+<pre>
+ --host=a,b --port=5433 --db=c
+ Connects twice to port 5433, using database c, to hosts a and b
+ a-5433-c b-5433-c</pre>
+<pre>
+ --host=a,b --port=5433 --db=c,d
+ Connects four times: a-5433-c a-5433-d b-5433-c b-5433-d</pre>
+<pre>
+ --host=a,b --host=foo --port=1234 --port=5433 --db=e,f
+ Connects six times: a-1234-e a-1234-f b-1234-e b-1234-f foo-5433-e foo-5433-f</pre>
+<pre>
+ --host=a,b --host=x --port=5432,5433 --dbuser=alice --dbuser=bob -db=baz
+ Connects three times: a-5432-alice-baz b-5433-alice-baz x-5433-bob-baz</pre>
+<p>
+</p>
+<hr />
+<h1><a name="other_options">OTHER OPTIONS</a></h1>
+<p>Other common options include:</p>
+<dl>
+<dt><strong><a name="psql_path" class="item"><strong>PSQL=PATH</strong></a></strong>
+
+<dd>
+<p>Tells the script where to find the psql program. Useful if you have more than one version of the psql executable
+around, or if it is not in your path. Note that this option is in all uppercase. By default, this option is
+<em>not allowed</em>. To enable it, you must change the <code>$NO_PSQL_OPTION</code> near the top of the script to 0. Avoid using
+this option if you can, and instead hard-code your psql location into the <code>$PSQL</code> variable, also near the top
+of the script.</p>
+</dd>
+</li>
+<dt><strong><a name="t_val_or_timeout_val" class="item"><strong>-t VAL</strong> or <strong>--timeout=VAL</strong></a></strong>
+
+<dd>
+<p>Sets the timeout in seconds after which the script will abort whatever it is doing and return an UNKNOWN
+status. The timeout is per Postgres cluster, not for the entire script. The default value is 10; the units
+are always in seconds.</p>
+</dd>
+</li>
+<dt><strong><a name="h_or_help" class="item"><strong>-h</strong> or <strong>--help</strong></a></strong>
+
+<dd>
+<p>Displays a help screen with a summary of all actions and options.</p>
+</dd>
+</li>
+<dt><strong><a name="v_or_version" class="item"><strong>-V</strong> or <strong>--version</strong></a></strong>
+
+<dd>
+<p>Shows the current version.</p>
+</dd>
+</li>
+<dt><strong><a name="v_or_verbose" class="item"><strong>-v</strong> or <strong>--verbose</strong></a></strong>
+
+<dd>
+<p>Sets the verbosity level. Can be given more than once to boost the level. Setting it to three or higher (in other words,
+issuing <code>-v -v -v</code>) turns on debugging information for this program which is sent to stderr.</p>
+</dd>
+</li>
+<dt><strong><a name="test" class="item"><strong>--test</strong></a></strong>
+
+<dd>
+<p>Enables test mode. See the <a href="#test_mode">TEST MODE</a> section below.</p>
+</dd>
+</li>
+<dt><strong><a name="showperf_val" class="item"><strong>--showperf=VAL</strong></a></strong>
+
+<dd>
+<p>Determines if we output performance data in standard Nagios format (at end of string, after a pipe symbol, using
+name=value). VAL should be 0 or 1. The default is 1.</p>
+</dd>
+</li>
+<dt><strong><a name="perflimit_i" class="item"><strong>--perflimit=i</strong></a></strong>
+
+<dd>
+<p>Sets a limit as to how many items of interest are reported back when using the <strong>showperf</strong> option. This only has
+an effect for actions that return a large number of items, such as <strong>table_size</strong>. The default is 0, or no limit.
+Be careful when using this with --include or --exclude, as those restrictions are done after the query has
+been run, and thus your limit may not include the items you want.</p>
+</dd>
+</li>
+<dt><strong><a name="showtime_val" class="item"><strong>--showtime=VAL</strong></a></strong>
+
+<dd>
+<p>Determines if the time taken to run each query is shown in the output. VAL should be 0 or 1. The default is 1.
+No effect unless showperf is on.</p>
+</dd>
+</li>
+<dt><strong><a name="action_name" class="item"><strong>--action=NAME</strong></a></strong>
+
+<dd>
+<p>States what action we are running as. Required unless using a symlinked file, in which case the name of the file
+is used to figure out the action.</p>
+</dd>
+</li>
+</dl>
+<p>
+</p>
+<hr />
+<h1><a name="actions">ACTIONS</a></h1>
+<p>The script runs one or more actions. This can either be done with the --action
+flag, or by using a symlink to the main file that contains the name of the action
+inside of it. For example, to run the action &quot;timesync&quot;, you may either issue:</p>
+<pre>
+ check_postgres.pl --action=timesync</pre>
+<p>or use a program named:</p>
+<pre>
+ check_postgres_timesync</pre>
+<p>All the symlinks are created for you if you use the action &quot;build_symlinks&quot;:</p>
+<pre>
+ perl check_postgres.pl --action=&quot;build_symlinks&quot;</pre>
+<p>If the file name already exists, it will not be overwritten. If the file exists
+and is a symlink, you can force it to be overwritten by using &quot;build_symlinks_force&quot;.</p>
+<p>Most actions take a --warning and a --critical option, indicating at what point we change from OK to WARNING
+and then to CRITICAL. Note that because criticals are always checked first, setting the warning equal to the
+critical is an effective way to turn warnings off and always give a critical.</p>
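+<p>The effect of checking the critical threshold first can be sketched in a few lines. This is an illustration in Python, not the script's actual Perl code; the function name <code>classify</code> and the use of a greater-than-or-equal comparison are assumptions made for the example.</p>

```python
def classify(value, warning, critical):
    """Return a Nagios-style state for a 'higher is worse' measurement.

    The critical threshold is tested before the warning threshold, which is
    the ordering the text describes. Using >= is an assumption of this sketch.
    """
    if value >= critical:
        return "CRITICAL"
    if value >= warning:
        return "WARNING"
    return "OK"


if __name__ == "__main__":
    # With warning == critical, the WARNING branch can never be reached.
    for connections in (29, 30, 31):
        print(connections, classify(connections, warning=30, critical=30))
```

+<p>Because the first test already catches every value at or above the shared threshold, setting both options to the same value leaves only OK and CRITICAL as possible results.</p>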
+<p>The current supported actions are:</p>
+<dl>
+<dt><strong><a name="backends" class="item"><strong>backends</strong> (symlink: <code>check_postgres_backends</code>)</a></strong>
+
+<dd>
+<p>Checks the current number of connections for one or more databases, and optionally compares it to the maximum
+allowed, which is determined by the 'max_connections' setting. The warning and critical options can take one of three forms.
+First, a simple number can be given, which represents the number of connections at which the alert will be given.
+This choice does not use the max_connections setting. Second, the percentage of available connections can be given.
+Third, a negative number can be given which represents the number of connections left until max_connections is
+reached. The default values for warning and critical are '90%' and '95%'. This action also supports the use of the
+include and exclude options to filter out specific databases: see the INCLUDE section below for more detail.</p>
+</dd>
+<dd>
+<p>Example 1: Give a warning when the number of connections on host quirm reaches 120, and a critical if it reaches 150.
+ check_postgres_backends --host=quirm --warning=120 --critical=150</p>
+</dd>
+<dd>
+<p>Example 2: Give a critical when we reach 75% of our max_connections setting on hosts lancre or lancre2.
+ check_postgres_backends --warning='75%' --critical='75%' --host=lancre,lancre2</p>
+</dd>
+<dd>
+<p>Example 3: Give a warning when there are only 10 more connection slots left on host plasmid, and a critical
+when we have only 5 left.
+ check_postgres_backends --warning=-10 --critical=-5 --host=plasmid</p>
+</dd>
+<dd>
+<p>Example 4: Check all databases except those with &quot;test&quot; in their name, but allow ones that are named &quot;pg_greatest&quot;. Connect on port 5432 on the first two hosts, and on port 5433 on the third one. We want to always throw a critical when we reach 30 or more connections.</p>
+</dd>
+<dd>
+<pre>
+ check_postgres_backends --dbhost=hong,kong --dbhost=fooey --dbport=5432 --dbport=5433 --warning=30 --critical=30 --exclude=&quot;~test&quot; --include=&quot;pg_greatest,~prod&quot;</pre>
+</dd>
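+<dd>
+<p>The three threshold forms described for this action (plain number, percentage, negative number) can be reduced to a single absolute connection count. The following Python sketch shows one way to do that; the function names <code>backends_limit</code> and <code>threshold_reached</code> are hypothetical, and the assumption that an alert fires once the count reaches the limit is ours, not taken from the script.</p>

```python
def backends_limit(threshold, max_connections):
    """Translate a backends threshold into an absolute connection count.

    '75%' -> that percentage of max_connections
    '-10' -> max_connections minus 10 (only 10 slots left)
    '120' -> 120, ignoring max_connections entirely
    """
    text = str(threshold).strip()
    if text.endswith("%"):
        return max_connections * float(text[:-1]) / 100.0
    number = int(text)
    if number < 0:
        return max_connections + number
    return number


def threshold_reached(current, threshold, max_connections):
    # Assumption: the alert fires when the count reaches the limit (>=).
    return current >= backends_limit(threshold, max_connections)


if __name__ == "__main__":
    print(backends_limit("75%", 200))
    print(backends_limit("-10", 100))
    print(threshold_reached(95, "-5", 100))
```

+</dd>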
+</li>
+<dt><strong><a name="bloat" class="item"><strong>bloat</strong> (symlink: <code>check_postgres_bloat</code>)</a></strong>
+
+<dd>
+<p>Checks the amount of bloat in tables and indexes. This action requires that stats collection be enabled on the
+target databases, and that ANALYZE is run frequently as well. The --include and --exclude options can be used to
+filter out which tables to look at: see the INCLUDE section below for more details. The --warning and --critical
+options must be specified in sizes. Valid units are bytes, kilobytes, megabytes, gigabytes, terabytes, and exabytes.
+You can abbreviate all of those with the first letter. Items without units are assumed to be 'bytes'. The default values
+are '1 GB' and '5 GB'. The number represents the number of &quot;wasted bytes&quot;, or the difference between what is actually
+used by the table and index, and what we compute it should be.</p>
+</dd>
+<dd>
+<p>Note that this action has two hard-coded values to avoid false alarms on smaller relations. Tables must have at
+least 10 pages, and indexes at least 15, before they can be considered by this test. If you really want to adjust
+these values, you can look for the variables $MINPAGES and $MINIPAGES at the top of the check_bloat subroutine.</p>
+</dd>
+<dd>
+<p>Please note that the values computed by this action are not precise, and should be used as a guideline only. Great
+effort was made to estimate the correct size of a table, but in the end it is only an estimate. The correct index size is
+much more of a guess than the correct table size, but both should give a rough idea of how bloated they are.</p>
+</dd>
+<dd>
+<p>Example 1: Warn if any table on port 5432 is over 100 MB bloated, and critical if over 200 MB
+ check_postgres_bloat --port=5432 --warning='100 M' --critical='200 M'</p>
+</dd>
+<dd>
+<p>Example 2: Give a critical if table 'orders' on host 'sami' has more than 10 megs of bloat
+ check_postgres_bloat --host=sami --include=orders --critical='10 MB'</p>
+</dd>
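+<dd>
+<p>To make the size notation concrete, here is a small Python sketch that turns strings such as '100 M', '10 MB' or '512' into a byte count. It is not the script's own parser: the name <code>size_to_bytes</code> is hypothetical, the multipliers are assumed to be powers of 1024, and only the first letter of the unit is inspected, which is a simplification.</p>

```python
import re

# Assumed multipliers: powers of 1024, with exabytes taken as 1024**6.
# The unit list in the text skips petabytes, so this table does too.
UNIT_BYTES = {
    "b": 1,
    "k": 1024,
    "m": 1024 ** 2,
    "g": 1024 ** 3,
    "t": 1024 ** 4,
    "e": 1024 ** 6,
}


def size_to_bytes(text):
    """Convert a size such as '1 GB', '200 M' or '512' to a number of bytes."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([a-zA-Z]*)\s*", text)
    if not match:
        raise ValueError("invalid size: %r" % (text,))
    number, unit = match.groups()
    # Only the first letter matters; a missing unit means bytes.
    key = unit[:1].lower() or "b"
    if key not in UNIT_BYTES:
        raise ValueError("unknown unit in size: %r" % (text,))
    return int(float(number) * UNIT_BYTES[key])


if __name__ == "__main__":
    for example in ("100 M", "10 MB", "1 GB", "512"):
        print(example, "->", size_to_bytes(example))
```

+</dd>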
+</li>
+<dt><strong><a name="connection" class="item"><strong>connection</strong> (symlink: <code>check_postgres_connection</code>)</a></strong>
+
+<dd>
+<p>Simply connects, issues a 'SELECT version()', and leaves.
+Takes no --warning or --critical options.</p>
+</dd>
+</li>
+<dt><strong><a name="database_size" class="item"><strong>database_size</strong> (symlink: <code>check_postgres_database_size</code>)</a></strong>
+
+<dd>
+<p>Checks the size of all databases and complains when they are too big. Makes no sense to run this more than once
+per cluster. Databases can be filtered with the --include and --exclude options: See the INCLUDE section below for more
+detail. The warning and critical can be specified as bytes, kilobytes, megabytes, gigabytes, terabytes, or exabytes.
+Each may be abbreviated to the first letter as well. If no unit is given, the unit is assumed to be bytes.
+There are no defaults for this action: the warning and critical must be specified. The warning cannot be greater than
+the critical. The output returns all databases sorted by size largest first, with both bytes and a &quot;pretty&quot; form
+returned.</p>
+</dd>
+<dd>
+<p>Example 1: Warn if any database on host flagg is over 1 TB in size, and critical if over 1.1 TB.
+ check_postgres_database_size --host=flagg --warning='1 TB' --critical='1.1 t'</p>
+</dd>
+<dd>
+<p>Example 2: Give a critical if the database template1 on port 5432 is over 10 MB.
+ check_postgres_database_size --port=5432 --include=template1 --warning='10MB' --critical='10MB'</p>
+</dd>
+</li>
+<dt><strong><a name="disk_space" class="item"><strong>disk_space</strong> (symlink: <code>check_postgres_disk_space</code>)</a></strong>
+
+<dd>
+<p>Checks on the available physical disk space used by Postgres. This action requires that you have the executable &quot;/bin/df&quot;
+available to report on disk sizes, and it requires that it be run as a superuser, so it can examine the 'data_directory'
+setting inside of Postgres. The --warning and --critical options are given in either sizes or percentages. If using sizes,
+the standard unit types are allowed: bytes, kilobytes, megabytes, gigabytes, terabytes, or exabytes. Each
+may be abbreviated to the first letter only; no units at all indicates 'bytes'. The default values are '90%' and '95%'.</p>
+</dd>
+<dd>
+<p>This command checks the following things to determine all of the different physical disks being used by Postgres.</p>
+</dd>
+<dl>
+<dt><strong><a name="data_directory" class="item"><strong>data_directory</strong></a></strong>
+
+<dd>
+<p>The disk that the main data directory is on.</p>
+</dd>
+</li>
+<dt><strong><a name="log_directory" class="item"><strong>log directory</strong></a></strong>
+
+<dd>
+<p>The disk that the log files are on.</p>
+</dd>
+</li>
+<dt><strong><a name="wal_file_directory" class="item"><strong>WAL file directory</strong></a></strong>
+
+<dd>
+<p>The disk that the write-ahead logs are on (e.g. symlinked pg_xlog)</p>
+</dd>
+</li>
+<dt><strong><a name="tablespaces" class="item"><strong>tablespaces</strong></a></strong>
+
+<dd>
+<p>Each tablespace that is on a separate disk</p>
+</dd>
+</li>
+</dl>
+<p>The output shows the total size used and available on each disk, as well as the percentage, ordered by highest to lowest
+percentage used. Each item above maps to a file system: these can be included or excluded: see the INCLUDE section below
+for more information on the --include and --exclude options.</p>
+<p>Example 1: Make sure that no file system is over 90% for the database on port 5432.
+ check_postgres_disk_space --port=5432 --warning='90%' --critical='90%'</p>
+<p>Example 2: Check that all file systems starting with /dev/sda are smaller than 10 GB and 11 GB (warning and critical)
+ check_postgres_disk_space --port=5432 --warning='10 GB' --critical='11 GB' --include=~^/dev/sda</p>
+<dt><strong><a name="index_size" class="item"><strong>index_size</strong> (symlink: <code>check_postgres_index_size</code>)</a></strong>
+
+<dt><strong><a name="table_size" class="item"><strong>table_size</strong> (symlink: <code>check_postgres_table_size</code>)</a></strong>
+
+<dt><strong><a name="relation_size" class="item"><strong>relation_size</strong> (symlink: <code>check_postgres_relation_size</code>)</a></strong>
+
+<dd>
+<p>The actions table_size and index_size are simply variations of the relation_size action, which checks for a relation
+that has grown too big. Relations (in other words, tables and indexes) can be filtered with the --include and
+--exclude options: See the INCLUDE section below for more detail. The warning and critical are given in file sizes, and
+can have units of bytes, kilobytes, megabytes, gigabytes, terabytes, or exabytes. Each can be abbreviated to the
+first letter only. If no units are given, bytes is assumed. There are no default values: both warning and critical
+must be given. The return text shows the size of the largest relation found.</p>
+</dd>
+<dd>
+<p>If the <strong>showperf</strong> option is enabled, <em>all</em> of the relations with their sizes will be given. To prevent this, it
+is recommended that you set the <strong>perflimit</strong>, which will cause the query to do an <code>ORDER BY size DESC LIMIT (perflimit)</code>.</p>
+</dd>
+<dd>
+<p>Example 1: Give a critical if any table is larger than 600MB on host burrick.
+ check_postgres_table_size --critical='600 MB' --warning='600 MB' --host=burrick</p>
+</dd>
+<dd>
+<p>Example 2: Warn if the table products is over 4 GB in size, and give a critical at 4.5 GB.
+ check_postgres_table_size --host=burrick --warning='4 GB' --critical='4.5 GB' --include=products</p>
+</dd>
+</li>
+<dt><strong><a name="last_analyze" class="item"><strong>last_analyze</strong> (symlink: <code>check_postgres_last_analyze</code>)</a></strong>
+
+<dt><strong><a name="last_vacuum" class="item"><strong>last_vacuum</strong> (symlink: <code>check_postgres_last_vacuum</code>)</a></strong>
+
+<dd>
+<p>Checks how long it has been since vacuum (or analyze) was last run on each table in one or more databases. This requires
+that stats_row_level is enabled, and the target database must be version 8.2 or higher. Tables can be excluded and
+included: see the INCLUDE section below for details. The units for --warning and --critical are times. Valid units are
+seconds, minutes, hours, and days; all can be abbreviated to the first letter. If no units are given, 'seconds' is assumed.
+The default values are '1 day' and '2 days'. Please note that there are cases in which this field does not get
+automatically populated. If certain tables are giving you problems, make sure that they have dead rows to vacuum,
+or just exclude them from the test.</p>
+</dd>
+<dd>
+<p>Example 1: Warn if any table has not been vacuumed in 3 days, and give a critical at a week, for host wormwood
+ check_postgres_last_vacuum --host=wormwood --warning='3d' --critical='7d'</p>
+</dd>
+</li>
+<dt><strong><a name="listener" class="item"><strong>listener</strong> (symlink: <code>check_postgres_listener</code>)</a></strong>
+
+<dd>
+<p>Confirm that someone is listening for one or more specific strings. Only one of warning or critical is needed. The format
+is a simple string representing the LISTEN target, or a tilde character followed by a string for a regular expression
+check.</p>
+</dd>
+<dd>
+<p>Example 1: Give a warning if nobody is listening for the string bucardo_mcp_ping on ports 5555 and 5556
+ check_postgres_listener --port=5555,5556 --warning=bucardo_mcp_ping</p>
+</dd>
+<dd>
+<p>Example 2: Give a critical if there are no active LISTEN requests matching 'grimm' on database oskar
+ check_postgres_listener --db oskar --critical=~grimm</p>
+</dd>
+</li>
+<dt><strong><a name="locks" class="item"><strong>locks</strong> (symlink: <code>check_postgres_locks</code>)</a></strong>
+
+<dd>
+<p>Check the total number of locks on one or more databases. Makes no sense to run this more than once per cluster.
+Databases can be filtered with the --include and --exclude options: See the INCLUDE section below for more detail.
+The warning and critical can be specified as simple numbers, which represent the total number of locks, or they can
+be broken down by type of lock. Valid lock names are &quot;total&quot;, &quot;waiting&quot;, or a type of lock used by Postgres.
+These names are case-insensitive and do not need the &quot;lock&quot; part on the end, so 'exclusive' will match
+'ExclusiveLock'. The format is name=number, with different items separated by semicolons.</p>
+</dd>
+<dd>
+<p>Example 1: Warn if the number of locks is 100 or more, and critical if 200 or more, on host garrett
+ check_postgres_locks --host=garrett --warning=100 --critical=200</p>
+</dd>
+<dd>
+<p>Example 2: On the host artemus, warn if 200 or more locks exist, and give a critical if over 250 total locks exist,
+or if over 20 exclusive locks exist, or if over 5 connections are waiting for a lock.
+ check_postgres_locks --host=artemus --warning=200 --critical=&quot;total=250;waiting=5;exclusive=20&quot;</p>
+</dd>
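+<dd>
+<p>The name=number format for lock thresholds can be illustrated with a short Python sketch. It is not the parser used by the script: the function name <code>parse_lock_threshold</code> is hypothetical, and storing a bare number under the key "total" is an assumption based on the description above.</p>

```python
def parse_lock_threshold(text):
    """Parse '200' or 'total=250;waiting=5;exclusive=20' into a dict.

    Names are lower-cased and a trailing 'lock' is dropped, so that
    'ExclusiveLock' and 'exclusive' end up under the same key.
    """
    text = text.strip()
    if text.isdigit():
        # A simple number is taken to mean the total number of locks.
        return {"total": int(text)}
    limits = {}
    for item in text.split(";"):
        if not item.strip():
            continue
        name, _, number = item.partition("=")
        name = name.strip().lower()
        if name.endswith("lock"):
            name = name[:-4]
        limits[name] = int(number)
    return limits


if __name__ == "__main__":
    print(parse_lock_threshold("200"))
    print(parse_lock_threshold("total=250;waiting=5;exclusive=20"))
```

+</dd>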
+</li>
+<dt><strong><a name="logfile" class="item"><strong>logfile</strong> (symlink: <code>check_postgres_logfile</code>)</a></strong>
+
+<dd>
+<p>Ensures that the logfile is in the expected location and is being logged to. This action issues a command that throws
+an error on each database it is checking, and ensures that the message shows up in the logs. It scans the various
+log_* settings inside of Postgres to figure out where the logs should be. If you are using syslog, it does a rough
+but not foolproof scan of /etc/syslog.conf. Alternatively, you can provide the name of the logfile with the --logfile
+option. This is especially useful if the logs have a custom rotation scheme driven by an external program. The
+--logfile option supports the following escape characters: %Y %m %d %H, which represent the current year, month, date,
+and hour respectively. An error is always reported as critical unless the warning option has been passed in as a
+non-zero value. Other than that specific usage, the --warning and --critical options should not be used.</p>
+</dd>
+<dd>
+<p>Example 1: On port 5432, ensure the logfile is being written to the file /home/greg/pg8.2.log
+ check_postgres_logfile --port=5432 --logfile=/home/greg/pg8.2.log</p>
+</dd>
+<dd>
+<p>Example 2: Same as above, but raise a warning, not a critical
+ check_postgres_logfile --port=5432 --logfile=/home/greg/pg8.2.log -w 1</p>
+</dd>
+</li>
+<dt><strong><a name="query_runtime" class="item"><strong>query_runtime</strong> (symlink: <code>check_postgres_query_runtime</code>)</a></strong>
+
+<dd>
+<p>Checks how long a specific query takes to run, by executing an &quot;EXPLAIN ANALYZE&quot; against it. The --warning and --critical
+options are the maximum amount of time the query should take. Valid units are seconds, minutes, and hours; any can be
+abbreviated to the first letter. If no units are given, 'seconds' is assumed. Both warning and critical must be given.
+The name of the view or function to be run must be passed in to the --queryname
+option. It must consist of a single word (or schema.word format), with optional parens at the end.</p>
+</dd>
+<dd>
+<p>Example 1: Give a critical if the function named &quot;speedtest&quot; fails to run in 10 seconds or less.
+ check_postgres_query_runtime --queryname='speedtest()' --critical=10 --warning=10</p>
+</dd>
+</li>
+<dt><strong><a name="query_time" class="item"><strong>query_time</strong> (symlink: <code>check_postgres_query_time</code>)</a></strong>
+
+<dd>
+<p>Checks the length of running queries on one or more databases. It makes no sense to run this more than once
+on the same cluster (all databases are returned no matter where you connect from). Databases can be included or
+excluded with the --include and --exclude options: see the INCLUDE section below for more details. The warning and
+critical options are an amount of time, and default to '2 minutes' and '5 minutes'. Valid units are 'seconds', 'minutes',
+'hours', or 'days'. Each may be written singular or abbreviated to just the first letter. If no units are given,
+the unit is assumed to be seconds.</p>
+</dd>
+<dd>
+<p>Example 1: Give a warning if any query has been running longer than 3 minutes, and a critical if longer than 5 minutes.
+ check_postgres_query_time --port=5432 --warning='3 minutes' --critical='5 minutes'</p>
+</dd>
+<dd>
+<p>Example 2: Using default values (2 and 5 minutes), check all databases except those starting with 'template'.
+ check_postgres_query_time --port=5432 --exclude=~^template</p>
+</dd>
+</li>
+<dt><strong><a name="txn_time" class="item"><strong>txn_time</strong> (symlink: <code>check_postgres_txn_time</code>)</a></strong>
+
+<dd>
+<p>Checks the length of open transactions on one or more databases. It makes no sense to run this more than once
+on the same cluster (all databases are returned no matter where you connect from). Databases can be included or
+excluded with the --include and --exclude options: see the INCLUDE section below for more details. The warning and
+critical options are an amount of time, and must be provided (no default). Valid units are 'seconds', 'minutes',
+'hours', or 'days'. Each may be written singular or abbreviated to just the first letter. If no units are given,
+the unit is assumed to be seconds. Requires Postgres 8.3 or better.</p>
+</dd>
+<dd>
+<p>Example 1: Give a critical if any transaction has been open for more than 10 minutes:
+ check_postgres_txn_time --port=5432 --critical='10 minutes'</p>
+</dd>
+</li>
+<dt><strong><a name="txn_idle" class="item"><strong>txn_idle</strong> (symlink: <code>check_postgres_txn_idle</code>)</a></strong>
+
+<dd>
+<p>Checks the length of &quot;idle in transaction&quot; queries on one or more databases. It makes no sense to run this more than once
+on the same cluster (all databases are returned no matter where you connect from). Databases can be included or
+excluded with the --include and --exclude options: see the INCLUDE section below for more details. The warning and
+critical options are an amount of time, and must be provided (no default). Valid units are 'seconds', 'minutes',
+'hours', or 'days'. Each may be written singular or abbreviated to just the first letter. If no units are given,
+the unit is assumed to be seconds. Requires Postgres 8.3 or better.</p>
+</dd>
+<dd>
+<p>Example 1: Give a warning if any connection has been idle in transaction for more than 15 seconds:
+ check_postgres_txn_idle --port=5432 --warning='15 seconds'</p>
+</dd>
+</li>
+<dt><strong><a name="rebuild_symlinks" class="item"><strong>rebuild_symlinks</strong></a></strong>
+
+<dt><strong><a name="rebuild_symlinks_force" class="item"><strong>rebuild_symlinks_force</strong></a></strong>
+
+<dd>
+<p>This action requires no other arguments, and does not connect to any databases, but simply creates symlinks for
+each action, in the form &quot;check_postgres_&lt;action_name&gt;&quot;. If the file already exists, it will not be overwritten.
+If the action is rebuild_symlinks_force, then symlinks will be overwritten.</p>
+</dd>
+</li>
+<dt><strong><a name="settings_checksum" class="item"><strong>settings_checksum</strong> (symlink: <code>check_postgres_settings_checksum</code>)</a></strong>
+
+<dd>
+<p>Check that all the Postgres settings are the same as last time you checked. This is done by generating a checksum
+of a sorted list of setting names and their values. Note that different users in the same database may have
+different checksums, due to ALTER USER usage, and due to the fact that superusers see more settings than
+ordinary users. Either the --warning or the --critical should be given, but not both. The value of each one is
+the checksum, a 32-character hexadecimal value. You can run with the special --critical=0 option to find out
+an existing checksum.</p>
+</dd>
+<dd>
+<p>This action requires the Digest::MD5 module.</p>
+</dd>
+<dd>
+<p>Example 1: Find the initial checksum for the database on port 5555 using the default user (usually postgres)
+ check_postgres_settings_checksum --port=5555 --critical=0</p>
+</dd>
+<dd>
+<p>Example 2: Make sure no settings have changed and warn if so, using the checksum from above.
+ check_postgres_settings_checksum --port=5555 --warning=cd2f3b5e129dc2b4f5c0f6d8d2e64231</p>
+</dd>
+</li>
+<dt><strong><a name="timesync" class="item"><strong>timesync</strong> (symlink: <code>check_postgres_timesync</code>)</a></strong>
+
+<dd>
+<p>Compares the local system time with the time reported by one or more databases. The warning and critical options represent
+the number of seconds at which the warning or critical should be given. If neither is specified, the default values
+are used, which are '2' and '5'. The warning cannot be greater than the critical. Due to the non-exact nature of this
+test, a value of '0' or '1' is not recommended.</p>
+</dd>
+<dd>
+<p>The string returned shows the time difference as well as the time on each side written out.</p>
+</dd>
+<dd>
+<p>Example 1: Check that databases on hosts ankh, morpork, and klatch are no more than 3 seconds off from the local time:
+ check_postgres_timesync --host=ankh,morpork,klatch --critical=3</p>
+</dd>
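+<dd>
+<p>The threshold logic described above can be sketched as follows. This is a hedged Python illustration rather than
+the script's actual code: the function name timesync_status and the treatment of a difference equal to the threshold
+as a match are assumptions of the example. The defaults of 2 and 5 seconds are the ones stated above.</p>
```python
def timesync_status(local_epoch, db_epoch, warning=None, critical=None):
    """Classify the clock difference (in seconds) between local and database time."""
    # Defaults apply only when neither threshold is supplied.
    if warning is None and critical is None:
        warning, critical = 2, 5
    if warning is not None and critical is not None and warning > critical:
        raise ValueError("the warning cannot be greater than the critical")
    diff = abs(local_epoch - db_epoch)
    # Assumption: a difference equal to the threshold already triggers it.
    if critical is not None and diff >= critical:
        return "CRITICAL"
    if warning is not None and diff >= warning:
        return "WARNING"
    return "OK"


if __name__ == "__main__":
    print(timesync_status(1_000_000, 1_000_001))              # small drift
    print(timesync_status(1_000_000, 1_000_004, critical=3))  # only critical given
```
+</dd>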
+</li>
+<dt><strong><a name="txn_wraparound" class="item"><strong>txn_wraparound</strong> (symlink: <code>check_postgres_txn_wraparound</code>)</a></strong>
+
+<dd>
+<p>Checks how close to transaction wraparound one or more databases are getting. The warning and critical indicate
+the number of transactions left and must be a positive integer. If either is not given, the default values of
+1.3 and 1.4 billion are used. It makes no sense to run this check more than once on a single cluster. For a more
+detailed discussion of what this number represents and what to do about it, please visit the page
+<a href="http://www.postgresql.org/docs/current/static/routine-vacuuming.html#VACUUM-FOR-WRAPAROUND">http://www.postgresql.org/docs/current/static/routine-vacuuming.html#VACUUM-FOR-WRAPAROUND</a></p>
+</dd>
+<dd>
+<p>The warning and critical values can have underscores in the number for legibility, as Perl does.</p>
+</dd>
+<dd>
+<p>Example 1: Check the default values for the localhost database:
+ check_postgres_txn_wraparound --host=localhost</p>
+</dd>
+<dd>
+<p>Example 2: Check port 6000 and give a critical at 1.7 billion transactions left:
+ check_postgres_txn_wraparound --port=6000 --critical=1_700_000_000</p>
+</dd>
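+<dd>
+<p>How a threshold written with underscores might be read is sketched below. This is a Python illustration only; the
+function name parse_count is hypothetical, and rejecting any input that is not made up of digits and underscores is
+an assumption of the sketch, not documented behavior of the script.</p>
```python
import re


def parse_count(text):
    """Turn a threshold such as '1_300_000_000' into a positive integer."""
    # Accept only digits, optionally separated by underscores for legibility.
    if not re.fullmatch(r"[0-9][0-9_]*", text):
        raise ValueError(f"not a valid number: {text!r}")
    value = int(text.replace("_", ""))
    if value <= 0:
        raise ValueError("the value must be a positive integer")
    return value


if __name__ == "__main__":
    print(parse_count("1_300_000_000"))
    print(parse_count("1400000000"))
```
+</dd>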
+</li>
+<dt><strong><a name="wal_files" class="item"><strong>wal_files</strong> (symlink: <code>check_postgres_wal_files</code>)</a></strong>
+
+<dd>
+<p>Checks how many WAL files exist in the pg_xlog directory, which is found off of your data directory, sometimes
+as a symlink to another disk for performance reasons. This must be run as a superuser, in order to
+access the contents of the pg_xlog directory. The minimum version to use this action is 8.1. The
+warning and critical are simply the number of files in the pg_xlog directory. What number to set this
+to will vary, but a general guideline is to put a number slightly higher than what is normally there,
+to catch problems early.</p>
+</dd>
+<dd>
+<p>Normally, WAL files are closed and then re-used, but a long-running open transaction, or a faulty
+log shipping method, may cause Postgres to create too many files. Ultimately, this will cause the
+disk they are on to run out of space, at which point Postgres will shut down.</p>
+</dd>
+<dd>
+<p>Example 1: Check that the number of WAL files is 20 or less on host &quot;pluto&quot;:
+ check_postgres_wal_files --host=pluto --critical=20</p>
+</dd>
+</li>
+<dt><strong><a name="version" class="item"><strong>version</strong> (symlink: <code>check_postgres_version</code>)</a></strong>
+
+<dd>
+<p>Checks that the required version of Postgres is running. The --warning and --critical arguments (only one is required)
+must be of the format X.Y or X.Y.Z where X is the major version number, Y is the minor version number, and Z is the
+revision.</p>
+</dd>
+<dd>
+<p>Example 1: Give a warning if the database on port 5678 is not version 8.4.10:
+ check_postgres_version --port=5678 -w=8.4.10</p>
+</dd>
+<dd>
+<p>Example 2: Give a critical if any of the databases on hosts valley, grain, or sunshine are not 8.3:
+ check_postgres_version -H valley,grain,sunshine --critical=8.3</p>
+</dd>
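+<dd>
+<p>One way the X.Y and X.Y.Z formats could be compared against a server version is sketched below. This is a Python
+illustration, not the script's code: the function name version_matches is hypothetical, and treating X.Y as matching
+any revision of that release is an assumption of the sketch. Beta versions are not handled here.</p>
```python
import re


def version_matches(server_version, wanted):
    """Return True when the server version satisfies an X.Y or X.Y.Z requirement."""
    if not re.fullmatch(r"\d+\.\d+(\.\d+)?", wanted):
        raise ValueError("the version must be of the format X.Y or X.Y.Z")
    server_parts = server_version.split(".")
    wanted_parts = wanted.split(".")
    # Assumption: compare only as many components as the requirement names,
    # so "8.3" accepts 8.3.0, 8.3.1, and so on.
    return server_parts[:len(wanted_parts)] == wanted_parts


if __name__ == "__main__":
    print(version_matches("8.3.1", "8.3"))
    print(version_matches("8.2.6", "8.3"))
```
+</dd>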
+</li>
+</dl>
+<p>
+</p>
+<hr />
+<h1><a name="inclusion_and_exclusion">INCLUSION AND EXCLUSION</a></h1>
+<p>The options --include and --exclude can be combined to limit which things are checked, depending on the action.
+The name of the database can be filtered when using the following actions:
+backends, database_size, last_vacuum, last_analyze, locks, and query_time.
+The name of a relation can be filtered when using the following actions:
+bloat, index_size, table_size, and relation_size.
+The name of a setting can be filtered when using the settings_checksum action.
+The name of a file system can be filtered when using the disk_space action.</p>
+<p>If only an include option is given, then ONLY those entries that match will be checked. However, if given
+both exclude and include, the exclusion is done first, and the inclusion second to reinstate things that
+may have been excluded. Both --include and --exclude can be given multiple times, or as comma-separated lists.
+A leading tilde will match the following word as a regular expression.</p>
+<p>Examples:</p>
+<pre>
+ --include=pg_class
+ Only checks items named pg_class</pre>
+<pre>
+ --include=~pg_
+ Only checks items containing the letters 'pg_'</pre>
+<pre>
+ --include=~^pg_
+ Only checks items beginning with 'pg_'</pre>
+<pre>
+ --exclude=test
+ Exclude the item named 'test'</pre>
+<pre>
+ --exclude=~test
+ Exclude all items containing the letters 'test'</pre>
+<pre>
+ --exclude=~ace --include=faceoff
+ Exclude all items containing the letters 'ace', but allow the item 'faceoff'</pre>
+<pre>
+ --exclude=~^pg_,~slon,sql_settings --exclude=green --include=~prod,pg_relname
+ Exclude all items which start with the letters 'pg_', which contain the letters 'slon', or which are named
+ 'sql_settings' or 'green'. Specifically check items with the letters 'prod' in their names, and always
+ check the item named 'pg_relname'.</pre>
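+<p>The rules above (exclusion first, inclusion second, a leading tilde marking a regular expression) can be sketched
+as follows. This is a Python illustration of the documented behavior, not the script's own Perl code; the function
+names split_patterns, matches, and is_checked are hypothetical, and the sketch assumes that no pattern contains a comma.</p>
```python
import re


def split_patterns(options):
    """Flatten repeated options and comma-separated lists into one list."""
    return [p for opt in options for p in opt.split(",") if p]


def matches(name, pattern):
    """A leading tilde means regular expression; otherwise the name must be equal."""
    if pattern.startswith("~"):
        return re.search(pattern[1:], name) is not None
    return name == pattern


def is_checked(name, include=(), exclude=()):
    """Decide whether an item is checked, given --include and --exclude values."""
    inc = split_patterns(include)
    exc = split_patterns(exclude)
    included = any(matches(name, p) for p in inc)
    excluded = any(matches(name, p) for p in exc)
    if exc:
        # Exclusion is done first; a matching include reinstates the item.
        return (not excluded) or included
    if inc:
        # Only an include was given: ONLY matching entries are checked.
        return included
    return True


if __name__ == "__main__":
    for item in ("faceoff", "places", "orders"):
        print(item, is_checked(item, include=["faceoff"], exclude=["~ace"]))
```
+<p>With --exclude=~ace --include=faceoff, the sketch keeps 'faceoff' and 'orders' but drops 'places'.</p>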
+<p>
+</p>
+<hr />
+<h1><a name="test_mode">TEST MODE</a></h1>
+<p>To help in setting things up, this program can be run in a &quot;test mode&quot; by specifying the --test option. This will
+perform some basic tests to make sure that the databases can be contacted, and that certain per-action prerequisites
+are met. Currently, we check that the user is a superuser if required by that action, and that the version of Postgres
+is new enough for those actions that depend on a specific version.</p>
+<p>
+</p>
+<hr />
+<h1><a name="dependencies">DEPENDENCIES</a></h1>
+<dl>
+<dt><strong><a name="access_to_a_working_version_of_psql" class="item">Access to a working version of psql</a></strong>
+
+<dt><strong><a name="some_very_standard_perl_modules" class="item">Some very standard Perl modules:</a></strong>
+
+<dl>
+<dt><strong><a name="getopt_long" class="item">Getopt::Long</a></strong>
+
+<dt><strong><a name="file_basename" class="item">File::Basename</a></strong>
+
+<dt><strong><a name="file_temp" class="item">File::Temp</a></strong>
+
+<dt><strong><a name="hires" class="item">Time::HiRes (if opt{showtime} is set to true, which is the default)</a></strong>
+
+</dl>
+</dl>
+<p>The 'settings_checksum' action requires the Digest::MD5 module.</p>
+<p>Some actions require access to external programs. If psql is not explicitly specified, the command
+'which' is used to find it. The program &quot;/bin/df&quot; is needed by the 'disk_space' action.</p>
+<p>
+</p>
+<hr />
+<h1><a name="development">DEVELOPMENT</a></h1>
+<p>Development happens using the git system. You can clone the latest version by doing:
+ git-clone <a href="http://bucardo.org/nagios_postgres.git">http://bucardo.org/nagios_postgres.git</a></p>
+<p>
+</p>
+<hr />
+<h1><a name="history">HISTORY</a></h1>
+<p>Items not specifically attributed are by Greg Sabino Mullane.</p>
+<dl>
+<dt><strong><a name="version_1_4_1" class="item"><strong>Version 1.4.2</strong></a></strong>
+
+<dd>
+<p>Fix bug preventing --dbpass argument from working (Robert Treat)</p>
+</dd>
+</li>
+<dt><strong><a name="version_1_4_12" class="item"><strong>Version 1.4.1</strong></a></strong>
+
+<dd>
+<p>Minor documentation fixes.</p>
+</dd>
+</li>
+<dt><strong><a name="version_1_4_0" class="item"><strong>Version 1.4.0</strong></a></strong>
+
+<dd>
+<p>Have check_wal_files use pg_ls_dir (idea by Robert Treat)</p>
+</dd>
+<dd>
+<p>For last_vacuum and last_analyze, respect autovacuum effects, add separate
+autovacuum checks (ideas by Robert Treat)</p>
+</dd>
+</li>
+<dt><strong><a name="version_1_3_1" class="item"><strong>Version 1.3.1</strong></a></strong>
+
+<dd>
+<p>Have txn_idle use query_start, not xact_start</p>
+</dd>
+</li>
+<dt><strong><a name="version_1_3_0" class="item"><strong>Version 1.3.0</strong></a></strong>
+
+<dd>
+<p>Add in txn_idle and txn_time actions.</p>
+</dd>
+</li>
+<dt><strong><a name="version_1_2_0" class="item"><strong>Version 1.2.0</strong></a></strong>
+
+<dd>
+<p>Add the check_wal_files method, which counts the number of WAL files
+in your pg_xlog directory.</p>
+</dd>
+<dd>
+<p>Fix some typos in the docs.</p>
+</dd>
+<dd>
+<p>Explicitly allow -v as an argument.</p>
+</dd>
+<dd>
+<p>Allow for a null syslog_facility in check_logfile</p>
+</dd>
+</li>
+<dt><strong><a name="version_1_1_2" class="item"><strong>Version 1.1.2</strong></a></strong>
+
+<dd>
+<p>Fix error preventing --action=rebuild_symlinks from working.</p>
+</dd>
+</li>
+<dt><strong><a name="version_1_1_1" class="item"><strong>Version 1.1.1</strong></a></strong>
+
+<dd>
+<p>Switch vacuum and analyze date output to use 'DD', not 'D'. (Glyn Astill)</p>
+</dd>
+</li>
+<dt><strong><a name="version_1_1_0" class="item"><strong>Version 1.1.0</strong></a></strong>
+
+<dd>
+<p>Fixes, enhancements, and performance tracking, December 2007</p>
+</dd>
+<dd>
+<p>Add performance data tracking via --showperf and --perflimit</p>
+</dd>
+<dd>
+<p>Lots of refactoring and cleanup of how actions handle arguments.</p>
+</dd>
+<dd>
+<p>Do basic checks to figure out syslog file for 'logfile' action.</p>
+</dd>
+<dd>
+<p>Allow for exact matching of beta versions with 'version' action.</p>
+</dd>
+<dd>
+<p>Redo the default arguments to only populate when neither 'warning' nor 'critical' is provided.</p>
+</dd>
+<dd>
+<p>Allow just warning OR critical to be given for the 'timesync' action.</p>
+</dd>
+<dd>
+<p>Remove 'redirect_stderr' requirement from 'logfile' due to 8.3 changes.</p>
+</dd>
+<dd>
+<p>Actions 'last_vacuum' and 'last_analyze' are 8.2 only (Robert Treat)</p>
+</dd>
+</li>
+<dt><strong><a name="version_1_0_16" class="item"><strong>Version 1.0.16</strong></a></strong>
+
+<dd>
+<p>First public release, December 2007</p>
+</dd>
+</li>
+</dl>
+<p>
+</p>
+<hr />
+<h1><a name="bugs_and_limitations">BUGS AND LIMITATIONS</a></h1>
+<p>The index bloat size optimization is still very rough.</p>
+<p>Some actions may not work on older versions of Postgres (before 8.0).</p>
+<p>Please report any problems to <a href="mailto:[email protected].">[email protected].</a></p>
+<p>
+</p>
+<hr />
+<h1><a name="author">AUTHOR</a></h1>
+<p>Greg Sabino Mullane &lt;<a href="mailto:[email protected]">[email protected]</a>&gt;</p>
+<p>
+</p>
+<hr />
+<h1><a name="license_and_copyright">LICENSE AND COPYRIGHT</a></h1>
+<p>Copyright (c) 2007-2008 Greg Sabino Mullane &lt;<a href="mailto:[email protected]">[email protected]</a>&gt;.</p>
+<p>Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions are met:</p>
+<pre>
+ 1. Redistributions of source code must retain the above copyright notice,
+ this list of conditions and the following disclaimer.
+ 2. Redistributions in binary form must reproduce the above copyright notice,
+ this list of conditions and the following disclaimer in the documentation
+ and/or other materials provided with the distribution.</pre>
+<p>THIS SOFTWARE IS PROVIDED BY THE AUTHOR &quot;AS IS&quot; AND ANY EXPRESS OR IMPLIED
+WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
+MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO
+EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT
+OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
+IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY
+OF SUCH DAMAGE.</p>
+
+</body>
+
+</html>