author | Greg Sabino Mullane | 2008-04-05 21:13:15 +0000 |
---|---|---|
committer | Greg Sabino Mullane | 2008-04-05 21:13:15 +0000 |
commit | 94d5524f83c2d390f2ece40f984bd902d9cecb84 | |
tree | f63c920683c42ee28bb06184ada2f4b5a9ac9ef9 /check_postgres.pl.html | |
parent | 4d26767f11065d6270dcdbdd985a5056286c8924 |
Update documentation file.
Diffstat (limited to 'check_postgres.pl.html')
-rw-r--r-- | check_postgres.pl.html | 879 |
1 files changed, 879 insertions, 0 deletions
diff --git a/check_postgres.pl.html b/check_postgres.pl.html new file mode 100644 index 000000000..6b7cc9e1c --- /dev/null +++ b/check_postgres.pl.html @@ -0,0 +1,879 @@ +<?xml version="1.0" ?> +<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> +<html xmlns="http://www.w3.org/1999/xhtml"> +<head> +<title>check_postgres.pl - Postgres monitoring script for Nagios</title> +<meta http-equiv="content-type" content="text/html; charset=utf-8" /> +</head> + +<body style="background-color: white"> + + +<!-- INDEX BEGIN --> +<div name="index"> +<p><a name="__index__"></a></p> + +<ul> + + <li><a href="#name">NAME</a></li> + <li><a href="#version">VERSION</a></li> + <li><a href="#synopsis">SYNOPSIS</a></li> + <li><a href="#website">WEBSITE</a></li> + <li><a href="#description">DESCRIPTION</a></li> + <li><a href="#database_connection_options">DATABASE CONNECTION OPTIONS</a></li> + <li><a href="#other_options">OTHER OPTIONS</a></li> + <li><a href="#actions">ACTIONS</a></li> + <li><a href="#inclusion_and_exclusion">INCLUSION AND EXCLUSION</a></li> + <li><a href="#test_mode">TEST MODE</a></li> + <li><a href="#dependencies">DEPENDENCIES</a></li> + <li><a href="#development">DEVELOPMENT</a></li> + <li><a href="#history">HISTORY</a></li> + <li><a href="#bugs_and_limitations">BUGS AND LIMITATIONS</a></li> + <li><a href="#author">AUTHOR</a></li> + <li><a href="#license_and_copyright">LICENSE AND COPYRIGHT</a></li> +</ul> + +<hr name="index" /> +</div> +<!-- INDEX END --> + +<p> +</p> +<hr /> +<h1><a name="name">NAME</a></h1> +<p>check_postgres.pl - Postgres monitoring script for Nagios</p> +<p> +</p> +<hr /> +<h1><a name="version">VERSION</a></h1> +<p>This document describes check_postgres.pl version 1.4.2</p> +<p> +</p> +<hr /> +<h1><a name="synopsis">SYNOPSIS</a></h1> +<pre> + ## Create all symlinks + check_postgres.pl --action=build_symlinks</pre> +<pre> + ## Check connection to 
Postgres database 'pluto': + check_postgres.pl --action=connection --db=pluto</pre> +<pre> + ## Same thing, but using the symlink + check_postgres_connection --db=pluto</pre> +<pre> + ## Warn if > 100 locks, critical if > 200, or > 20 exclusive + check_postgres_locks --warning=100 --critical="total=200;exclusive=20"</pre> +<pre> + ## There are many other actions and options; please keep reading.</pre> +<p> +</p> +<hr /> +<h1><a name="website">WEBSITE</a></h1> +<p>The latest news and documentation can always be found at:</p> +<p><a href="http://bucardo.org/nagios_postgres/">http://bucardo.org/nagios_postgres/</a></p> +<p> +</p> +<hr /> +<h1><a name="description">DESCRIPTION</a></h1> +<p>check_postgres.pl is a Perl script that runs many different tests against +one or more Postgres databases. It uses the psql program to gather the +information, and returns one of four exit codes used by Nagios, as well +as a short description of the results. The exit codes are:</p> +<ol> +<li><strong><a name="ok" class="item">(OK)</a></strong> + +<li><strong><a name="warning" class="item">(WARNING)</a></strong> + +<li><strong><a name="critical" class="item">(CRITICAL)</a></strong> + +<li><strong><a name="unknown" class="item">(UNKNOWN)</a></strong> + +</ol> +<p> +</p> +<hr /> +<h1><a name="database_connection_options">DATABASE CONNECTION OPTIONS</a></h1> +<p>Almost all actions accept a common set of options, most dealing with connecting to the databases.</p> +<dl> +<dt><strong><a name="h_name_or_host_name" class="item"><strong>-H NAME</strong> or <strong>--host=NAME</strong></a></strong> + +<dd> +<p>Connect to the host indicated by NAME. Can be a comma-separated list of names. Multiple host arguments +are allowed.
If no host is given, defaults to a local Unix socket.</p> +</dd> +</li> +<dt><strong><a name="p_port_or_port_port" class="item"><strong>-p PORT</strong> or <strong>--port=PORT</strong></a></strong> + +<dd> +<p>Connects using the specified PORT number. Can be a comma-separated list of port numbers, and multiple +port arguments are allowed. If no port number is given, we default to port 5432.</p> +</dd> +</li> +<dt><strong><a name="db_name_or_dbname_name" class="item"><strong>-db NAME</strong> or <strong>--dbname=NAME</strong></a></strong> + +<dd> +<p>Specifies which database to connect to. Can be a comma-separated list of names, and multiple dbname +arguments are allowed. If no dbname option is provided, defaults to 'postgres' if the psql +version is version 8 or greater, and 'template1' otherwise.</p> +</dd> +</li> +<dt><strong><a name="u_username_or_dbuser_username" class="item"><strong>-u USERNAME</strong> or <strong>--dbuser=USERNAME</strong></a></strong> + +<dd> +<p>The name of the database user to connect as. Can be a comma-separated list of usernames, and multiple +dbuser arguments are allowed. If this is not provided, defaults to 'postgres'.</p> +</dd> +</li> +<dt><strong><a name="dbpass_password" class="item"><strong>--dbpass=PASSWORD</strong></a></strong> + +<dd> +<p>Provides the password to connect to the database with. Use of this option is highly discouraged. +Instead, one should use a .pgpass file.</p> +</dd> +</li> +</dl> +<p>Connection options can be grouped: --host=a,b --host=c --port=1234 --port=3344 +would connect to a-1234, b-1234, and c-3344. 
Note that once set, an option +carries over until it is changed again.</p> +<p>Examples:</p> +<pre> + --host=a,b --port=5433 --db=c + Connects twice to port 5433, using database c, to hosts a and b + a-5433-c b-5433-c</pre> +<pre> + --host=a,b --port=5433 --db=c,d + Connects four times: a-5433-c a-5433-d b-5433-c b-5433-d</pre> +<pre> + --host=a,b --host=foo --port=1234 --port=5433 --db=e,f + Connects six times: a-1234-e a-1234-f b-1234-e b-1234-f foo-5433-e foo-5433-f</pre> +<pre> + --host=a,b --host=x --port=5432,5433 --dbuser=alice --dbuser=bob -db=baz + Connects three times: a-5432-alice-baz b-5433-alice-baz x-5433-bob-baz</pre> +<p> +</p> +<hr /> +<h1><a name="other_options">OTHER OPTIONS</a></h1> +<p>Other common options include:</p> +<dl> +<dt><strong><a name="psql_path" class="item"><strong>PSQL=PATH</strong></a></strong> + +<dd> +<p>Tells the script where to find the psql program. Useful if you have more than one version of the psql executable +around, or if it is not in your path. Note that this option is in all uppercase. By default, this option is +<em>not allowed</em>. To enable it, you must change the <code>$NO_PSQL_OPTION</code> near the top of the script to 0. Avoid using +this option if you can, and instead hard-code your psql location into the <code>$PSQL</code> variable, also near the top +of the script.</p> +</dd> +</li> +<dt><strong><a name="t_val_or_timeout_val" class="item"><strong>-t VAL</strong> or <strong>--timeout=VAL</strong></a></strong> + +<dd> +<p>Sets the timeout in seconds after which the script will abort whatever it is doing and return an UNKNOWN +status. The timeout is per Postgres cluster, not for the entire script. 
The default value is 10; the units +are always in seconds.</p> +</dd> +</li> +<dt><strong><a name="h_or_help" class="item"><strong>-h</strong> or <strong>--help</strong></a></strong> + +<dd> +<p>Displays a help screen with a summary of all actions and options.</p> +</dd> +</li> +<dt><strong><a name="v_or_version" class="item"><strong>-V</strong> or <strong>--version</strong></a></strong> + +<dd> +<p>Shows the current version.</p> +</dd> +</li> +<dt><strong><a name="v_or_verbose" class="item"><strong>-v</strong> or <strong>--verbose</strong></a></strong> + +<dd> +<p>Sets the verbosity level. Can be given more than once to raise the level. Setting it to three or higher (in other words, +issuing <code>-v -v -v</code>) turns on debugging information for this program, which is sent to stderr.</p> +</dd> +</li> +<dt><strong><a name="test" class="item"><strong>--test</strong></a></strong> + +<dd> +<p>Enables test mode. See the <a href="#test_mode">TEST MODE</a> section below.</p> +</dd> +</li> +<dt><strong><a name="showperf_val" class="item"><strong>--showperf=VAL</strong></a></strong> + +<dd> +<p>Determines if we output performance data in standard Nagios format (at end of string, after a pipe symbol, using +name=value). VAL should be 0 or 1. The default is 1.</p> +</dd> +</li> +<dt><strong><a name="perflimit_i" class="item"><strong>--perflimit=i</strong></a></strong> + +<dd> +<p>Sets a limit as to how many items of interest are reported back when using the <strong>showperf</strong> option. This only has +an effect for actions that return a large number of items, such as <strong>table_size</strong>. The default is 0, or no limit.
+Be careful when using this with --include or --exclude, as those restrictions are applied after the query has +been run, and thus your limit may not include the items you want.</p> +</dd> +</li> +<dt><strong><a name="showtime_val" class="item"><strong>--showtime=VAL</strong></a></strong> + +<dd> +<p>Determines if the time taken to run each query is shown in the output. VAL should be 0 or 1. The default is 1. +No effect unless showperf is on.</p> +</dd> +</li> +<dt><strong><a name="action_name" class="item"><strong>--action=NAME</strong></a></strong> + +<dd> +<p>States what action we are running as. Required unless using a symlinked file, in which case the name of the file +is used to figure out the action.</p> +</dd> +</li> +</dl> +<p> +</p> +<hr /> +<h1><a name="actions">ACTIONS</a></h1> +<p>The script runs one or more actions. This can either be done with the --action +flag, or by using a symlink to the main file that contains the name of the action +inside of it. For example, to run the action "timesync", you may either issue:</p> +<pre> + check_postgres.pl --action=timesync</pre> +<p>or use a program named:</p> +<pre> + check_postgres_timesync</pre> +<p>All the symlinks are created for you if you use the action "build_symlinks":</p> +<pre> + perl check_postgres.pl --action="build_symlinks"</pre> +<p>If the file name already exists, it will not be overwritten. If the file exists +and is a symlink, you can force it to be overwritten by using "build_symlinks_force".</p> +<p>Most actions take a --warning and a --critical option, indicating at what point we change from OK to WARNING +and then to CRITICAL.
Note that because criticals are always checked first, setting the warning equal to the +critical is an effective way to turn warnings off and always give a critical.</p> +<p>The currently supported actions are:</p> +<dl> +<dt><strong><a name="backends" class="item"><strong>backends</strong> (symlink: <code>check_postgres_backends</code>)</a></strong> + +<dd> +<p>Checks the current number of connections for one or more databases, and optionally compares it to the maximum +allowed, which is determined by the 'max_connections' setting. The warning and critical options can take one of three forms. +First, a simple number can be given, which represents the number of connections at which the alert will be given. +This choice does not use the max_connections setting. Second, the percentage of available connections can be given. +Third, a negative number can be given which represents the number of connections left until max_connections is +reached. The default values for warning and critical are '90%' and '95%'. This action also supports the use of the +include and exclude options to filter out specific databases: see the INCLUDE section below for more detail.</p> +</dd> +<dd> +<p>Example 1: Give a warning when the number of connections on host quirm reaches 120, and a critical if it reaches 150. + check_postgres_backends --host=quirm --warning=120 --critical=150</p> +</dd> +<dd> +<p>Example 2: Give a critical when we reach 75% of our max_connections setting on hosts lancre or lancre2. + check_postgres_backends --warning='75%' --critical='75%' --host=lancre,lancre2</p> +</dd> +<dd> +<p>Example 3: Give a warning when there are only 10 more connection slots left on host plasmid, and a critical +when we have only 5 left. + check_postgres_backends --warning=-10 --critical=-5 --host=plasmid</p> +</dd> +<dd> +<p>Example 4: Check all databases except those with "test" in their name, but allow ones that are named "pg_greatest" or whose names contain "prod".
Connect on port 5432 to the first two hosts, and on port 5433 to the third one. We want to always throw a critical when we reach 30 or more connections.</p> +</dd> +<dd> +<pre> + check_postgres_backends --host=hong,kong --host=fooey --port=5432 --port=5433 --warning=30 --critical=30 --exclude="~test" --include="pg_greatest,~prod"</pre> +</dd> +</li> +<dt><strong><a name="bloat" class="item"><strong>bloat</strong> (symlink: <code>check_postgres_bloat</code>)</a></strong> + +<dd> +<p>Checks the amount of bloat in tables and indexes. This action requires that stats collection be enabled on the +target databases, and that ANALYZE is run frequently as well. The --include and --exclude options can be used to +filter out which tables to look at: see the INCLUDE section below for more details. The --warning and --critical +options must be specified in sizes. Valid units are bytes, kilobytes, megabytes, gigabytes, terabytes, and exabytes. +You can abbreviate all of those with the first letter. Items without units are assumed to be 'bytes'. The default values +are '1 GB' and '5 GB'. The number represents the number of "wasted bytes", or the difference between what is actually +used by the table and index, and what we compute it should be.</p> +</dd> +<dd> +<p>Note that this action has two hard-coded values to avoid false alarms on smaller relations. Tables must have at +least 10 pages, and indexes at least 15, before they can be considered by this test. If you really want to adjust +these values, you can look for the variables $MINPAGES and $MINIPAGES at the top of the check_bloat subroutine.</p> +</dd> +<dd> +<p>Please note that the values computed by this action are not precise, and should be used as a guideline only. Great +effort was made to estimate the correct size of a table, but in the end it is only an estimate.
The correct index size is +much more of a guess than the correct table size, but both should give a rough idea of how bloated they are.</p> +</dd> +<dd> +<p>Example 1: Warn if any table on port 5432 is over 100 MB bloated, and critical if over 200 MB. + check_postgres_bloat --port=5432 --warning='100 M' --critical='200 M'</p> +</dd> +<dd> +<p>Example 2: Give a critical if table 'orders' on host 'sami' has more than 10 MB of bloat. + check_postgres_bloat --host=sami --include=orders --critical='10 MB'</p> +</dd> +</li> +<dt><strong><a name="connection" class="item"><strong>connection</strong> (symlink: <code>check_postgres_connection</code>)</a></strong> + +<dd> +<p>Simply connects, issues a 'SELECT version()', and leaves. +Takes no --warning or --critical options.</p> +</dd> +</li> +<dt><strong><a name="database_size" class="item"><strong>database_size</strong> (symlink: <code>check_postgres_database_size</code>)</a></strong> + +<dd> +<p>Checks the size of all databases and complains when they are too big. Makes no sense to run this more than once +per cluster. Databases can be filtered with the --include and --exclude options: See the INCLUDE section below for more +detail. The warning and critical can be specified as bytes, kilobytes, megabytes, gigabytes, terabytes, or exabytes. +Each may be abbreviated to the first letter as well. If no unit is given, the unit is assumed to be bytes. +There are no defaults for this action: the warning and critical must be specified. The warning cannot be greater than +the critical. The output lists all databases sorted by size, largest first, with each size given both in bytes and in a "pretty" +form.</p> +</dd> +<dd> +<p>Example 1: Warn if any database on host flagg is over 1 TB in size, and critical if over 1.1 TB. + check_postgres_database_size --host=flagg --warning='1 TB' --critical='1.1 t'</p> +</dd> +<dd> +<p>Example 2: Give a critical if the database template1 on port 5432 is over 10 MB.
+ check_postgres_database_size --port=5432 --include=template1 --warning='10MB' --critical='10MB'</p> +</dd> +</li> +<dt><strong><a name="disk_space" class="item"><strong>disk_space</strong> (symlink: <code>check_postgres_disk_space</code>)</a></strong> + +<dd> +<p>Checks the available physical disk space used by Postgres. This action requires that you have the executable "/bin/df" +available to report on disk sizes, and it must be run as a superuser, so it can examine the 'data_directory' +setting inside of Postgres. The --warning and --critical options are given in either sizes or percentages. If using sizes, +the standard unit types are allowed: bytes, kilobytes, megabytes, gigabytes, terabytes, or exabytes. Each +may be abbreviated to the first letter only; no units at all indicates 'bytes'. The default values are '90%' and '95%'.</p> +</dd> +<dd> +<p>This action checks the following locations to determine all of the different physical disks being used by Postgres.</p> +</dd> +<dl> +<dt><strong><a name="data_directory" class="item"><strong>data_directory</strong></a></strong> + +<dd> +<p>The disk that the main data directory is on.</p> +</dd> +</li> +<dt><strong><a name="log_directory" class="item"><strong>log directory</strong></a></strong> + +<dd> +<p>The disk that the log files are on.</p> +</dd> +</li> +<dt><strong><a name="wal_file_directory" class="item"><strong>WAL file directory</strong></a></strong> + +<dd> +<p>The disk that the write-ahead logs are on (e.g. symlinked pg_xlog).</p> +</dd> +</li> +<dt><strong><a name="tablespaces" class="item"><strong>tablespaces</strong></a></strong> + +<dd> +<p>Each tablespace that is on a separate disk.</p> +</dd> +</li> +</dl> +<p>The output shows the total size used and available on each disk, as well as the percentage, ordered by highest to lowest +percentage used.
Each item above maps to a file system, and these can be included or excluded; see the INCLUDE section below +for more information on the --include and --exclude options.</p> +<p>Example 1: Make sure that no file system is over 90% for the database on port 5432. + check_postgres_disk_space --port=5432 --warning='90%' --critical='90%'</p> +<p>Example 2: Check that all file systems starting with /dev/sda are smaller than 10 GB (warning) and 11 GB (critical). + check_postgres_disk_space --port=5432 --warning='10 GB' --critical='11 GB' --include=~^/dev/sda</p> +<dt><strong><a name="index_size" class="item"><strong>index_size</strong> (symlink: <code>check_postgres_index_size</code>)</a></strong> + +<dt><strong><a name="table_size" class="item"><strong>table_size</strong> (symlink: <code>check_postgres_table_size</code>)</a></strong> + +<dt><strong><a name="relation_size" class="item"><strong>relation_size</strong> (symlink: <code>check_postgres_relation_size</code>)</a></strong> + +<dd> +<p>The actions table_size and index_size are simply variations of the relation_size action, which checks for a relation +that has grown too big. Relations (in other words, tables and indexes) can be filtered with the --include and +--exclude options: See the INCLUDE section below for more detail. The warning and critical are given in file sizes, and +can have units of bytes, kilobytes, megabytes, gigabytes, terabytes, or exabytes. Each can be abbreviated to the +first letter only. If no units are given, bytes is assumed. There are no default values: both warning and critical +must be given. The return text shows the size of the largest relation found.</p> +</dd> +<dd> +<p>If the <strong>showperf</strong> option is enabled, <em>all</em> of the relations with their sizes will be given. To prevent this, it
To prevent this, is +is recommended that you set the <strong>perflimit</strong>, which will cause the query to do a <code>ORDER BY size DESC LIMIT (perflimit)</code>.</p> +</dd> +<dd> +<p>Example 1: Give a critical if any table is larger than 600MB on host burrick. + check_postgres_table_size --critical='600 MB' --warning='600 MB' --host=burrick</p> +</dd> +<dd> +<p>Example 2: Warn if the table products is over 4 GB in size, and give a critical at 4.5 GB. + check_postgres_table_size --host=burrick --warning='4 GB' --critical='4.5 GB' --include=products</p> +</dd> +</li> +<dt><strong><a name="last_analyze" class="item"><strong>last_analyze</strong> (symlink: <code>check_postgres_last_analyze</code>)</a></strong> + +<dt><strong><a name="last_vacuum" class="item"><strong>last_vacuum</strong> (symlink: <code>check_postgres_last_vacuum</code>)</a></strong> + +<dd> +<p>Checks how long it has been since vacuum (or analyze) was last run on each table in one or more databases. This requires +that stats_rows_level is enabled, and the target database must be version 8.2 or higher. Tables can be excluded and +included: see the INCLUDE section below for details. The units for --warning and --critical are times. Valid units are +seconds, minutes, hours, and days; all can be abbreviated to the first letter. If no units are given, 'seconds' is assumed. +The default values are '1 day' and '2 days'. Please note that there are cases in which this field does not get +automatically populated. 
If certain tables are giving you problems, make sure that they have dead rows to vacuum, +or just exclude them from the test.</p> +</dd> +<dd> +<p>Example 1: Warn if any table has not been vacuumed in 3 days, and give a critical at a week, for host wormwood + check_postgres_last_vacuum --host=wormwood --warning='3d' --critical='7d'</p> +</dd> +</li> +<dt><strong><a name="listener" class="item"><strong>listener</strong> (symlink: <code>check_postgres_listener</code>)</a></strong> + +<dd> +<p>Confirms that someone is listening for one or more specific strings. Only one of warning or critical is needed. The format +is a simple string representing the LISTEN target, or a tilde character followed by a string for a regular expression +check.</p> +</dd> +<dd> +<p>Example 1: Give a warning if nobody is listening for the string bucardo_mcp_ping on ports 5555 and 5556 + check_postgres_listener --port=5555,5556 --warning=bucardo_mcp_ping</p> +</dd> +<dd> +<p>Example 2: Give a critical if there are no active LISTEN requests matching 'grimm' on database oskar + check_postgres_listener --db oskar --critical=~grimm</p> +</dd> +</li> +<dt><strong><a name="locks" class="item"><strong>locks</strong> (symlink: <code>check_postgres_locks</code>)</a></strong> + +<dd> +<p>Checks the total number of locks on one or more databases. Makes no sense to run this more than once per cluster. +Databases can be filtered with the --include and --exclude options: See the INCLUDE section below for more detail. +The warning and critical can be specified as simple numbers, which represent the total number of locks, or they can +be broken down by type of lock. Valid lock names are "total", "waiting", or a type of lock used by Postgres. +These names are case-insensitive and do not need the "lock" part on the end, so 'exclusive' will match +'ExclusiveLock'.
The format is name=number, with different items separated by semicolons.</p> +</dd> +<dd> +<p>Example 1: Warn if the number of locks is 100 or more, and critical if 200 or more, on host garrett + check_postgres_locks --host=garrett --warning=100 --critical=200</p> +</dd> +<dd> +<p>Example 2: On the host artemus, warn if 200 or more locks exist, and give a critical if over 250 total locks exist, +or if over 20 exclusive locks exist, or if over 5 connections are waiting for a lock. + check_postgres_locks --host=artemus --warning=200 --critical="total=250;waiting=5;exclusive=20"</p> +</dd> +</li> +<dt><strong><a name="logfile" class="item"><strong>logfile</strong> (symlink: <code>check_postgres_logfile</code>)</a></strong> + +<dd> +<p>Ensures that the logfile is in the expected location and is being logged to. This action issues a command that throws +an error on each database it is checking, and ensures that the message shows up in the logs. It scans the various +log_* settings inside of Postgres to figure out where the logs should be. If you are using syslog, it does a rough +but not foolproof scan of /etc/syslog.conf. Alternatively, you can provide the name of the logfile with the --logfile +option. This is especially useful if the logs have a custom rotation scheme driven by an external program. The +--logfile option supports the following escape characters: %Y %m %d %H, which represent the current year, month, date, +and hour respectively. An error is always reported as critical unless the warning option has been passed in as a +non-zero value.
Other than that specific usage, the --warning and --critical options should not be used.</p> +</dd> +<dd> +<p>Example 1: On port 5432, ensure the logfile is being written to the file /home/greg/pg8.2.log + check_postgres_logfile --port=5432 --logfile=/home/greg/pg8.2.log</p> +</dd> +<dd> +<p>Example 2: Same as above, but raise a warning, not a critical + check_postgres_logfile --port=5432 --logfile=/home/greg/pg8.2.log -w 1</p> +</dd> +</li> +<dt><strong><a name="query_runtime" class="item"><strong>query_runtime</strong> (symlink: <code>check_postgres_query_runtime</code>)</a></strong> + +<dd> +<p>Checks how long a specific query takes to run, by executing an "EXPLAIN ANALYZE" against it. The --warning and --critical +options are the maximum amount of time the query should take. Valid units are seconds, minutes, and hours; any can be +abbreviated to the first letter. If no units are given, 'seconds' is assumed. Both warning and critical must be given. +The name of the view or function to be run must be passed to the --queryname +option. It must consist of a single word (or schema.word format), with optional parens at the end.</p> +</dd> +<dd> +<p>Example 1: Give a critical if the function named "speedtest" fails to run in 10 seconds or less. + check_postgres_query_runtime --queryname='speedtest()' --critical=10 --warning=10</p> +</dd> +</li> +<dt><strong><a name="query_time" class="item"><strong>query_time</strong> (symlink: <code>check_postgres_query_time</code>)</a></strong> + +<dd> +<p>Checks the length of running queries on one or more databases. It makes no sense to run this more than once +on the same cluster (all databases are returned no matter where you connect from). Databases can be included or +excluded with the --include and --exclude options: see the INCLUDE section below for more details. The warning and +critical options are an amount of time, and default to '2 minutes' and '5 minutes'. Valid units are 'seconds', 'minutes', +'hours', or 'days'.
Each may be written singular or abbreviated to just the first letter. If no units are given, +the unit is assumed to be seconds.</p> +</dd> +<dd> +<p>Example 1: Give a warning if any query has been running longer than 3 minutes, and a critical if longer than 5 minutes. + check_postgres_query_time --port=5432 --warning='3 minutes' --critical='5 minutes'</p> +</dd> +<dd> +<p>Example 2: Using default values (2 and 5 minutes), check all databases except those starting with 'template'. + check_postgres_query_time --port=5432 --exclude=~^template</p> +</dd> +</li> +<dt><strong><a name="txn_time" class="item"><strong>txn_time</strong> (symlink: <code>check_postgres_txn_time</code>)</a></strong> + +<dd> +<p>Checks the length of open transactions on one or more databases. It makes no sense to run this more than once +on the same cluster (all databases are returned no matter where you connect from). Databases can be included or +excluded with the --include and --exclude options: see the INCLUDE section below for more details. The warning and +critical options are an amount of time, and must be provided (no default). Valid units are 'seconds', 'minutes', +'hours', or 'days'. Each may be written singular or abbreviated to just the first letter. If no units are given, +the unit is assumed to be seconds. Requires Postgres 8.3 or better.</p> +</dd> +<dd> +<p>Example 1: Give a critical if any transaction has been open for more than 10 minutes: + check_postgres_txn_time --port=5432 --critical='10 minutes'</p> +</dd> +</li> +<dt><strong><a name="txn_idle" class="item"><strong>txn_idle</strong> (symlink: <code>check_postgres_txn_idle</code>)</a></strong> + +<dd> +<p>Checks the length of "idle in transaction" queries on one or more databases. It makes no sense to run this more than once +on the same cluster (all databases are returned no matter where you connect from).
Databases can be included or +excluded with the --include and --exclude options: see the INCLUDE section below for more details. The warning and +critical options are an amount of time, and must be provided (no default). Valid units are 'seconds', 'minutes', +'hours', or 'days'. Each may be written singular or abbreviated to just the first letter. If no units are given, +the unit is assumed to be seconds. Requires Postgres 8.3 or better.</p> +</dd> +<dd> +<p>Example 1: Give a warning if any connection has been idle in transaction for more than 15 seconds: + check_postgres_txn_idle --port=5432 --warning='15 seconds'</p> +</dd> +</li> +<dt><strong><a name="rebuild_symlinks" class="item"><strong>rebuild_symlinks</strong></a></strong> + +<dt><strong><a name="rebuild_symlinks_force" class="item"><strong>rebuild_symlinks_force</strong></a></strong> + +<dd> +<p>This action requires no other arguments, and does not connect to any databases, but simply creates symlinks for +each action, in the form "check_postgres_&lt;action_name&gt;". If the file already exists, it will not be overwritten. +If the action is rebuild_symlinks_force, then symlinks will be overwritten.</p> +</dd> +</li> +<dt><strong><a name="settings_checksum" class="item"><strong>settings_checksum</strong> (symlink: <code>check_postgres_settings_checksum</code>)</a></strong> + +<dd> +<p>Checks that all the Postgres settings are the same as the last time you checked. This is done by generating a checksum +of a sorted list of setting names and their values. Note that different users in the same database may have +different checksums, due to ALTER USER usage, and due to the fact that superusers see more settings than +ordinary users. Either the --warning or the --critical should be given, but not both. The value of each one is +the checksum, a 32-character hexadecimal value.
You can run with the special --critical=0 option to find out +an existing checksum.</p> +</dd> +<dd> +<p>This action requires the Digest::MD5 module.</p> +</dd> +<dd> +<p>Example 1: Find the initial checksum for the database on port 5555 using the default user (usually postgres) + check_postgres_settings_checksum --port=5555 --critical=0</p> +</dd> +<dd> +<p>Example 2: Make sure no settings have changed and warn if so, using the checksum from above. + check_postgres_settings_checksum --port=5555 --warning=cd2f3b5e129dc2b4f5c0f6d8d2e64231</p> +</dd> +</li> +<dt><strong><a name="timesync" class="item"><strong>timesync</strong> (symlink: <code>check_postgres_timesync</code>)</a></strong> + +<dd> +<p>Compares the local system time with the time reported by one or more databases. The warning and critical options represent +the number of seconds at which the warning or critical should be given. If neither is specified, the default values +are used, which are '2' and '5'. The warning cannot be greater than the critical. Due to the non-exact nature of this +test, a value of '0' or '1' is not recommended.</p> +</dd> +<dd> +<p>The string returned shows the time difference as well as the time on each side written out.</p> +</dd> +<dd> +<p>Example 1: Check that databases on hosts ankh, morpork, and klatch are no more than 3 seconds off from the local time: + check_postgres_timesync --host=ankh,morpork,klatch --critical=3</p> +</dd> +</li> +<dt><strong><a name="txn_wraparound" class="item"><strong>txn_wraparound</strong> (symlink: <code>check_postgres_txn_wraparound</code>)</a></strong> + +<dd> +<p>Checks how close to transaction wraparound one or more databases are getting. The warning and critical indicate +the number of transactions done, and must be positive integers. If either is not given, the default values of +1.3 and 1.4 billion are used. It makes no sense to run this check more than once on a single cluster.
For a more +detailed discussion of what this number represents and what to do about it, please visit the page +<a href="http://www.postgresql.org/docs/current/static/routine-vacuuming.html#VACUUM-FOR-WRAPAROUND">http://www.postgresql.org/docs/current/static/routine-vacuuming.html#VACUUM-FOR-WRAPAROUND</a></p> +</dd> +<dd> +<p>The warning and critical values can have underscores in the number for legibility, as Perl does.</p> +</dd> +<dd> +<p>Example 1: Check the default values for the localhost database: + check_postgres_txn_wraparound --host=localhost</p> +</dd> +<dd> +<p>Example 2: Check port 6000 and give a critical when 1.7 billion transactions are reached: + check_postgres_txn_wraparound --port=6000 --critical=1_700_000_000</p> +</dd> +</li> +<dt><strong><a name="wal_files" class="item"><strong>wal_files</strong> (symlink: <code>check_postgres_wal_files</code>)</a></strong> + +<dd> +<p>Checks how many WAL files exist in the pg_xlog directory, which is found in your data directory, sometimes +as a symlink to another disk for performance reasons. This must be run as a superuser, in order to +access the contents of the pg_xlog directory. The minimum version to use this action is 8.1. The +warning and critical are simply the number of files in the pg_xlog directory. What number to set this +to will vary, but a general guideline is to use a number slightly higher than what is normally there, +to catch problems early.</p> +</dd> +<dd> +<p>Normally, WAL files are closed and then re-used, but a long-running open transaction, or a faulty +log shipping method, may cause Postgres to create too many files.
Ultimately, this will cause the +disk they are on to run out of space, at which point Postgres will shut down.</p> +</dd> +<dd> +<p>Example 1: Check that the number of WAL files is 20 or less on host "pluto": + check_postgres_wal_files --host=pluto --critical=20</p> +</dd> +</li> +<dt><strong><a name="version" class="item"><strong>version</strong> (symlink: <code>check_postgres_version</code>)</a></strong> + +<dd> +<p>Checks that the required version of Postgres is running. The --warning and --critical arguments (only one is required) +must be of the format X.Y or X.Y.Z where X is the major version number, Y is the minor version number, and Z is the +revision.</p> +</dd> +<dd> +<p>Example 1: Give a warning if the database on port 5678 is not version 8.4.10: + check_postgres_version --port=5678 -w=8.4.10</p> +</dd> +<dd> +<p>Example 2: Give a critical if any databases on hosts valley, grain, or sunshine are not 8.3: + check_postgres_version -H valley,grain,sunshine --critical=8.3</p> +</dd> +</li> +</dl> +<p> +</p> +<hr /> +<h1><a name="inclusion_and_exclusion">INCLUSION AND EXCLUSION</a></h1> +<p>The options --include and --exclude can be combined to limit which things are checked, depending on the action. +The name of the database can be filtered when using the following actions: +backends, database_size, last_vacuum, last_analyze, locks, and query_time. +The name of a relation can be filtered when using the following actions: +bloat, index_size, table_size, and relation_size. +The name of a setting can be filtered when using the settings_checksum action. +The name of a file system can be filtered when using the disk_space action.</p> +<p>If only an include option is given, then ONLY those entries that match will be checked. However, if both +exclude and include are given, the exclusion is done first, and the inclusion second, to reinstate things that +may have been excluded.
Both --include and --exclude can be given multiple times, or as comma-separated lists. +A leading tilde causes the following word to be treated as a regular expression.</p> +<p>Examples:</p> +<pre> + --include=pg_class + Only checks items named pg_class</pre> +<pre> + --include=~pg_ + Only checks items containing the letters 'pg_'</pre> +<pre> + --include=~^pg_ + Only checks items beginning with 'pg_'</pre> +<pre> + --exclude=test + Exclude the item named 'test'</pre> +<pre> + --exclude=~test + Exclude all items containing the letters 'test'</pre> +<pre> + --exclude=~ace --include=faceoff + Exclude all items containing the letters 'ace', but allow the item 'faceoff'</pre> +<pre> + --exclude=~^pg_,~slon,sql_settings --exclude=green --include=~prod,pg_relname + Exclude all items which start with the letters 'pg_', which contain the letters 'slon', or which are named + 'sql_settings' or 'green'. Specifically check items with the letters 'prod' in their names, and always + check the item named 'pg_relname'.</pre> +<p> +</p> +<hr /> +<h1><a name="test_mode">TEST MODE</a></h1> +<p>To help in setting things up, this program can be run in a "test mode" by specifying the --test option. This will +perform some basic tests to make sure that the databases can be contacted, and that certain per-action prerequisites +are met.
Currently, we check that the user is a superuser if required by that action, and that the version of Postgres +is new enough for those actions that depend on a specific version.</p> +<p> +</p> +<hr /> +<h1><a name="dependencies">DEPENDENCIES</a></h1> +<dl> +<dt><strong><a name="access_to_a_working_version_of_psql" class="item">Access to a working version of psql</a></strong> + +<dt><strong><a name="some_very_standard_perl_modules" class="item">Some very standard Perl modules:</a></strong> + +<dl> +<dt><strong><a name="getopt_long" class="item">Getopt::Long</a></strong> + +<dt><strong><a name="file_basename" class="item">File::Basename</a></strong> + +<dt><strong><a name="file_temp" class="item">File::Temp</a></strong> + +<dt><strong><a name="hires" class="item">Time::HiRes (if opt{showtime} is set to true, which is the default)</a></strong> + +</dl> +</dl> +<p>The 'settings_checksum' action requires the Digest::MD5 module.</p> +<p>Some actions require access to external programs. If psql is not explicitly specified, the command +'which' is used to find it. The program "/bin/df" is needed by the 'check_disk_space' action.</p> +<p> +</p> +<hr /> +<h1><a name="development">DEVELOPMENT</a></h1> +<p>Development happens using the git system. 
You can clone the latest version by doing: + git-clone <a href="http://bucardo.org/nagios_postgres.git">http://bucardo.org/nagios_postgres.git</a></p> +<p> +</p> +<hr /> +<h1><a name="history">HISTORY</a></h1> +<p>Items not specifically attributed are by Greg Sabino Mullane.</p> +<dl> +<dt><strong><a name="version_1_4_1" class="item"><strong>Version 1.4.2</strong></a></strong> + +<dd> +<p>Fix bug preventing --dbpass argument from working (Robert Treat)</p> +</dd> +</li> +<dt><strong><a name="version_1_4_12" class="item"><strong>Version 1.4.1</strong></a></strong> + +<dd> +<p>Minor documentation fixes.</p> +</dd> +</li> +<dt><strong><a name="version_1_4_0" class="item"><strong>Version 1.4.0</strong></a></strong> + +<dd> +<p>Have check_wal_files use pg_ls_dir (idea by Robert Treat)</p> +</dd> +<dd> +<p>For last_vacuum and last_analyze, respect autovacuum effects, add separate +autovacuum checks (ideas by Robert Treat)</p> +</dd> +</li> +<dt><strong><a name="version_1_3_1" class="item"><strong>Version 1.3.1</strong></a></strong> + +<dd> +<p>Have txn_idle use query_start, not xact_start</p> +</dd> +</li> +<dt><strong><a name="version_1_3_0" class="item"><strong>Version 1.3.0</strong></a></strong> + +<dd> +<p>Add in txn_idle and txn_time actions.</p> +</dd> +</li> +<dt><strong><a name="version_1_2_0" class="item"><strong>Version 1.2.0</strong></a></strong> + +<dd> +<p>Add the check_wal_files method, which counts the number of WAL files +in your pg_xlog directory.</p> +</dd> +<dd> +<p>Fix some typos in the docs.</p> +</dd> +<dd> +<p>Explicitly allow -v as an argument.</p> +</dd> +<dd> +<p>Allow for a null syslog_facility in check_logfile</p> +</dd> +</li> +<dt><strong><a name="version_1_1_2" class="item"><strong>Version 1.1.2</strong></a></strong> +
+<dd> +<p>Fix error preventing --action=rebuild_symlinks from working.</p> +</dd> +</li> +<dt><strong><a name="version_1_1_1" class="item"><strong>Version 
1.1.1</strong></a></strong> + +<dd> +<p>Switch vacuum and analyze date output to use 'DD', not 'D'. (Glyn Astill)</p> +</dd> +</li> +<dt><strong><a name="version_1_1_0" class="item"><strong>Version 1.1.0</strong></a></strong> + +<dd> +<p>Fixes, enhancements, and performance tracking, December 2007</p> +</dd> +<dd> +<p>Add performance data tracking via --showperf and --perflimit</p> +</dd> +<dd> +<p>Lots of refactoring and cleanup of how actions handle arguments.</p> +</dd> +<dd> +<p>Do basic checks to figure out syslog file for 'logfile' action.</p> +</dd> +<dd> +<p>Allow for exact matching of beta versions with 'version' action.</p> +</dd> +<dd> +<p>Redo the default arguments to only populate when neither 'warning' nor 'critical' is provided.</p> +</dd> +<dd> +<p>Allow just warning OR critical to be given for the 'timesync' action.</p> +</dd> +<dd> +<p>Remove 'redirect_stderr' requirement from 'logfile' due to 8.3 changes.</p> +</dd> +<dd> +<p>Actions 'last_vacuum' and 'last_analyze' are 8.2 only (Robert Treat)</p> +</dd> +</li> +<dt><strong><a name="version_1_0_16" class="item"><strong>Version 1.0.16</strong></a></strong> + +<dd> +<p>First public release, December 2007</p> +</dd> +</li> +</dl> +<p> +</p> +<hr /> +<h1><a name="bugs_and_limitations">BUGS AND LIMITATIONS</a></h1> +<p>The index bloat size optimization is still very rough.</p> +<p>Some actions may not work on older versions of Postgres (before 8.0).</p> +<p>Please report any problems to <a href="mailto:[email protected].">[email protected].</a></p> +<p> +</p> +<hr /> +<h1><a name="author">AUTHOR</a></h1> +<p>Greg Sabino Mullane <<a href="mailto:[email protected]">[email protected]</a>></p> +<p> +</p> +<hr /> +<h1><a name="license_and_copyright">LICENSE AND COPYRIGHT</a></h1> +<p>Copyright (c) 2007-2008 Greg Sabino Mullane <<a href="mailto:[email protected]">[email protected]</a>>.</p> +<p>Redistribution and use in source and binary forms, with or without +modification, are permitted provided that the 
following conditions are met:</p> +<pre> + 1. Redistributions of source code must retain the above copyright notice, + this list of conditions and the following disclaimer. + 2. Redistributions in binary form must reproduce the above copyright notice, + this list of conditions and the following disclaimer in the documentation + and/or other materials provided with the distribution.</pre> +<p>THIS SOFTWARE IS PROVIDED BY THE AUTHOR "AS IS" AND ANY EXPRESS OR IMPLIED +WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF +MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO +EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, +EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT +OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS +INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN +CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING +IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY +OF SUCH DAMAGE.</p> + +</body> + +</html> |
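The INCLUSION AND EXCLUSION rules in the documentation above can be modeled in a few lines. The following is a minimal Python sketch, not the script's own Perl code. Several details are assumptions:

- The names `should_check`, `_split`, and `_matches` are hypothetical.
- Python's `re.search` stands in for Perl's regex matching.
- A pattern without a leading tilde requires an exact name match.
- When both options are present, an item that was never excluded is still checked.

The sketch applies exclusion first and then lets an include match reinstate the item.

```python
import re
from typing import Iterable, List, Sequence


def _split(options: Iterable[str]) -> List[str]:
    """Flatten repeated options and comma-separated lists into one pattern list."""
    return [part for opt in options for part in opt.split(",") if part]


def _matches(name: str, patterns: Sequence[str]) -> bool:
    """True if name matches any pattern: '~regex' is searched, anything else must match exactly."""
    for pat in patterns:
        if pat.startswith("~"):
            if re.search(pat[1:], name):
                return True
        elif name == pat:
            return True
    return False


def should_check(name: str, includes: Iterable[str] = (), excludes: Iterable[str] = ()) -> bool:
    """Hypothetical model of --include/--exclude filtering for a single item name."""
    inc = _split(includes)
    exc = _split(excludes)
    if inc and not exc:
        # Include only: ONLY matching entries are checked.
        return _matches(name, inc)
    if exc and _matches(name, exc):
        # Exclusion is applied first; an include match reinstates the item.
        return _matches(name, inc)
    # Not excluded, or no filters given at all: the item is checked.
    return True


if __name__ == "__main__":
    items = ["pg_class", "faceoff", "space", "orders"]
    kept = [n for n in items if should_check(n, includes=["faceoff"], excludes=["~ace"])]
    print(kept)  # ['pg_class', 'faceoff', 'orders']
```

With `--exclude=~ace --include=faceoff`, for example, `space` is dropped while `faceoff` is reinstated. Because the sketch splits every option on commas, it assumes no regular expression itself contains a comma (such as `{1,3}`).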