Commit 8d74fc9

author
Amit Kapila
committed
Add a view to show the stats of subscription workers.
This commit adds a new system view pg_stat_subscription_workers, that shows information about any errors which occur during the application of logical replication changes as well as during performing initial table synchronization. The subscription statistics entries are removed when the corresponding subscription is removed.

It also adds an SQL function pg_stat_reset_subscription_worker() to reset single subscription errors.

The contents of this view can be used by an upcoming patch that skips the particular transaction that conflicts with the existing data on the subscriber.

This view can be extended in the future to track other xact related statistics like the number of xacts committed/aborted for subscription workers.

Author: Masahiko Sawada
Reviewed-by: Greg Nancarrow, Hou Zhijie, Tang Haiying, Vignesh C, Dilip Kumar, Takamichi Osumi, Amit Kapila
Discussion: https://postgr.es/m/CAD21AoDeScrsHhLyEPYqN3sydg6PxAPVBboK=30xJfUVihNZDA@mail.gmail.com
1 parent 98105e5 commit 8d74fc9

13 files changed, 1069 additions and 27 deletions

doc/src/sgml/monitoring.sgml

+157
@@ -627,6 +627,15 @@ postgres 27093 0.0 0.0 30096 2752 ? Ss 11:34 0:00 postgres: ser
       </entry>
      </row>
 
+     <row>
+      <entry><structname>pg_stat_subscription_workers</structname><indexterm><primary>pg_stat_subscription_workers</primary></indexterm></entry>
+      <entry>One row per subscription worker, showing statistics about errors
+      that occurred on that subscription worker.
+      See <link linkend="monitoring-pg-stat-subscription-workers">
+      <structname>pg_stat_subscription_workers</structname></link> for details.
+      </entry>
+     </row>
+
     </tbody>
    </tgroup>
   </table>
@@ -3054,6 +3063,128 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
 
  </sect2>
 
+ <sect2 id="monitoring-pg-stat-subscription-workers">
+  <title><structname>pg_stat_subscription_workers</structname></title>
+
+  <indexterm>
+   <primary>pg_stat_subscription_workers</primary>
+  </indexterm>
+
+  <para>
+   The <structname>pg_stat_subscription_workers</structname> view will contain
+   one row per subscription worker on which errors have occurred, for workers
+   applying logical replication changes and workers handling the initial data
+   copy of the subscribed tables. The statistics entry is removed when the
+   corresponding subscription is dropped.
+  </para>
+
+  <table id="pg-stat-subscription-workers" xreflabel="pg_stat_subscription_workers">
+   <title><structname>pg_stat_subscription_workers</structname> View</title>
+   <tgroup cols="1">
+    <thead>
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       Column Type
+      </para>
+      <para>
+       Description
+      </para></entry>
+     </row>
+    </thead>
+
+    <tbody>
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>subid</structfield> <type>oid</type>
+      </para>
+      <para>
+       OID of the subscription
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>subname</structfield> <type>name</type>
+      </para>
+      <para>
+       Name of the subscription
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>subrelid</structfield> <type>oid</type>
+      </para>
+      <para>
+       OID of the relation that the worker is synchronizing; null for the
+       main apply worker
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>last_error_relid</structfield> <type>oid</type>
+      </para>
+      <para>
+       OID of the relation that the worker was processing when the
+       error occurred
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>last_error_command</structfield> <type>text</type>
+      </para>
+      <para>
+       Name of command being applied when the error occurred. This field
+       is null if the error was reported during the initial data copy.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>last_error_xid</structfield> <type>xid</type>
+      </para>
+      <para>
+       Transaction ID of the publisher node being applied when the error
+       occurred. This field is null if the error was reported
+       during the initial data copy.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>last_error_count</structfield> <type>uint8</type>
+      </para>
+      <para>
+       Number of consecutive times the error occurred
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>last_error_message</structfield> <type>text</type>
+      </para>
+      <para>
+       The error message
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>last_error_time</structfield> <type>timestamp with time zone</type>
+      </para>
+      <para>
+       Last time at which this error occurred
+      </para></entry>
+     </row>
+
+    </tbody>
+   </tgroup>
+  </table>
+
+ </sect2>
+
  <sect2 id="monitoring-pg-stat-ssl-view">
   <title><structname>pg_stat_ssl</structname></title>
 
@@ -5176,6 +5307,32 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
         can be granted EXECUTE to run the function.
        </para></entry>
       </row>
+
+      <row>
+        <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+          <primary>pg_stat_reset_subscription_worker</primary>
+        </indexterm>
+        <function>pg_stat_reset_subscription_worker</function> ( <parameter>subid</parameter> <type>oid</type> <optional>, <parameter>relid</parameter> <type>oid</type> </optional> )
+        <returnvalue>void</returnvalue>
+       </para>
+       <para>
+        Resets the statistics of subscription workers running on the
+        subscription with <parameter>subid</parameter> shown in the
+        <structname>pg_stat_subscription_workers</structname> view. If the
+        argument <parameter>relid</parameter> is not <literal>NULL</literal>,
+        resets statistics of the subscription worker handling the initial data
+        copy of the relation with <parameter>relid</parameter>. Otherwise,
+        resets the subscription worker statistics of the main apply worker.
+        If the argument <parameter>relid</parameter> is omitted, resets the
+        statistics of all subscription workers running on the subscription
+        with <parameter>subid</parameter>.
+       </para>
+       <para>
+        This function is restricted to superusers by default, but other users
+        can be granted EXECUTE to run the function.
+       </para></entry>
+      </row>
      </tbody>
     </tgroup>
    </table>
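
The three calling forms described in the documentation hunk above can be sketched as follows. This is an illustration only: it assumes a server built from this commit, a superuser session (or a role that has been granted EXECUTE), and the hypothetical names mysub and public.mytable, which do not appear in the commit.

```sql
-- Hypothetical names: subscription "mysub", table "public.mytable".

-- relid omitted: reset the statistics of all workers of the subscription.
SELECT pg_stat_reset_subscription_worker(s.oid)
  FROM pg_subscription s
 WHERE s.subname = 'mysub';

-- relid passed as NULL: reset only the main apply worker's statistics.
SELECT pg_stat_reset_subscription_worker(s.oid, NULL)
  FROM pg_subscription s
 WHERE s.subname = 'mysub';

-- relid given: reset only the worker handling the initial copy of that table.
SELECT pg_stat_reset_subscription_worker(s.oid, 'public.mytable'::regclass)
  FROM pg_subscription s
 WHERE s.subname = 'mysub';
```

Looking up the OID through pg_subscription avoids hard-coding a subid, which differs between clusters.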

src/backend/catalog/system_functions.sql

+4
@@ -639,6 +639,10 @@ REVOKE EXECUTE ON FUNCTION pg_stat_reset_single_function_counters(oid) FROM publ
 
 REVOKE EXECUTE ON FUNCTION pg_stat_reset_replication_slot(text) FROM public;
 
+REVOKE EXECUTE ON FUNCTION pg_stat_reset_subscription_worker(oid) FROM public;
+
+REVOKE EXECUTE ON FUNCTION pg_stat_reset_subscription_worker(oid, oid) FROM public;
+
 REVOKE EXECUTE ON FUNCTION lo_import(text) FROM public;
 
 REVOKE EXECUTE ON FUNCTION lo_import(text, oid) FROM public;

src/backend/catalog/system_views.sql

+23
@@ -1261,3 +1261,26 @@ REVOKE ALL ON pg_subscription FROM public;
 GRANT SELECT (oid, subdbid, subname, subowner, subenabled, subbinary,
               substream, subtwophasestate, subslotname, subsynccommit, subpublications)
     ON pg_subscription TO public;
+
+CREATE VIEW pg_stat_subscription_workers AS
+    SELECT
+        w.subid,
+        s.subname,
+        w.subrelid,
+        w.last_error_relid,
+        w.last_error_command,
+        w.last_error_xid,
+        w.last_error_count,
+        w.last_error_message,
+        w.last_error_time
+    FROM (SELECT
+              oid as subid,
+              NULL as relid
+          FROM pg_subscription
+          UNION ALL
+          SELECT
+              srsubid as subid,
+              srrelid as relid
+          FROM pg_subscription_rel) sr,
+          LATERAL pg_stat_get_subscription_worker(sr.subid, sr.relid) w
+          JOIN pg_subscription s ON (w.subid = s.oid);
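
As a rough illustration of how the view defined above might be read, the query below lists the most recent failures first. It assumes a server built from this commit; the column alias failed_relation is made up for readability, and the view returns no rows until some worker has reported an error.

```sql
-- Sketch: inspect the latest error recorded for each subscription worker.
-- subrelid is null for the main apply worker; last_error_command and
-- last_error_xid are null when the error came from the initial data copy.
SELECT subname,
       subrelid,
       last_error_relid::regclass AS failed_relation,  -- alias is illustrative
       last_error_command,
       last_error_xid,
       last_error_count,
       last_error_message,
       last_error_time
  FROM pg_stat_subscription_workers
 ORDER BY last_error_time DESC;
```

Casting last_error_relid to regclass shows the relation by name instead of by OID.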

src/backend/commands/subscriptioncmds.c

+15-1
@@ -32,6 +32,7 @@
 #include "executor/executor.h"
 #include "miscadmin.h"
 #include "nodes/makefuncs.h"
+#include "pgstat.h"
 #include "replication/logicallauncher.h"
 #include "replication/origin.h"
 #include "replication/slot.h"
@@ -1204,7 +1205,8 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
 	 * Since dropping a replication slot is not transactional, the replication
 	 * slot stays dropped even if the transaction rolls back. So we cannot
 	 * run DROP SUBSCRIPTION inside a transaction block if dropping the
-	 * replication slot.
+	 * replication slot. Also, in this case, we report a message for dropping
+	 * the subscription to the stats collector.
 	 *
 	 * XXX The command name should really be something like "DROP SUBSCRIPTION
 	 * of a subscription that is associated with a replication slot", but we
@@ -1377,6 +1379,18 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
 	}
 	PG_END_TRY();
 
+	/*
+	 * Send a message for dropping this subscription to the stats collector.
+	 * We can safely report dropping the subscription statistics here if the
+	 * subscription is associated with a replication slot since we cannot run
+	 * DROP SUBSCRIPTION inside a transaction block. Subscription statistics
+	 * will be removed later by (auto)vacuum either if it's not associated
+	 * with a replication slot or if the message for dropping the subscription
+	 * gets lost.
+	 */
+	if (slotname)
+		pgstat_report_subscription_drop(subid);
+
 	table_close(rel, NoLock);
 }
 