Commit 78bad97

Improve various aspects of pg_rewind documentation
The pg_rewind docs currently assert that the state of the target's data directory after rewind is equivalent to the source's data directory. This clarifies the documentation to describe that the base state is further back in time and that the target's data directory will include the current state from the source of any copied blocks since the point of divergence.

This commit also improves the section "How It Works":
- Describe the update of the pg_control file.
- Reorganize the list of files and directories ignored during the rewind.

Author: James Coleman
Discussion: https://postgr.es/m/CAAaqYe-sgqCos7MXF4XiY8rUPy3CEmaCY9EvfhX-DhPhPBF5_A@mail.gmail.com
1 parent d9c501d commit 78bad97

1 file changed: +57 -33 lines changed

doc/src/sgml/ref/pg_rewind.sgml

+57-33
@@ -48,14 +48,16 @@ PostgreSQL documentation
  </para>

  <para>
- The result is equivalent to replacing the target data directory with the
- source one. Only changed blocks from relation files are copied;
- all other files are copied in full, including configuration files. The
- advantage of <application>pg_rewind</application> over taking a new base backup, or
- tools like <application>rsync</application>, is that <application>pg_rewind</application> does
- not require reading through unchanged blocks in the cluster. This makes
- it a lot faster when the database is large and only a small
- fraction of blocks differ between the clusters.
+ After a successful rewind, the state of the target data directory is
+ analogous to a base backup of the source data directory. Unlike taking
+ a new base backup or using a tool like <application>rsync</application>,
+ <application>pg_rewind</application> does not require comparing or copying
+ unchanged relation blocks in the cluster. Only changed blocks from existing
+ relation files are copied; all other files, including new relation files,
+ configuration files, and WAL segments, are copied in full. As such the
+ rewind operation is significantly faster than other approaches when the
+ database is large and only a small fraction of blocks differ between the
+ clusters.
  </para>

  <para>
@@ -77,16 +79,18 @@ PostgreSQL documentation
  </para>

  <para>
- When the target server is started for the first time after running
- <application>pg_rewind</application>, it will go into recovery mode and replay all
- WAL generated in the source server after the point of divergence.
- If some of the WAL was no longer available in the source server when
- <application>pg_rewind</application> was run, and therefore could not be copied by the
- <application>pg_rewind</application> session, it must be made available when the
- target server is started. This can be done by creating a
- <filename>recovery.signal</filename> file in the target data directory
- and configuring suitable <xref linkend="guc-restore-command"/>
- in <filename>postgresql.conf</filename>.
+ After running <application>pg_rewind</application>, WAL replay needs to
+ complete for the data directory to be in a consistent state. When the
+ target server is started again it will enter archive recovery and replay
+ all WAL generated in the source server from the last checkpoint before
+ the point of divergence. If some of the WAL was no longer available in the
+ source server when <application>pg_rewind</application> was run, and
+ therefore could not be copied by the <application>pg_rewind</application>
+ session, it must be made available when the target server is started.
+ This can be done by creating a <filename>recovery.signal</filename> file
+ in the target data directory and by configuring a suitable
+ <xref linkend="guc-restore-command"/> in
+ <filename>postgresql.conf</filename>.
  </para>

  <para>
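
The recovery setup described in the hunk above (a `recovery.signal` file plus a `restore_command` in `postgresql.conf`) could be scripted roughly as follows. This is a minimal sketch, not part of the commit: the helper name `configure_archive_recovery`, the data directory path, and the archive command are all hypothetical, and it assumes that appending to `postgresql.conf` is acceptable because the last occurrence of a parameter in that file takes effect.

```python
from pathlib import Path


def configure_archive_recovery(pgdata: Path, restore_command: str) -> None:
    """Prepare a rewound target so it can fetch missing WAL from an archive.

    Hypothetical helper: the name and signature are illustrative only.
    """
    # An empty recovery.signal file in the data directory asks PostgreSQL
    # to enter archive recovery at the next start.
    (pgdata / "recovery.signal").touch()

    # Single quotes inside a postgresql.conf string value are doubled.
    escaped = restore_command.replace("'", "''")

    # Append the setting; a later occurrence in postgresql.conf overrides
    # an earlier one, so existing content is left untouched.
    with open(pgdata / "postgresql.conf", "a", encoding="utf-8") as conf:
        conf.write(f"\nrestore_command = '{escaped}'\n")
```

A call might look like `configure_archive_recovery(Path("/path/to/target"), "cp /mnt/archive/%f %p")`, where both the path and the copy command are placeholders for whatever the local WAL archive actually requires.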
@@ -105,6 +109,15 @@ PostgreSQL documentation
  recovered. In such a case, taking a new fresh backup is recommended.
  </para>

+ <para>
+ As <application>pg_rewind</application> copies configuration files
+ entirely from the source, it may be required to correct the configuration
+ used for recovery before restarting the target server, especially if
+ the target is reintroduced as a standby of the source. If you restart
+ the server after the rewind operation has finished but without configuring
+ recovery, the target may again diverge from the primary.
+ </para>
+
  <para>
  <application>pg_rewind</application> will fail immediately if it finds
  files it cannot write directly to. This can happen for example when
@@ -342,34 +355,45 @@ GRANT EXECUTE ON function pg_catalog.pg_read_binary_file(text, bigint, bigint, b
  Copy all those changed blocks from the source cluster to
  the target cluster, either using direct file system access
  (<option>--source-pgdata</option>) or SQL (<option>--source-server</option>).
+ Relation files are now in a state equivalent to the moment of the last
+ completed checkpoint prior to the point at which the WAL timelines of the
+ source and target diverged plus the current state on the source of any
+ blocks changed on the target after that divergence.
  </para>
  </step>
  <step>
  <para>
- Copy all other files such as <filename>pg_xact</filename> and
- configuration files from the source cluster to the target cluster
- (everything except the relation files). Similarly to base backups,
- the contents of the directories <filename>pg_dynshmem/</filename>,
+ Copy all other files, including new relation files, WAL segments,
+ <filename>pg_xact</filename>, and configuration files from the source
+ cluster to the target cluster. Similarly to base backups, the contents
+ of the directories <filename>pg_dynshmem/</filename>,
  <filename>pg_notify/</filename>, <filename>pg_replslot/</filename>,
  <filename>pg_serial/</filename>, <filename>pg_snapshots/</filename>,
- <filename>pg_stat_tmp/</filename>, and
- <filename>pg_subtrans/</filename> are omitted from the data copied
- from the source cluster. Any file or directory beginning with
- <filename>pgsql_tmp</filename> is omitted, as well as are
+ <filename>pg_stat_tmp/</filename>, and <filename>pg_subtrans/</filename>
+ are omitted from the data copied from the source cluster. The files
  <filename>backup_label</filename>,
  <filename>tablespace_map</filename>,
  <filename>pg_internal.init</filename>,
- <filename>postmaster.opts</filename> and
- <filename>postmaster.pid</filename>.
+ <filename>postmaster.opts</filename>, and
+ <filename>postmaster.pid</filename>, as well as any file or directory
+ beginning with <filename>pgsql_tmp</filename>, are omitted.
+ </para>
+ </step>
+ <step>
+ <para>
+ Create a <filename>backup_label</filename> file to begin WAL replay at
+ the checkpoint created at failover and configure the
+ <filename>pg_control</filename> file with a minimum consistency LSN
+ defined as the result of <literal>pg_current_wal_insert_lsn()</literal>
+ when rewinding from a live source or the last checkpoint LSN when
+ rewinding from a stopped source.
  </para>
  </step>
  <step>
  <para>
- Apply the WAL from the source cluster, starting from the checkpoint
- created at failover. (Strictly speaking, <application>pg_rewind</application>
- doesn't apply the WAL, it just creates a backup label file that
- makes <productname>PostgreSQL</productname> start by replaying all WAL from
- that checkpoint forward.)
+ When starting the target, <productname>PostgreSQL</productname> replays
+ all the required WAL, resulting in a data directory in a consistent
+ state.
  </para>
  </step>
  </procedure>
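
The reorganized exclusion list in the "How It Works" hunk can be read as a simple path filter. The sketch below only illustrates the wording of the documentation and is not pg_rewind's actual implementation; the function name `is_omitted` is hypothetical. It assumes that paths are relative to the data directory, that the listed directories sit at the top level, and that the listed file names are matched at any depth.

```python
from pathlib import PurePosixPath

# Directories whose contents are omitted (the directory entry itself is kept).
EXCLUDED_DIR_CONTENTS = {
    "pg_dynshmem", "pg_notify", "pg_replslot", "pg_serial",
    "pg_snapshots", "pg_stat_tmp", "pg_subtrans",
}

# Individual files that are omitted.
EXCLUDED_FILES = {
    "backup_label", "tablespace_map", "pg_internal.init",
    "postmaster.opts", "postmaster.pid",
}

# Any file or directory whose name begins with this prefix is omitted.
TEMP_PREFIX = "pgsql_tmp"


def is_omitted(relative_path: str) -> bool:
    """Return True if a path relative to the data directory would be skipped."""
    parts = PurePosixPath(relative_path).parts
    if not parts:
        return False
    # Anything below one of the listed top-level directories is skipped.
    if len(parts) > 1 and parts[0] in EXCLUDED_DIR_CONTENTS:
        return True
    # A temporary file, or anything inside a temporary directory, is skipped.
    if any(part.startswith(TEMP_PREFIX) for part in parts):
        return True
    # Assumption: listed file names are matched wherever they appear.
    return parts[-1] in EXCLUDED_FILES
```

Under these assumptions, `is_omitted("pg_notify/0000")` and `is_omitted("postmaster.pid")` are true, while `is_omitted("pg_xact/0000")` and `is_omitted("base/1/16384")` are false, which matches the statement that `pg_xact` and relation files are copied.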
