Commit 7d81bdc

committed Jul 24, 2019
Improve stability of TAP test for synchronous replication
Slow buildfarm machines have run into issues with this TAP test caused by a race condition related to the startup of a set of standbys, where it is possible to finish with an unexpected order in the WAL sender array of the primary.

This closes the race condition by making sure that any standby started is registered into the WAL sender array of the primary before starting the next one based on lookups of pg_stat_replication.

Backpatch down to 9.6 where the test has been introduced.

Author: Michael Paquier
Reviewed-by: Álvaro Herrera, Noah Misch
Discussion: https://postgr.es/m/20190617055145.GB18917@paquier.xyz
Backpatch-through: 9.6
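
The approach in the commit message (start one standby, poll until the primary reports it, then start the next) can be illustrated without a running cluster. The sketch below is a simplified stand-in, not the PostgresNode implementation: `poll_until`, `start_fake_standby`, and the `@wal_senders` array are hypothetical names invented for this example, and the "standbys" are closures that register themselves after a fixed number of polls.

```perl
use strict;
use warnings;
use Time::HiRes qw(usleep);

# Hypothetical helper: call $check until it returns true or the attempts
# run out. Returns the attempt number on success, 0 on timeout.
sub poll_until
{
	my ($check, $max_attempts, $delay_us) = @_;
	for my $attempt (1 .. $max_attempts)
	{
		return $attempt if $check->();
		usleep($delay_us);
	}
	return 0;
}

# Stand-in for the primary's WAL sender array: names in registration order.
my @wal_senders;

# Simulated standby start: the name only appears in @wal_senders once the
# returned check has been called $polls_needed times.
sub start_fake_standby
{
	my ($name, $polls_needed) = @_;
	my $polls = 0;
	return sub {
		push @wal_senders, $name if ++$polls == $polls_needed;
		return scalar grep { $_ eq $name } @wal_senders;
	};
}

# Each standby needs a different number of polls to register, but waiting
# for one before starting the next keeps the registration order fixed.
for my $spec (['standby1', 3], ['standby2', 1], ['standby3', 2])
{
	my ($name, $polls_needed) = @$spec;
	my $registered = start_fake_standby($name, $polls_needed);
	poll_until($registered, 50, 1000)
	  or die "timed out waiting for $name";
}

print join(',', @wal_senders), "\n";
```

Because each simulated standby is confirmed before the next one starts, the script prints `standby1,standby2,standby3` even though standby1 is the slowest to register.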
1 parent 5562272 commit 7d81bdc

1 file changed: +33 -9 lines changed

src/test/recovery/t/007_sync_rep.pl

@@ -27,6 +27,23 @@ sub test_sync_state
 	return;
 }
 
+# Start a standby and check that it is registered within the WAL sender
+# array of the given primary. This polls the primary's pg_stat_replication
+# until the standby is confirmed as registered.
+sub start_standby_and_wait
+{
+	my ($master, $standby) = @_;
+	my $master_name = $master->name;
+	my $standby_name = $standby->name;
+	my $query =
+	  "SELECT count(1) = 1 FROM pg_stat_replication WHERE application_name = '$standby_name'";
+
+	$standby->start;
+
+	print("### Waiting for standby \"$standby_name\" on \"$master_name\"\n");
+	$master->poll_query_until('postgres', $query);
+}
+
 # Initialize master node
 my $node_master = get_new_node('master');
 $node_master->init(allows_streaming => 1);
@@ -36,23 +53,26 @@ sub test_sync_state
 # Take backup
 $node_master->backup($backup_name);
 
+# Create all the standbys. Their status on the primary is checked to ensure
+# the ordering of each one of them in the WAL sender array of the primary.
+
 # Create standby1 linking to master
 my $node_standby_1 = get_new_node('standby1');
 $node_standby_1->init_from_backup($node_master, $backup_name,
 	has_streaming => 1);
-$node_standby_1->start;
+start_standby_and_wait($node_master, $node_standby_1);
 
 # Create standby2 linking to master
 my $node_standby_2 = get_new_node('standby2');
 $node_standby_2->init_from_backup($node_master, $backup_name,
 	has_streaming => 1);
-$node_standby_2->start;
+start_standby_and_wait($node_master, $node_standby_2);
 
 # Create standby3 linking to master
 my $node_standby_3 = get_new_node('standby3');
 $node_standby_3->init_from_backup($node_master, $backup_name,
 	has_streaming => 1);
-$node_standby_3->start;
+start_standby_and_wait($node_master, $node_standby_3);
 
 # Check that sync_state is determined correctly when
 # synchronous_standby_names is specified in old syntax.
@@ -82,8 +102,10 @@ sub test_sync_state
 $node_standby_2->stop;
 $node_standby_3->stop;
 
-$node_standby_2->start;
-$node_standby_3->start;
+# Make sure that each standby reports back to the primary in the wanted
+# order.
+start_standby_and_wait($node_master, $node_standby_2);
+start_standby_and_wait($node_master, $node_standby_3);
 
 # Specify 2 as the number of sync standbys.
 # Check that two standbys are in 'sync' state.
@@ -94,7 +116,7 @@ sub test_sync_state
 	'2(standby1,standby2,standby3)');
 
 # Start standby1
-$node_standby_1->start;
+start_standby_and_wait($node_master, $node_standby_1);
 
 # Create standby4 linking to master
 my $node_standby_4 = get_new_node('standby4');
@@ -126,14 +148,16 @@ sub test_sync_state
 
 # The setting that * comes before another standby name is acceptable
 # but does not make sense in most cases. Check that sync_state is
-# chosen properly even in case of that setting.
-# The priority of standby2 should be 2 because it matches * first.
+# chosen properly even in case of that setting. standby1 is selected
+# as synchronous as it has the highest priority, and is followed by a
+# second standby listed first in the WAL sender array, which is
+# standby2 in this case.
 test_sync_state(
 	$node_master, qq(standby1|1|sync
 standby2|2|sync
 standby3|2|potential
 standby4|2|potential),
-	'asterisk comes before another standby name',
+	'asterisk before another standby name',
 	'2(standby1,*,standby2)');
 
 # Check that the setting of '2(*)' chooses standby2 and standby3 that are stored
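
The comment rewritten in the last hunk explains why the order of the WAL sender array matters for `'2(standby1,*,standby2)'`: several standbys tie on priority 2, and the tie is broken by their position in that array. The following is a simplified model of that rule, written only from the comments and expected output in this diff; it is not PostgreSQL's actual selection code, and `sync_priority` and `sync_states` are hypothetical names.

```perl
use strict;
use warnings;

# Priority of a standby: 1-based position of the first list entry that
# matches it ('*' matches any name); 0 means no entry matches.
sub sync_priority
{
	my ($name, @list) = @_;
	for my $i (0 .. $#list)
	{
		return $i + 1 if $list[$i] eq '*' || $list[$i] eq $name;
	}
	return 0;
}

# Returns "name|priority|state" lines, sorted by name for readability.
# $wal_senders lists standby names in WAL sender array order.
sub sync_states
{
	my ($num_sync, $list, $wal_senders) = @_;
	my %prio = map { $_ => sync_priority($_, @$list) } @$wal_senders;

	# Remember each standby's position in the WAL sender array.
	my %pos;
	@pos{@$wal_senders} = (0 .. $#{$wal_senders});

	# Rank by priority; ties go to the standby listed first in the array.
	my @ranked =
	  sort { $prio{$a} <=> $prio{$b} || $pos{$a} <=> $pos{$b} }
	  grep { $prio{$_} > 0 } @$wal_senders;
	my %is_sync = map { $_ => 1 } grep { defined } @ranked[ 0 .. $num_sync - 1 ];

	return join("\n", map {
		my $state = !$prio{$_} ? 'async' : $is_sync{$_} ? 'sync' : 'potential';
		"$_|$prio{$_}|$state"
	} sort @$wal_senders);
}

my @names = ('standby1', '*', 'standby2');

# Expected registration order: standby2 is the second sync standby.
print sync_states(2, \@names, [qw(standby1 standby2 standby3 standby4)]), "\n\n";

# Order produced by the race: standby3 registered before standby2.
print sync_states(2, \@names, [qw(standby1 standby3 standby2 standby4)]), "\n";
```

With the expected order, the model reproduces the four lines the test checks for. With standby3 registered ahead of standby2, standby3 becomes `sync` and standby2 `potential`, which is the kind of mismatch the commit message attributes to slow buildfarm machines.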
