Commit 4fdbf9a

Tighten TAP tests' tracking of postmaster state some more.
Commits 6c4a890 et al. had a couple of deficiencies:

* The logic I added to Cluster::start to see if a PID file is present
could be fooled by a stale PID file left over from a previous
postmaster. To fix, if we're not sure whether we expect to find a
running postmaster or not, validate the PID using "kill 0".

* 017_shm.pl has a loop in which it just issues repeated Cluster::start
calls; this will fail if some invocation fails but leaves self->_pid
set. Per buildfarm results, the above fix is not enough to make this
safe: we might have "validated" a PID for a postmaster that exits
immediately after we look. Hence, match each failed start call with
a stop call that will get us back to the self->_pid == undef state.
Add a fail_ok option to Cluster::stop to make this work.

Discussion: https://postgr.es/m/CA+hUKGKV6fOHvfiPt8=dOKzvswjAyLoFoJF1iQXMNpi7+hD1JQ@mail.gmail.com
1 parent 3500ccc commit 4fdbf9a
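The "kill 0" validation mentioned in the commit message can be sketched in isolation as follows. This is a minimal illustration, not the code from Cluster.pm: the helper name `live_pid_from_file` is hypothetical, and it assumes the PID is stored on the first line of the file, as in postmaster.pid.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Cluster.pm): read the first line of a
# PID file and return the PID only if a process with that PID exists.
sub live_pid_from_file
{
	my ($pidfile_path) = @_;

	# A missing or unreadable file means there is nothing to validate.
	open(my $fh, '<', $pidfile_path) or return undef;
	my $pid = <$fh>;
	close $fh;

	return undef unless defined $pid;
	chomp $pid;

	# Reject anything that is not a positive integer.
	return undef unless $pid =~ /^[1-9]\d*$/;

	# Signal 0 delivers nothing; kill() returns the number of processes
	# it could signal, so 0 means no such process (or no permission).
	return kill(0, $pid) ? $pid : undef;
}
```

As the commit message notes, such a check is only a snapshot: the process may exit right after it is probed, which is why the commit also pairs each failed start with a stop.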

2 files changed: +37 -6 lines changed

src/test/perl/PostgreSQL/Test/Cluster.pm

Lines changed: 34 additions & 6 deletions
@@ -896,23 +896,40 @@ Note: if the node is already known stopped, this does nothing.
 However, if we think it's running and it's not, it's important for
 this to fail. Otherwise, tests might fail to detect server crashes.
 
+With optional extra param fail_ok => 1, returns 0 for failure
+instead of bailing out.
+
 =cut
 
 sub stop
 {
-	my ($self, $mode) = @_;
-	my $port = $self->port;
+	my ($self, $mode, %params) = @_;
 	my $pgdata = $self->data_dir;
 	my $name = $self->name;
+	my $ret;
 
 	local %ENV = $self->_get_env();
 
 	$mode = 'fast' unless defined $mode;
-	return unless defined $self->{_pid};
+	return 1 unless defined $self->{_pid};
+
 	print "### Stopping node \"$name\" using mode $mode\n";
-	PostgreSQL::Test::Utils::system_or_bail('pg_ctl', '-D', $pgdata, '-m', $mode, 'stop');
+	$ret = PostgreSQL::Test::Utils::system_log('pg_ctl', '-D', $pgdata,
+		'-m', $mode, 'stop');
+
+	if ($ret != 0)
+	{
+		print "# pg_ctl stop failed: $ret\n";
+
+		# Check to see if we still have a postmaster or not.
+		$self->_update_pid(-1);
+
+		BAIL_OUT("pg_ctl stop failed") unless $params{fail_ok};
+		return 0;
+	}
+
 	$self->_update_pid(0);
-	return;
+	return 1;
 }
 
 =pod
@@ -1142,9 +1159,20 @@ sub _update_pid
 	if (open my $pidfile, '<', $self->data_dir . "/postmaster.pid")
 	{
 		chomp($self->{_pid} = <$pidfile>);
-		print "# Postmaster PID for node \"$name\" is $self->{_pid}\n";
 		close $pidfile;
 
+		# If we aren't sure what to expect, validate the PID using kill().
+		# This protects against stale PID files left by crashed postmasters.
+		if ($is_running == -1 && kill(0, $self->{_pid}) == 0)
+		{
+			print
+			  "# Stale postmaster.pid file for node \"$name\": PID $self->{_pid} no longer exists\n";
+			$self->{_pid} = undef;
+			return;
+		}
+
+		print "# Postmaster PID for node \"$name\" is $self->{_pid}\n";
+
 		# If we found a pidfile when there shouldn't be one, complain.
 		BAIL_OUT("postmaster.pid unexpectedly present") if $is_running == 0;
 		return;

src/test/recovery/t/017_shm.pl

Lines changed: 3 additions & 0 deletions
@@ -207,6 +207,9 @@ sub poll_start
 		# Wait 0.1 second before retrying.
 		usleep(100_000);
 
+		# Clean up in case the start attempt just timed out or some such.
+		$node->stop('fast', fail_ok => 1);
+
 		$attempts++;
 	}
 
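The reason for pairing each failed start with a stop can be shown with a small stand-alone model. This is only a sketch under stated assumptions: `MockNode` is a hypothetical stand-in that mimics the `_pid` bookkeeping and is not PostgreSQL::Test::Cluster, the PID values are arbitrary, and this `poll_start` is a simplified version of the loop in 017_shm.pl without the sleep.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a test node; it only models _pid tracking.
package MockNode;

sub new
{
	my ($class, %args) = @_;
	my $self = { _pid => undef, fails_left => $args{fails_left} // 0 };
	return bless $self, $class;
}

# Fails a configurable number of times, leaving _pid set after each
# failure, which is the situation the commit message describes.
sub start
{
	my ($self, %params) = @_;
	die "start called while _pid is still set\n" if defined $self->{_pid};
	if ($self->{fails_left} > 0)
	{
		$self->{fails_left}--;
		$self->{_pid} = 12345;    # arbitrary leftover PID
		return 0;
	}
	$self->{_pid} = 54321;        # arbitrary PID of the "running" server
	return 1;
}

# Always returns to the _pid == undef state; a real stop can fail,
# which is what fail_ok => 1 tolerates.
sub stop
{
	my ($self, $mode, %params) = @_;
	return 1 unless defined $self->{_pid};
	$self->{_pid} = undef;
	return 1;
}

package main;

# Retry start, cleaning up after every failed attempt. Returns the
# attempt number that succeeded, or 0 if all attempts failed.
sub poll_start
{
	my ($node, $max_attempts) = @_;
	for my $attempt (1 .. $max_attempts)
	{
		return $attempt if $node->start(fail_ok => 1);
		$node->stop('fast', fail_ok => 1);
	}
	return 0;
}
```

In this model, removing the `stop` call makes the second `start` die because `_pid` is still set from the failed attempt.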
