errors when reading from the mq, possibly blocking '-m fast' restart #2

@tvondra

Hi Alexander,

I've done a review and a bit of testing of the extension today, and I've run into some strange issues in high-concurrency environments. Essentially, I have two pgbench tests running at the same time:

  1. a regular pgbench with 72 clients, using the standard workload (so "pgbench -c 72 ...")

  2. a pgbench reading the collected wait data, essentially running this custom SQL script (16 clients)

    select count(*) from pg_wait_sampling_current;
    select count(*) from pg_wait_sampling_history;
    select count(*) from pg_wait_sampling_profile;

After a short while, I get these errors in the second pgbench:

client 13 aborted in state 1: ERROR: Error reading mq.
client 4 aborted in state 1: ERROR: Error reading mq.

What's worse, running "pg_ctl restart" on the cluster times out. There's no CPU or I/O activity, so the cluster should restart without any issue, but I suppose there is some locking issue caused by the mq read failures.

Regarding the code - I'm not sure what the purpose of setup_gucs() is. Why not simply define the GUC variables? If anything, get_guc_variables() is only meant to be used in help_config.c (per the comment in guc.c).

Also, should the bgworker main method really do proc_exit(1) instead of proc_exit(0)? At least that's what the other workers I've seen do.
