Re: [HACKERS] Two pass CheckDeadlock in contentent case
От | Sokolov Yura |
---|---|
Тема | Re: [HACKERS] Two pass CheckDeadlock in contentent case |
Дата | |
Msg-id | [email protected] обсуждение исходный текст |
Ответ на | [HACKERS] Two pass CheckDeadlock in contentent case (Sokolov Yura <[email protected]>) |
Ответы |
Re: [HACKERS] Two pass CheckDeadlock in contentent case
Re: [HACKERS] Two pass CheckDeadlock in contentent case |
Список | pgsql-hackers |
On 2017-10-03 17:30, Sokolov Yura wrote: > Good day, hackers. > > During hard workload sometimes process reaches deadlock timeout > even if no real deadlock occurred. It is easily reproducible with > pg_xact_advisory_lock on same value + some time consuming > operation (update) and many clients. > > When backend reaches deadlock timeout, it calls CheckDeadlock, > which locks all partitions of lock hash in exclusive mode to > walk through graph and search for deadlock. > > If hundreds of backends reaches this timeout trying to acquire > advisory lock on a same value, it leads to hard-stuck for many > seconds, cause they all traverse same huge lock graph under > exclusive lock. > During this stuck there is no possibility to do any meaningful > operations (no new transaction can begin). > > Attached patch makes CheckDeadlock to do two passes: > - first pass uses LW_SHARED on partitions of lock hash. > DeadLockCheck is called with flag "readonly", so it doesn't > modify anything. > - If there is possibility of "soft" or "hard" deadlock detected, > ie if there is need to modify lock graph, then partitions > relocked with LW_EXCLUSIVE, and DeadLockCheck is called again. > > It fixes hard-stuck, cause backends walk lock graph under shared > lock, and found that there is no real deadlock. > > Test on 4 socket xeon machine: > pgbench_zipf -s 10 -c 800 -j 100 -M prepared -T 450 -f > ./ycsb_read_zipf.sql@50 -f ./ycsb_update_lock2_zipf.sql@50 -P 5 > > ycsb_read_zipf.sql: > \set i random_zipfian(1, 100000 * :scale, 1.01) > SELECT abalance FROM pgbench_accounts WHERE aid = :i; > ycsb_update_lock2_zipf.sql: > \set i random_zipfian(1, 100000 * :scale, 1.01) > select lock_and_update( :i ); > > CREATE OR REPLACE FUNCTION public.lock_and_update(i integer) > RETURNS void > LANGUAGE sql > AS $function$ > SELECT pg_advisory_xact_lock( $1 ); > UPDATE pgbench_accounts SET abalance = 1 WHERE aid = $1; > $function$ > > Without attached patch: > > progress: 5.0 s, 45707.0 tps, lat 15.599 ms stddev 83.757 > progress: 10.0 s, 51124.4 tps, lat 15.681 ms stddev 78.353 > progress: 15.0 s, 52293.8 tps, lat 15.327 ms stddev 77.017 > progress: 20.0 s, 51280.4 tps, lat 15.603 ms stddev 78.199 > progress: 25.0 s, 47278.6 tps, lat 16.795 ms stddev 83.570 > progress: 30.0 s, 41792.9 tps, lat 18.535 ms stddev 93.697 > progress: 35.0 s, 12393.7 tps, lat 33.757 ms stddev 169.116 > progress: 40.0 s, 0.0 tps, lat -nan ms stddev -nan > progress: 45.0 s, 0.0 tps, lat -nan ms stddev -nan > progress: 50.0 s, 1.2 tps, lat 2497.734 ms stddev 5393.166 > progress: 55.0 s, 0.0 tps, lat -nan ms stddev -nan > progress: 60.0 s, 27357.9 tps, lat 160.622 ms stddev 1866.625 > progress: 65.0 s, 38770.8 tps, lat 20.829 ms stddev 104.453 > progress: 70.0 s, 40553.2 tps, lat 19.809 ms stddev 99.741 > > (autovacuum led to trigger deadlock timeout, > and therefore, to stuck) > > Patched: > > progress: 5.0 s, 56264.7 tps, lat 12.847 ms stddev 66.980 > progress: 10.0 s, 55389.3 tps, lat 14.329 ms stddev 71.997 > progress: 15.0 s, 50757.7 tps, lat 15.730 ms stddev 78.222 > progress: 20.0 s, 50797.3 tps, lat 15.736 ms stddev 79.296 > progress: 25.0 s, 48485.3 tps, lat 16.432 ms stddev 82.720 > progress: 30.0 s, 45202.1 tps, lat 17.733 ms stddev 88.554 > progress: 35.0 s, 40035.8 tps, lat 19.343 ms stddev 97.787 > progress: 40.0 s, 14240.1 tps, lat 47.625 ms stddev 265.465 > progress: 45.0 s, 30150.6 tps, lat 31.140 ms stddev 196.114 > progress: 50.0 s, 38975.0 tps, lat 20.543 ms stddev 104.281 > progress: 55.0 s, 40573.9 tps, lat 19.770 ms stddev 99.487 > progress: 60.0 s, 38361.1 tps, lat 20.693 ms stddev 103.141 > progress: 65.0 s, 39793.3 tps, lat 20.216 ms stddev 101.080 > progress: 70.0 s, 38387.9 tps, lat 20.886 ms stddev 104.482 > > (autovacuum led to trigger deadlock timeout, > but postgresql did not stuck) > > I believe, patch will positively affect other heavy workloads > as well, although I have not collect benchmarks. > > `make check-world` passes with configured `--enable-tap-tests > --enable-casserts` Excuse me, corrected version is in attach. -- Sokolov Yura Postgres Professional: https://postgrespro.ru The Russian Postgres Company -- Sent via pgsql-hackers mailing list ([email protected]) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Вложения
В списке pgsql-hackers по дате отправления:
Предыдущее
От: Robert HaasДата:
Сообщение: Re: [HACKERS] SendRowDescriptionMessage() is slow for queries with alot of columns
Следующее
От: Tom LaneДата:
Сообщение: Re: [HACKERS] Postgresql gives error that role goes not exists while it exists