
Commit 726926a

Update pgcvslog
1 parent 127f785 commit 726926a

4 files changed

+345
-146
lines changed

doc/TODO

+3-1
@@ -192,7 +192,8 @@ PERFORMANCE
 
 FSYNC
 
-* Allow transaction commits with rollback with no-fsync performance [fsync](Vadim)
+* Allow transaction commits with rollback with no-fsync performance
+[fsync] (Vadim)
 
 INDEXES
 
@@ -231,6 +232,7 @@ MISC
 * Remove pg_listener index
 * Remove ANALYZE from VACUUM so it can be run separately without locks
 * Gather more accurate statistics using indexes
+* Improve statistics storage in pg_class [performance]
 
 SOURCE CODE
 -----------

doc/TODO.detail/performance

+211
@@ -341,3 +341,214 @@ Informix Software (No, really) 300 Lakeside Drive Oakland, CA 94612
341341
good, you'll have to ram them down people's throats." -- Howard Aiken
342342

343343

344+
From [email protected] Tue Oct 19 10:31:10 1999
345+
Received: from renoir.op.net ([email protected] [209.152.193.4])
346+
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id KAA29087
347+
for <[email protected]>; Tue, 19 Oct 1999 10:31:08 -0400 (EDT)
348+
Received: from hub.org (hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.2 $) with ESMTP id KAA27535 for <[email protected]>; Tue, 19 Oct 1999 10:19:47 -0400 (EDT)
349+
Received: from localhost (majordom@localhost)
350+
by hub.org (8.9.3/8.9.3) with SMTP id KAA30328;
351+
Tue, 19 Oct 1999 10:12:10 -0400 (EDT)
352+
(envelope-from owner-pgsql-hackers)
353+
Received: by hub.org (bulk_mailer v1.5); Tue, 19 Oct 1999 10:11:55 -0400
354+
Received: (from majordom@localhost)
355+
by hub.org (8.9.3/8.9.3) id KAA30030
356+
for pgsql-hackers-outgoing; Tue, 19 Oct 1999 10:11:00 -0400 (EDT)
357+
(envelope-from [email protected])
358+
Received: from sss.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2])
359+
by hub.org (8.9.3/8.9.3) with ESMTP id KAA29914
360+
for <[email protected]>; Tue, 19 Oct 1999 10:10:33 -0400 (EDT)
361+
(envelope-from [email protected])
362+
Received: from sss.sss.pgh.pa.us (localhost [127.0.0.1])
363+
by sss.sss.pgh.pa.us (8.9.1/8.9.1) with ESMTP id KAA09038;
364+
Tue, 19 Oct 1999 10:09:15 -0400 (EDT)
365+
To: "Hiroshi Inoue" <[email protected]>
366+
cc: "Vadim Mikheev" <[email protected]>, [email protected]
367+
Subject: Re: [HACKERS] mdnblocks is an amazing time sink in huge relations
368+
In-reply-to: Your message of Tue, 19 Oct 1999 19:03:22 +0900
369+
370+
Date: Tue, 19 Oct 1999 10:09:15 -0400
371+
Message-ID: <[email protected]>
372+
From: Tom Lane <[email protected]>
373+
374+
Status: OR
375+
376+
"Hiroshi Inoue" <[email protected]> writes:
377+
> 1. shared cache holds committed system tuples.
378+
> 2. private cache holds uncommitted system tuples.
379+
> 3. relpages of shared cache are updated immediately by
380+
> physical change and corresponding buffer pages are
381+
> marked dirty.
382+
> 4. on commit, the contents of uncommitted tuples except
383+
> relpages,reltuples,... are copied to corresponding tuples
384+
> in shared cache and the combined contents are
385+
> committed.
386+
> If so,catalog cache invalidation would be no longer needed.
387+
> But synchronization of the step 4. may be difficult.
388+
389+
I think the main problem is that relpages and reltuples shouldn't
390+
be kept in pg_class columns at all, because they need to have
391+
very different update behavior from the other pg_class columns.
392+
393+
The rest of pg_class is update-on-commit, and we can lock down any one
394+
row in the normal MVCC way (if transaction A has modified a row and
395+
transaction B also wants to modify it, B waits for A to commit or abort,
396+
so it can know which version of the row to start from). Furthermore,
397+
there can legitimately be several different values of a row in use in
398+
different places: the latest committed, an uncommitted modification, and
399+
one or more old values that are still being used by active transactions
400+
because they were current when those transactions started. (BTW, the
401+
present relcache is pretty bad about maintaining pure MVCC transaction
402+
semantics like this, but it seems clear to me that that's the direction
403+
we want to go in.)
404+
405+
relpages cannot operate this way. To be useful for avoiding lseeks,
406+
relpages *must* change exactly when the physical file changes. It
407+
matters not at all whether the particular transaction that extended the
408+
file ultimately commits or not. Moreover there can be only one correct
409+
value (per relation) across the whole system, because there is only one
410+
length of the relation file.
411+
412+
If we want to take reltuples seriously and try to maintain it
413+
on-the-fly, then I think it needs still a third behavior. Clearly
414+
it cannot be updated using MVCC rules, or we lose all writer
415+
concurrency (if A has added tuples to a rel, B would have to wait
416+
for A to commit before it could update reltuples...). Furthermore
417+
"updating" isn't a simple matter of storing what you think the new
418+
value is; otherwise two transactions adding tuples in parallel would
419+
leave the wrong answer after B commits and overwrites A's value.
420+
I think it would work for each transaction to keep track of a net delta
421+
in reltuples for each table it's changed (total tuples added less total
422+
tuples deleted), and then atomically add that value to the table's
423+
shared reltuples counter during commit. But that still leaves the
424+
problem of how you use the counter during a transaction to get an
425+
accurate answer to the question "If I scan this table now, how many tuples
426+
will I see?" At the time the question is asked, the current shared
427+
counter value might include the effects of transactions that have
428+
committed since your transaction started, and therefore are not visible
429+
under MVCC rules. I think getting the correct answer would involve
430+
making an instantaneous copy of the current counter at the start of
431+
your xact, and then adding your own private net-uncommitted-delta to
432+
the saved shared counter value when asked the question. This doesn't
433+
look real practical --- you'd have to save the reltuples counts of
434+
*all* tables in the database at the start of each xact, on the off
435+
chance that you might need them. Ugh. Perhaps someone has a better
436+
idea. In any case, reltuples clearly needs different mechanisms than
437+
the ordinary fields in pg_class do, because updating it will be a
438+
performance bottleneck otherwise.
439+
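Review note: the delta scheme in the paragraph above can be sketched in a few lines. Each transaction records its own net tuple change, answers "how many tuples would I see?" from a copy of the shared counter taken at transaction start plus that private delta, and adds the delta to the shared counter atomically at commit. All names here (`SharedRelStats`, `XactRelDelta`, the `xact_*` functions) are hypothetical and exist only for illustration; this is not how PostgreSQL stores reltuples, and it covers a single table only, which sidesteps the cost of snapshotting every table that the message objects to.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical shared per-table counter (not an actual PostgreSQL structure). */
typedef struct SharedRelStats {
    _Atomic int64_t reltuples;  /* tuple count from committed transactions */
} SharedRelStats;

/* Hypothetical per-transaction bookkeeping for one table. */
typedef struct XactRelDelta {
    SharedRelStats *shared;
    int64_t snapshot;           /* shared counter copied at xact start */
    int64_t delta;              /* tuples added minus tuples deleted so far */
} XactRelDelta;

void xact_begin(XactRelDelta *x, SharedRelStats *s)
{
    x->shared = s;
    x->snapshot = atomic_load(&s->reltuples);
    x->delta = 0;
}

void xact_insert(XactRelDelta *x, int64_t n)
{
    assert(n >= 0);
    x->delta += n;
}

void xact_delete(XactRelDelta *x, int64_t n)
{
    assert(n >= 0);
    x->delta -= n;
}

/* Count visible to this xact: start-of-xact snapshot plus its own changes. */
int64_t xact_visible_tuples(const XactRelDelta *x)
{
    return x->snapshot + x->delta;
}

/* Commit adds the delta; it never overwrites, so parallel writers compose. */
void xact_commit(XactRelDelta *x)
{
    atomic_fetch_add(&x->shared->reltuples, x->delta);
    x->delta = 0;
}

/* Abort simply discards the private delta. */
void xact_abort(XactRelDelta *x)
{
    x->delta = 0;
}
```

Because commit adds rather than stores, two transactions inserting in parallel cannot clobber each other's contribution, which is the failure the message describes for a plain overwrite.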
440+
If we allow reltuples to be updated only by vacuum-like events, as
441+
it is now, then I think keeping it in pg_class is still OK.
442+
443+
In short, it seems clear to me that relpages should be removed from
444+
pg_class and kept somewhere else if we want to make it more reliable
445+
than it is now, and the same for reltuples (but reltuples doesn't
446+
behave the same as relpages, and probably ought to be handled
447+
differently).
448+
449+
regards, tom lane
450+
451+
************
452+
453+
From [email protected] Tue Oct 19 21:25:30 1999
454+
Received: from renoir.op.net ([email protected] [209.152.193.4])
455+
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id VAA28130
456+
for <[email protected]>; Tue, 19 Oct 1999 21:25:26 -0400 (EDT)
457+
Received: from hub.org (hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.2 $) with ESMTP id VAA10512 for <[email protected]>; Tue, 19 Oct 1999 21:15:28 -0400 (EDT)
458+
Received: from localhost (majordom@localhost)
459+
by hub.org (8.9.3/8.9.3) with SMTP id VAA50745;
460+
Tue, 19 Oct 1999 21:07:23 -0400 (EDT)
461+
(envelope-from owner-pgsql-hackers)
462+
Received: by hub.org (bulk_mailer v1.5); Tue, 19 Oct 1999 21:07:01 -0400
463+
Received: (from majordom@localhost)
464+
by hub.org (8.9.3/8.9.3) id VAA50644
465+
for pgsql-hackers-outgoing; Tue, 19 Oct 1999 21:06:06 -0400 (EDT)
466+
(envelope-from [email protected])
467+
Received: from sd.tpf.co.jp (sd.tpf.co.jp [210.161.239.34])
468+
by hub.org (8.9.3/8.9.3) with ESMTP id VAA50584
469+
for <[email protected]>; Tue, 19 Oct 1999 21:05:26 -0400 (EDT)
470+
(envelope-from [email protected])
471+
Received: from cadzone ([126.0.1.40] (may be forged))
472+
by sd.tpf.co.jp (2.5 Build 2640 (Berkeley 8.8.6)/8.8.4) with SMTP
473+
id KAA01715; Wed, 20 Oct 1999 10:05:14 +0900
474+
From: "Hiroshi Inoue" <[email protected]>
475+
To: "Tom Lane" <[email protected]>
476+
477+
Subject: RE: [HACKERS] mdnblocks is an amazing time sink in huge relations
478+
Date: Wed, 20 Oct 1999 10:09:13 +0900
479+
Message-ID: <[email protected]>
480+
MIME-Version: 1.0
481+
Content-Type: text/plain;
482+
charset="iso-8859-1"
483+
Content-Transfer-Encoding: 7bit
484+
X-Priority: 3 (Normal)
485+
X-MSMail-Priority: Normal
486+
X-Mailer: Microsoft Outlook 8.5, Build 4.71.2173.0
487+
X-Mimeole: Produced By Microsoft MimeOLE V4.72.2106.4
488+
Importance: Normal
489+
490+
Status: ORr
491+
492+
> -----Original Message-----
493+
> From: Hiroshi Inoue [mailto:[email protected]]
494+
> Sent: Tuesday, October 19, 1999 6:45 PM
495+
> To: Tom Lane
496+
497+
> Subject: RE: [HACKERS] mdnblocks is an amazing time sink in huge
498+
> relations
499+
>
500+
>
501+
> >
502+
> > "Hiroshi Inoue" <[email protected]> writes:
503+
>
504+
> [snip]
505+
>
506+
> >
507+
> > > Deletion is necessary only not to consume disk space.
508+
> > >
509+
> > > For example vacuum could remove not deleted files.
510+
> >
511+
> > Hmm ... interesting idea ... but I can hear the complaints
512+
> > from users already...
513+
> >
514+
>
515+
> My idea is only an analogy of PostgreSQL's simple recovery
516+
> mechanism of tuples.
517+
>
518+
> And my main point is
519+
> "delete fails after commit" doesn't harm the database
520+
> except that not deleted files consume disk space.
521+
>
522+
> Of course,it's preferable to delete relation files immediately
523+
> after(or just when) commit.
524+
> Useless files are visible though useless tuples are invisible.
525+
>
526+
527+
Anyway I don't need "DROP TABLE inside transactions" now
528+
and my idea is originally for that issue.
529+
530+
After a thought,I propose the following solution.
531+
532+
1. mdcreate() couldn't create existent relation files.
533+
If the existent file is of length zero,we would overwrite
534+
the file.(seems the comment in md.c says so but the
535+
code doesn't do so).
536+
If the file is an Index relation file,we would overwrite
537+
the file.
538+
539+
2. mdunlink() couldn't unlink non-existent relation files.
540+
mdunlink() doesn't call elog(ERROR) even if the file
541+
doesn't exist,though I couldn't find where to change
542+
now.
543+
mdopen() doesn't call elog(ERROR) even if the file
544+
doesn't exist and leaves the relation as CLOSED.
545+
546+
Comments ?
547+
548+
Regards.
549+
550+
Hiroshi Inoue
551+
552+
553+
************
554+
