users/simon/postgres.git - Simon's dev repository

Age	Commit message (Collapse)	Author
2009-10-02	Make sure that GIN fast-insert and regular code paths enforce the same	Tom Lane
	tuple size limit. Improve the error message for index-tuple-too-large so that it includes the actual size, the limit, and the index name. Sync with the btree occurrences of the same error. Back-patch to 8.4 because it appears that the out-of-sync problem is occurring in the field. Teodor and Tom
2009-09-01	Force VACUUM to recalculate oldestXmin even when we haven't changed our	Tom Lane
	own database's datfrozenxid, if the current value is old enough to be forcing autovacuums or warning messages. This ensures that a bogus value is replaced as soon as possible. Per a comment from Heikki.
2009-09-01	Remove flatfiles.c, which is now obsolete.	Alvaro Herrera
	Recent commits have removed the various uses it was supporting. It was a performance bottleneck, according to bug report #4919 by Lauris Ulmanis; seems it slowed down user creation after a billion users.
2009-08-31	Track the current XID wrap limit (or more accurately, the oldest unfrozen	Tom Lane
	XID) in checkpoint records. This eliminates the need to recompute the value from scratch during database startup, which is one of the two remaining reasons for the flatfile code to exist. It should also simplify life for hot-standby operation. To avoid bloating the checkpoint records unreasonably, I switched from tracking the oldest database by name to tracking it by OID. This turns out to save cycles in general (everywhere but the warning-generating paths, which we hardly care about) and also helps us deal with the case that the oldest database got dropped instead of being vacuumed. The prior coding might go for a long time without updating the wrap limit in that case, which is bad because it might result in a lot of useless autovacuum activity.
2009-08-24	Fix a violation of WAL coding rules in the recent patch to include an	Tom Lane
	"all tuples visible" flag in heap page headers. The flag update must be applied before calling XLogInsert, but heap_update and the tuple moving routines in VACUUM FULL were ignoring this rule. A crash and replay could therefore leave the flag incorrectly set, causing rows to appear visible in seqscans when they should not be. This might explain recent reports of data corruption from Jeff Ross and others. In passing, do a bit of editorialization on comments in visibilitymap.c.
2009-08-06	Improve plpgsql's ability to cope with rowtypes containing dropped columns,	Tom Lane
	by supporting conversions in places that used to demand exact rowtype match. Since this issue is certain to come up elsewhere (in fact, already has, in ExecEvalConvertRowtype), factor out the support code into new core functions for tuple conversion. I chose to put these in a new source file since heaptuple.c is already overly long. Heavily revised version of a patch by Pavel Stehule.
2009-08-01	Department of second thoughts: let's show the exact key during unique index	Tom Lane
	build failures, too. Refactor a bit more since that error message isn't spelled the same.
2009-08-01	Improve unique-constraint-violation error messages to include the exact	Tom Lane
	values being complained of. In passing, also remove the arbitrary length limitation in the similar error detail message for foreign key violations. Itagaki Takahiro
2009-07-29	Support deferrable uniqueness constraints.	Tom Lane
	The current implementation fires an AFTER ROW trigger for each tuple that looks like it might be non-unique according to the index contents at the time of insertion. This works well as long as there aren't many conflicts, but won't scale to massive unique-key reassignments. Improving that case is a TODO item. Dean Rasheed
2009-07-22	Tweak TOAST code so that columns marked with MAIN storage strategy are	Tom Lane
	not forced out-of-line unless that is necessary to make the row fit on a page. Previously, they were forced out-of-line if needed to get the row down to the default target size (1/4th page). Kevin Grittner
2009-06-26	Cleanup and code review for the patch that made bgwriter active during	Tom Lane
	archive recovery. Invent a separate state variable and inquiry function for XLogInsertAllowed() to clarify some tests and make the management of writing the end-of-recovery checkpoint less klugy. Fix several places that were incorrectly testing InRecovery when they should be looking at RecoveryInProgress or XLogInsertAllowed (because they will now be executed in the bgwriter not startup process). Clarify handling of bad LSNs passed to XLogFlush during recovery. Use a spinlock for setting/testing SharedRecoveryInProgress. Improve quite a lot of comments. Heikki and Tom
2009-06-25	Fix some serious bugs in archive recovery, now that bgwriter is active	Heikki Linnakangas
	during it: When bgwriter is active, the startup process can't perform mdsync() correctly because it won't see the fsync requests accumulated in bgwriter's private pendingOpsTable. Therefore make bgwriter responsible for the end-of-recovery checkpoint as well, when it's active. When bgwriter is active (= archive recovery), the startup process must not accumulate fsync requests to its own pendingOpsTable, since bgwriter won't see them there when it performs restartpoints. Make startup process drop its pendingOpsTable when bgwriter is launched to avoid that. Update minimum recovery point one last time when leaving archive recovery. It won't be updated by the end-of-recovery checkpoint because XLogFlush() sees us as out of recovery already. This fixes bug #4879 reported by Fujii Masao.
2009-06-11	8.4 pgindent run, with new combined Linux/FreeBSD/MinGW typedef list	Bruce Momjian
	provided by Andrew.
2009-06-06	Improve the IndexVacuumInfo/IndexBulkDeleteResult API to allow somewhat sane	Tom Lane
	behavior in cases where we don't know the heap tuple count accurately; in particular partial vacuum, but this also makes the API a bit more useful for ANALYZE. This patch adds "estimated_count" flags to both structs so that an approximate count can be flagged as such, and adjusts the logic so that approximate counts are not used for updating pg_class.reltuples. This fixes my previous complaint that VACUUM was putting ridiculous values into pg_class.reltuples for indexes. The actual impact of that bug is limited, because the planner only pays attention to reltuples for an index if the index is partial; which probably explains why beta testers hadn't noticed a degradation in plan quality from it. But it needs to be fixed. The whole thing is a bit messy and should be redesigned in future, because reltuples now has the potential to drift quite far away from reality when a long period elapses with no non-partial vacuums. But this is as good as it's going to get for 8.4.
2009-06-06	Fix a serious bug introduced into GIN in 8.4: now that MergeItemPointers()	Tom Lane
	is supposed to remove duplicate heap TIDs, we have to be sure to reduce the tuple size and posting-item count accordingly in addItemPointersToTuple(). Failing to do so resulted in the effective injection of garbage TIDs into the index contents, ie, whatever happened to be in the memory palloc'd for the new tuple. I'm not sure that this fully explains the index corruption reported by Tatsuo Ishii, but the test case I'm using no longer fails.
2009-06-05	GIN's ItemPointerIsMin, ItemPointerIsMax, and ItemPointerIsLossyPage macros	Tom Lane
	should use GinItemPointerGetBlockNumber/GinItemPointerGetOffsetNumber, not ItemPointerGetBlockNumber/ItemPointerGetOffsetNumber, because the latter will Assert() on ip_posid == 0, ie a "Min" pointer. (Thus, ItemPointerIsMin has never worked at all, but it seems unused at present.) I'm not certain that the case can occur in normal functioning, but it's blowing up on me while investigating Tatsuo-san's data corruption problem. In any case it seems like a problem waiting to bite someone. Back-patch just in case this really is a problem for somebody in the field.
2009-05-24	Use more-portable coding for the check on handing out the last available	Tom Lane
	relopt_kind value in add_reloption_kind(). Per Zdenek Kotala.
2009-05-12	Fix LOCK TABLE to eliminate the race condition that could make it give weird	Tom Lane
	errors when tables are concurrently dropped. To do this we must take lock on each relation before we check its privileges. The old code was trying to do that the other way around, which is a bit pointless when there are lots of other commands that lock relations before checking privileges. I did keep it checking each relation's privilege before locking the next relation, which is a detail that ALTER TABLE isn't too picky about.
2009-05-08	'PGDLLIMPORT' ShmemVariableCache, needed for pg_migrator.so function	Bruce Momjian
	linkage on Win32. Tested by Hiroshi Saito
2009-04-04	Disallow setting fillfactor for TOAST tables.	Alvaro Herrera
	To implement this without almost duplicating the reloption table, treat relopt_kind as a bitmask instead of an integer value. This decreases the range of allowed values, but it's not clear that there's need for that much values anyway. This patch also makes heap_reloptions explicitly a no-op for relation kinds other than heap and TOAST tables. Patch by ITAGAKI Takahiro with minor edits from me. (In particular I removed the bit about adding relation kind to an error message, which I intend to commit separately.)
2009-03-25	Adjust the APIs for GIN opclass support functions to allow the extractQuery()	Tom Lane
	method to pass extra data to the consistent() and comparePartial() methods. This is the core infrastructure needed to support the soon-to-appear contrib/btree_gin module. The APIs are still upward compatible with the definitions used in 8.3 and before, although not with the previous 8.4devel function definitions. catversion bump for changes in pg_proc entries (although these are just cosmetic, since GIN doesn't actually look at the function signature before calling it...) Teodor Sigaev and Oleg Bartunov
2009-03-24	Install a search tree depth limit in GIN bulk-insert operations, to prevent	Tom Lane
	them from degrading badly when the input is sorted or nearly so. In this scenario the tree is unbalanced to the point of becoming a mere linked list, so insertions become O(N^2). The easiest and most safely back-patchable solution is to stop growing the tree sooner, ie limit the growth of N. We might later consider a rebalancing tree algorithm, but it's not clear that the benefit would be worth the cost and complexity. Per report from Sergey Burladyan and an earlier complaint from Heikki. Back-patch to 8.2; older versions didn't have GIN indexes.
2009-03-24	Implement "fastupdate" support for GIN indexes, in which we try to accumulate	Tom Lane
	multiple index entries in a holding area before adding them to the main index structure. This helps because bulk insert is (usually) significantly faster than retail insert for GIN. This patch also removes GIN support for amgettuple-style index scans. The API defined for amgettuple is difficult to support with fastupdate, and the previously committed partial-match feature didn't really work with it either. We might eventually figure a way to put back amgettuple support, but it won't happen for 8.4. catversion bumped because of change in GIN's pg_am entry, and because the format of GIN indexes changed on-disk (there's a metapage now, and possibly a pending list). Teodor Sigaev
2009-03-23	Const-ify the parse table passed to fillRelOptions. The previous coding	Tom Lane
	meant it had to be built on-the-fly at each entry to default_reloptions.
2009-02-18	Start background writer during archive recovery. Background writer now performs	Heikki Linnakangas
	its usual buffer cleaning duties during archive recovery, and it's responsible for performing restartpoints. This requires some changes in postmaster. When the startup process has done all the initialization and is ready to start WAL redo, it signals the postmaster to launch the background writer. The postmaster is signaled again when the point in recovery is reached where we know that the database is in consistent state. Postmaster isn't interested in that at the moment, but that's the point where we could let other backends in to perform read-only queries. The postmaster is signaled third time when the recovery has ended, so that postmaster knows that it's safe to start accepting connections. The startup process now traps SIGTERM, and performs a "clean" shutdown. If you do a fast shutdown during recovery, a shutdown restartpoint is performed, like a shutdown checkpoint, and postmaster kills the processes cleanly. You still have to continue the recovery at next startup, though. Currently, the background writer is only launched during archive recovery. We could launch it during crash recovery as well, but it seems better to keep that codepath as simple as possible, for the sake of robustness. And it couldn't do any restartpoints during crash recovery anyway, so it wouldn't be that useful. log_restartpoints is gone. Use log_checkpoints instead. This is yet to be documented. This whole operation is a pre-requisite for Hot Standby, but has some value of its own whether the hot standby patch makes 8.4 or not. Simon Riggs, with lots of modifications by me.
2009-02-02	Allow reloption names to have qualifiers, initially supporting a TOAST	Alvaro Herrera
	qualifier, and add support for this in pg_dump. This allows TOAST tables to have user-defined fillfactor, and will also enable us to move the autovacuum parameters to reloptions without taking away the possibility of setting values for TOAST tables.
2009-01-26	Allow extracting and parsing of reloptions from a bare pg_class tuple, and	Alvaro Herrera
	refactor the relcache code that used to do that. This allows other callers (particularly autovacuum) to do the same without necessarily having to open and lock a table.
2009-01-20	Add a new option to RestoreBkpBlocks() to indicate if a cleanup lock should	Heikki Linnakangas
	be used instead of the normal exclusive lock, and make WAL redo functions responsible for calling RestoreBkpBlocks(). They know better what kind of a lock they need. At the moment, this just moves things around with no functional change, but makes the hot standby patch that's under review cleaner.
2009-01-12	Simplify the writing of amoptions routines by introducing a convenience	Alvaro Herrera
	fillRelOptions routine that stores the parsed values in the struct using a table-based approach. Per Tom suggestion. Also remove the "continue" in HANDLE_*_RELOPTION macros, which were useless and in spirit they were assuming too much of how the macros were going to be used. (Note that these macros are now unused, but the intention is to introduce some usage in a future autovacuum patch, which is why they weren't completely removed.) Also, do not call the string validation routine when not validating. It seems less error-prone this way, per commentary on the amoptions SGML docs.
2009-01-10	Revise the TIDBitmap API to support multiple concurrent iterations over a	Tom Lane
	bitmap. This is extracted from Greg Stark's posix_fadvise patch; it seems worth committing separately, since it's potentially useful independently of posix_fadvise.
2009-01-08	A couple further reloptions improvements, per KaiGai Kohei: add a validation	Alvaro Herrera
	function to the string type and add a couple of macros for string handling. In passing, fix an off-by-one bug of mine.
2009-01-06	Fix string reloption handling, per KaiGai Kohei.	Alvaro Herrera

2009-01-05	Change the reloptions machinery to use a table-based parser, and provide	Alvaro Herrera
	a more complete framework for writing custom option processing routines by user-defined access methods. Catalog version bumped due to the general API changes, which are going to affect user-defined "amoptions" routines.
2009-01-01	Update copyright for 2009.	Bruce Momjian

2008-12-30	The flag to mark dead tuples is nowadays called LP_DEAD, not LP_DELETE.	Heikki Linnakangas
	Simon Riggs.
2008-12-12	Reduce some rel.h inclusions, and add pg_list.h to pg_proc_fn.h.	Alvaro Herrera

2008-12-03	Introduce visibility map. The visibility map is a bitmap with one bit per	Heikki Linnakangas
	heap page, where a set bit indicates that all tuples on the page are visible to all transactions, and the page therefore doesn't need vacuuming. It is stored in a new relation fork. Lazy vacuum uses the visibility map to skip pages that don't need vacuuming. Vacuum is also responsible for setting the bits in the map. In the future, this can hopefully be used to implement index-only-scans, but we can't currently guarantee that the visibility map is always 100% up-to-date. In addition to the visibility map, there's a new PD_ALL_VISIBLE flag on each heap page, also indicating that all tuples on the page are visible to all transactions. It's important that this flag is kept up-to-date. It is also used to skip visibility tests in sequential scans, which gives a small performance gain on seqscans.
2008-11-30	Clean up the API for DestReceiver objects by eliminating the assumption	Tom Lane
	that a Portal is a useful and sufficient additional argument for CreateDestReceiver --- it just isn't, in most cases. Instead formalize the approach of passing any needed parameters to the receiver separately. One unexpected benefit of this change is that we can declare typedef Portal in a less surprising location. This patch is just code rearrangement and doesn't change any functionality. I'll tackle the HOLD-cursor-vs-toast problem in a follow-on patch.
2008-11-19	Rethink the way FSM truncation works. Instead of WAL-logging FSM	Heikki Linnakangas
	truncations in FSM code, call FreeSpaceMapTruncateRel from smgr_redo. To make that cleaner from modularity point of view, move the WAL-logging one level up to RelationTruncate, and move RelationTruncate and all the related WAL-logging to new src/backend/catalog/storage.c file. Introduce new RelationCreateStorage and RelationDropStorage functions that are used instead of calling smgrcreate/smgrscheduleunlink directly. Move the pending rel deletion stuff from smgrcreate/smgrscheduleunlink to the new functions. This leaves smgr.c as a thin wrapper around md.c; all the transactional stuff is now in storage.c. This will make it easier to add new forks with similar truncation logic, like the visibility map.
2008-11-14	Replace the usage of heap_addheader to create pg_attribute tuples with regular	Alvaro Herrera
	heap_form_tuple. Since this removes the last remaining caller of heap_addheader, remove it. Extracted from the column privileges patch from Stephen Frost, with further code cleanups by me.
2008-11-06	Improve bulk-insert performance by keeping the current target buffer pinned	Tom Lane
	(but not locked, as that would risk deadlocks). Also, make it work in a small ring of buffers to avoid having bulk inserts trash the whole buffer arena. Robert Haas, after an idea of Simon Riggs'.
2008-11-03	Clean up the messy semantics (not to mention inefficiency) of PageGetTempPage	Tom Lane
	by splitting it into three functions with better-defined behaviors. Zdenek Kotala
2008-11-02	Remove all uses of the deprecated functions heap_formtuple, heap_modifytuple,	Tom Lane
	and heap_deformtuple in favor of the newer functions heap_form_tuple et al (which do the same things but use bool control flags instead of arbitrary char values). Eliminate the former duplicate coding of these functions, reducing the deprecated functions to mere wrappers around the newer ones. We can't get rid of them entirely because add-on modules probably still contain many instances of the old coding style. Kris Jurka
2008-10-31	Unite ReadBufferWithFork, ReadBufferWithStrategy, and ZeroOrReadBuffer	Heikki Linnakangas
	functions into one ReadBufferExtended function, that takes the strategy and mode as argument. There's three modes, RBM_NORMAL which is the default used by plain ReadBuffer(), RBM_ZERO, which replaces ZeroOrReadBuffer, and a new mode RBM_ZERO_ON_ERROR, which allows callers to read corrupt pages without throwing an error. The FSM needs the new mode to recover from corrupt pages, which could happend if we crash after extending an FSM file, and the new page is "torn". Add fork number to some error messages in bufmgr.c, that still lacked it.
2008-10-28	Arrange to squeeze out the MINIMAL_TUPLE_PADDING in the tuple representation	Tom Lane
	written to temp files by tuplesort.c and tuplestore.c. This saves 2 bytes per row for 32-bit machines, and 6 bytes per row for 64-bit machines, which seems worth the slight additional uglification of the tuple read/write routines.
2008-10-22	Fix GiST's killing tuple: GISTScanOpaque->curpos wasn't	Teodor Sigaev
	correctly set. As result, killtuple() marks as dead wrong tuple on page. Bug was introduced by me while fixing possible duplicates during GiST index scan.
2008-10-20	Rework subtransaction commit protocol for hot standby.	Alvaro Herrera
	This patch eliminates the marking of subtransactions as SUBCOMMITTED in pg_clog during their commit; instead they remain in-progress until main transaction commit. At main transaction commit, the commit protocol is atomic-by-page instead of one transaction at a time. To avoid a race condition with some subtransactions appearing committed before others in the case where they span more than one pg_clog page, we conserve the logic that marks them subcommitted before marking the parent committed. Simon Riggs with minor help from me
2008-10-20	Remove mark/restore support in GIN and GiST indexes.	Teodor Sigaev
	Per Tom's comment. Also revome useless GISTScanOpaque->flags field.
2008-10-17	Remove useless mark/restore support in hash index AM, per discussion.	Tom Lane
	(I'm leaving GiST/GIN cleanup to Teodor.)
2008-10-17	During repeated rescan of GiST index it's possible that scan key	Teodor Sigaev
	is NULL but SK_SEARCHNULL is not set. Add checking IS NULL of keys to set during key initialization. If key is NULL and SK_SEARCHNULL is not set then nothnig can be satisfied. With assert-enabled compilation that causes coredump. Bug was introduced in 8.3 by support of IS NULL index scan.