path: root/src/backend/tcop/postgres.c
2011-12-21Take fewer snapshots.Robert Haas
When a PORTAL_ONE_SELECT query is executed, we can opportunistically reuse the parse/plan snapshot for the execution phase. This cuts down the number of snapshots per simple query from 2 to 1 for the simple protocol, and from 3 to 2 for the extended protocol. Since we are only reusing a snapshot taken early in the processing of the same protocol message, the change shouldn't be user-visible, except that the remote possibility of the planning and execution snapshots being different is eliminated. Note that this change does not make it safe to assume that the parse/plan snapshot will certainly be reused; that will currently only happen if PortalStart() decides to use the PORTAL_ONE_SELECT strategy. It might be worth trying to provide some stronger guarantees here in the future, but for now we don't. Patch by me; review by Dimitri Fontaine.
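A rough way to picture the bookkeeping in this entry is the following toy model (a sketch only; the function name and phases are illustrative, not the actual postgres.c code):

```python
def snapshots_needed(protocol, reuse_parse_plan_snapshot):
    """Toy count of snapshot acquisitions per query, assuming one
    snapshot per processing phase (an illustrative model)."""
    # Simple protocol: parse/plan + execute; extended protocol adds bind.
    phases = 2 if protocol == "simple" else 3
    # Reusing the parse/plan snapshot for execution saves one acquisition;
    # this is only safe because the reuse stays within one protocol message.
    return phases - (1 if reuse_parse_plan_snapshot else 0)
```

With reuse enabled this gives 1 snapshot for the simple protocol and 2 for the extended protocol, matching the counts in the message above.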
2011-12-14Implement support for CREATE TABLE AS, SELECT INTO and INSERT INTOPavan Deolasee
statements. We start by fixing the INSERT INTO support. For every result relation, we now build a corresponding RemoteQuery node so that the inserts can be carried out at the remote datanodes. Subsequently, at the coordinator at execution time, instead of inserting the resulting tuples into a local heap, we invoke remote execution and insert the rows into the remote datanodes. This works nicely even for prepared queries, multiple VALUES clauses for insert, as well as any other mechanism of generating tuples. We use this infrastructure to then support CREATE TABLE AS SELECT (CTAS). The query is transformed into a CREATE TABLE statement followed by an INSERT INTO statement and then run through normal planning/execution. There are many regression cases that need fixing because these statements now work correctly. This patch fixes many of them. A few might still be failing, but they seem unrelated to the work itself and might be a side effect. We will fix them once this patch gets in.
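The CTAS transformation described here can be sketched at the SQL-string level (a simplification: the real code works on parse trees, and the WITH NO DATA form is just one way to create the empty table first):

```python
import re

def rewrite_ctas(sql):
    """Split CREATE TABLE AS SELECT into a table-creating statement plus
    an INSERT INTO ... SELECT, as the commit describes. Hypothetical
    helper: assumes the simple form 'CREATE TABLE name AS SELECT ...'."""
    m = re.match(r"CREATE TABLE (\w+) AS (SELECT .+)", sql, re.IGNORECASE)
    if not m:
        return [sql]  # not a CTAS: pass through unchanged
    table, select = m.groups()
    return [f"CREATE TABLE {table} AS {select} WITH NO DATA",
            f"INSERT INTO {table} {select}"]
```

The INSERT then goes through the normal RemoteQuery planning path, so the rows land on the datanodes rather than in a local heap.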
2011-12-12Forbid access on Coordinator for inconsistent connection dataMichael P
An additional check is made when a session starts up to see if remote node information is consistent between the pool and the catalogs. If it is not, non-superusers are not allowed access to the cluster, as this could result in the creation of inconsistent data. A superuser is still authorized to connect, but receives a warning message indicating that the remote node information inconsistency has to be resolved.
2011-12-09Cancel running query if it is detected that the connection to the client isHeikki Linnakangas
lost. The only way we detect that at the moment is when write() fails when we try to write to the socket. Florian Pflug with small changes by me, reviewed by Greg Jaskiewicz.
2011-12-01Support for dynamic pooler/session connection information cache reloadMichael P
A new system function called pgxc_pool_reload has been added. If called, this function reloads connection information to remote nodes in a consistent way with the following process: 1) A lock is taken on the pooler, forbidding new connection requests 2) Database pools (user- and database-dependent pools) are reloaded depending on the node information located in catalog pgxc_node. The following rules are followed depending on how node connection information was modified: - a node whose host or port value has changed has its connections dropped, and its node pool is deleted from each database pool - a node that was deleted is removed from each database pool - an unchanged node is kept as is; however, its index value is changed depending on the new cluster configuration - a newly created node is added to each database pool 3) The lock is released 4) The session that invoked pgxc_pool_reload signals all the other server sessions to reconnect to the pooler, allowing each agent to update itself with the newest connection information and reload session information related to remote node handles. This has the effect of aborting current transactions and removing all temporary and prepared objects on the session. A WARNING message is then sent back to the client to inform it of the cluster configuration modification. 5) The session that invoked pgxc_pool_reload reconnects to the pooler by itself and reloads its session information related to remote node handles. No WARNING message is sent back to the client about the session reload. This operation is limited to the local Coordinator and returns a boolean depending on the success of the operation. If pooler data is consistent with catalog information when pgxc_pool_reload is invoked, nothing is done, but a success message is returned. This brings the following simplifications for cluster settings: - cluster_nodes.sql is deleted - a new mandatory option --nodename is used to specify the node name of the node being initialized; this allows setting up the pgxc_node catalog with the node itself, and pgxc_node_name in postgresql.conf is also set automatically - CREATE/ALTER/DROP NODE are launched on the local Coordinator only, meaning that when a cluster is set up, it is necessary to create node information on each Coordinator and then upload this information to the pooler and sessions by invoking pgxc_pool_reload. This change avoids having to restart a Coordinator when changing the cluster configuration, and solves security problems related to cluster_nodes.sql, which could be edited with all types of SQL even though its original purpose was only NODE DDL.
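Step 2 of the reload process above can be sketched as a diff between the pooler's old view and the new pgxc_node contents (names and data structures here are illustrative, not the pooler's actual API):

```python
def pool_reload_plan(old, new):
    """Classify each node when reloading the pooler from pgxc_node.
    old/new map node name -> (host, port); returns the action per node,
    mirroring the four rules in the commit message (a sketch)."""
    plan = {}
    for name, conninfo in old.items():
        if name not in new:
            plan[name] = "delete"                  # node removed from catalog
        elif conninfo != new[name]:
            plan[name] = "drop-and-recreate"       # host/port changed
        else:
            plan[name] = "keep"                    # index may still change
    for name in new:
        if name not in old:
            plan[name] = "add"                     # node newly created
    return plan
```

After this plan is applied, the lock is released and sessions are signalled to reconnect, as described in steps 3-5.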
2011-11-14Fix for bug 3431570: crash with Hadoop and sqoopMichael P
This crash was happening at the execution of a RemoteQuery node when trying to allocate memory for fresh handles taken from the pool. In this case allocation was made in TopTransactionContext. However, in the case of sqoop, which is a Hadoop module creating an interface with a database backend, it happened that TopTransactionContext was NULL, leading to a crash of the node. This commit switches the memory context to CurrentMemoryContext and contains fixes for possible memory leaks related to barrier and COPY. Fix from Pavan Deolasee
2011-10-27Support for Node and Node Group DDLMichael P
Nodes are no longer identified by node number using GUC parameters but by node names. Node connection information is taken from a new catalog table called pgxc_node. Node group information can be found in pgxc_group. Node connection information is taken from the catalog when a user session begins and sticks with it for the duration of the session. This brings more flexibility to the cluster settings. Cluster node information can now be set when a node is initialized with initdb, using cluster_nodes.sql located in the share directory. This commit adds support for the following new DDL: - CREATE NODE - ALTER NODE - DROP NODE - CREATE NODE GROUP - DROP NODE GROUP The following parameters are deleted from postgresql.conf: - num_data_nodes - preferred_data_nodes - data_node_hosts - data_node_ports - primary_data_node - num_coordinators - coordinator_hosts - coordinator_ports pgxc_node_id is replaced by pgxc_node_name to identify the node itself. Documentation is added for the new queries. Functionalities such as EXECUTE DIRECT and CLEAN CONNECTION now use node names instead of node numbers.
2011-10-08Support index-only scans using the visibility map to avoid heap fetches.Tom Lane
When a btree index contains all columns required by the query, and the visibility map shows that all tuples on a target heap page are visible-to-all, we don't need to fetch that heap page. This patch depends on the previous patches that made the visibility map reliable. There's a fair amount left to do here, notably trying to figure out a less chintzy way of estimating the cost of an index-only scan, but the core functionality seems ready to commit. Robert Haas and Ibrar Ahmed, with some previous work by Heikki Linnakangas.
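The core saving can be pictured as: only heap pages not marked all-visible in the visibility map still require a heap fetch to check tuple visibility (a toy model of the idea, not the executor code):

```python
def pages_needing_heap_fetch(tuple_pages, all_visible_pages):
    """Given the heap page of each index tuple the scan returns, and the
    set of pages the visibility map marks all-visible, return the pages
    the index-only scan must still visit (illustrative sketch)."""
    return {page for page in tuple_pages if page not in all_visible_pages}
```

If every target page is all-visible, the scan never touches the heap at all, which is where the speedup comes from.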
2011-10-06Add postmaster -C option to query configuration parameters, and haveBruce Momjian
pg_ctl use that to query the data directory for config-only installs. This fixes awkward or impossible pg_ctl operation for config-only installs.
2011-09-16Redesign the plancache mechanism for more flexibility and efficiency.Tom Lane
Rewrite plancache.c so that a "cached plan" (which is rather a misnomer at this point) can support generation of custom, parameter-value-dependent plans, and can make an intelligent choice between using custom plans and the traditional generic-plan approach. The specific choice algorithm implemented here can probably be improved in future, but this commit is all about getting the mechanism in place, not the policy. In addition, restructure the API to greatly reduce the amount of extraneous data copying needed. The main compromise needed to make that possible was to split the initial creation of a CachedPlanSource into two steps. It's worth noting in particular that SPI_saveplan is now deprecated in favor of SPI_keepplan, which accomplishes the same end result with zero data copying, and no need to then spend even more cycles throwing away the original SPIPlan. The risk of long-term memory leaks while manipulating SPIPlans has also been greatly reduced. Most of this improvement is based on use of the recently-added MemoryContextSetParent primitive.
2011-09-09Simplify handling of the timezone GUC by making initdb choose the default.Tom Lane
We were doing some amazingly complicated things in order to avoid running the very expensive identify_system_timezone() procedure during GUC initialization. But there is an obvious fix for that, which is to do it once during initdb and have initdb install the system-specific default into postgresql.conf, as it already does for most other GUC variables that need system-environment-dependent defaults. This means that the timezone (and log_timezone) settings no longer have any magic behavior in the server. Per discussion.
2011-09-09Move Timestamp/Interval typedefs and basic macros into datatype/timestamp.h.Tom Lane
As per my recent proposal, this refactors things so that these typedefs and macros are available in a header that can be included in frontend-ish code. I also changed various headers that were undesirably including utils/timestamp.h to include datatype/timestamp.h instead. Unsurprisingly, this showed that half the system was getting utils/timestamp.h by way of xlog.h. No actual code changes here, just header refactoring.
2011-08-19Support for parameterised queries in XC. It has following changes.Ashutosh Bapat
1. The parameter values are stored in RemoteQueryState rather than RemoteQuery node, since they are execution time entities. 2. When the plan is generated for a parameterised query, the parameter types are set in all the RemoteQuery nodes in the plan. 3. At the time of execution, the parameter type names are sent to the datanodes, in the Parse message. Changes are done to send and receive parameter type names instead of OIDs. 4. The GROUP BY optimizations are now applied even in the case when there are bound parameters in the query.
2011-08-10Change the autovacuum launcher to use WaitLatch instead of a poll loop.Tom Lane
In pursuit of this (and with the expectation that WaitLatch will be needed in more places), convert the latch field that was already added to PGPROC for sync rep into a generic latch that is activated for all PGPROC-owning processes, and change many of the standard backend signal handlers to set that latch when a signal happens. This will allow WaitLatch callers to be wakened properly by these signals. In passing, fix a whole bunch of signal handlers that had been hacked to do things that might change errno, without adding the necessary save/restore logic for errno. Also make some minor fixes in unix_latch.c, and clean up bizarre and unsafe scheme for disowning the process's latch. Much of this has to be back-patched into 9.1. Peter Geoghegan, with additional work by Tom
2011-08-08Clean up compilation warningsMichael P
90% of compilation warnings are cleaned up with this commit. There are still warnings remaining due to the strong dependence between GTM and the PGXC main code.
2011-07-16Replace errdetail("%s", ...) with errdetail_internal("%s", ...).Tom Lane
There may be some other places where we should use errdetail_internal, but they'll have to be evaluated case-by-case. This commit just hits a bunch of places where invoking gettext is obviously a waste of cycles.
2011-07-06Merge commit 'a4bebdd92624e018108c2610fc3f2c1584b6c687' into masterMichael P
This is the commit merge of Postgres-XC with the intersection of PostgreSQL REL9_1_STABLE and master branches. Conflicts: COPYRIGHT contrib/pgbench/pgbench.c src/Makefile src/backend/access/transam/recovery.conf.sample src/backend/access/transam/varsup.c src/backend/access/transam/xlog.c src/backend/catalog/Makefile src/backend/catalog/dependency.c src/backend/catalog/system_views.sql src/backend/commands/copy.c src/backend/commands/explain.c src/backend/commands/sequence.c src/backend/commands/tablecmds.c src/backend/commands/vacuum.c src/backend/executor/nodeAgg.c src/backend/nodes/copyfuncs.c src/backend/nodes/equalfuncs.c src/backend/nodes/outfuncs.c src/backend/nodes/readfuncs.c src/backend/optimizer/path/allpaths.c src/backend/optimizer/plan/createplan.c src/backend/optimizer/plan/setrefs.c src/backend/parser/gram.y src/backend/parser/parse_utilcmd.c src/backend/postmaster/postmaster.c src/backend/rewrite/rewriteHandler.c src/backend/storage/lmgr/proc.c src/backend/tcop/postgres.c src/backend/utils/adt/ruleutils.c src/backend/utils/init/postinit.c src/backend/utils/misc/guc.c src/backend/utils/misc/postgresql.conf.sample src/backend/utils/sort/tuplesort.c src/bin/initdb/initdb.c src/bin/pg_ctl/pg_ctl.c src/bin/pg_dump/pg_dump.c src/include/access/xlog.h src/include/catalog/catversion.h src/include/catalog/indexing.h src/include/catalog/pg_aggregate.h src/include/catalog/pg_proc.h src/include/commands/copy.h src/include/nodes/parsenodes.h src/include/nodes/primnodes.h src/include/optimizer/pathnode.h src/include/parser/kwlist.h src/include/storage/procarray.h src/test/regress/expected/.gitignore src/test/regress/expected/aggregates.out src/test/regress/expected/alter_table.out src/test/regress/expected/bit.out src/test/regress/expected/box.out src/test/regress/expected/delete.out src/test/regress/expected/float4.out src/test/regress/expected/float8.out src/test/regress/expected/int2.out src/test/regress/expected/int8.out src/test/regress/expected/interval.out 
src/test/regress/expected/numeric.out src/test/regress/expected/point.out src/test/regress/expected/polygon.out src/test/regress/expected/sequence.out src/test/regress/expected/timestamp.out src/test/regress/expected/timestamptz.out src/test/regress/expected/transactions.out src/test/regress/expected/window.out src/test/regress/input/misc.source src/test/regress/output/create_misc_1.source src/test/regress/output/misc.source src/test/regress/sql/aggregates.sql src/test/regress/sql/alter_table.sql src/test/regress/sql/bit.sql src/test/regress/sql/box.sql src/test/regress/sql/delete.sql src/test/regress/sql/domain.sql src/test/regress/sql/float4.sql src/test/regress/sql/float8.sql src/test/regress/sql/int2.sql src/test/regress/sql/int8.sql src/test/regress/sql/interval.sql src/test/regress/sql/lseg.sql src/test/regress/sql/numeric.sql src/test/regress/sql/path.sql src/test/regress/sql/point.sql src/test/regress/sql/polygon.sql src/test/regress/sql/portals.sql src/test/regress/sql/sequence.sql src/test/regress/sql/timestamp.sql src/test/regress/sql/timestamptz.sql src/test/regress/sql/transactions.sql src/test/regress/sql/window.sql src/test/regress/sql/with.sql
2011-06-29Unify spelling of "canceled", "canceling", "cancellation"Peter Eisentraut
We had previously (af26857a2775e7ceb0916155e931008c2116632f) established the U.S. spellings as standard.
2011-06-17First cut implementation of BARRIER for PITR and global consistent recoveryPavan Deolasee
2011-06-06Partial fix for bug 3310399: Autovacuum workers using same connections to GTMMichael P
This fixes a problem with autovacuum workers/launchers that tended to use the connection allocated for the postmaster to connect to GTM. In the case of multiple vacuums running at the same time, this tended to mess up the way autovacuum was receiving GXIDs and snapshots from GTM. This commit also adds some debug messages to look at the connection activity to GTM, and stricter connection control of autovacuum backends to GTM.
2011-05-30Fix for bug 3307846: multiple INSERT with JDBC driverMichael P
This fixes issues when JDBC was used with multi INSERT such as: INSERT INTO table_name VALUES (1),(2);
2011-05-26This patch adds support for the following data types to be used as …Abbas
distribution key: INT8, INT2, OID, INT4, BOOL, INT2VECTOR, OIDVECTOR, CHAR, NAME, TEXT, BPCHAR, BYTEA, VARCHAR, FLOAT4, FLOAT8, NUMERIC, CASH, ABSTIME, RELTIME, DATE, TIME, TIMESTAMP, TIMESTAMPTZ, INTERVAL, TIMETZ. A new function compute_hash is added to the system, which is used to compute the hash of any of the supported data types. The computed hash is used in the function GetRelationNodes to find the targeted data node. EXPLAIN for RemoteQuery has been modified to show the number of data nodes targeted by a certain query. This is essential to spot bugs in the optimizer in case it targets all nodes by mistake. In the case of optimizations where comparison with a constant leads the optimizer to point to a single data node, there were a couple of mistakes in examine_conditions_walker. First, it did not support RelabelType, which represents a "dummy" type coercion between two binary-compatible data types; this resulted in the optimization not working for the varchar type, for example. Second, it did not cater for the case where the user writes the condition with the constant expression on the LHS and the variable on the RHS of the = operator, i.e. 23 = a. A number of test cases have been added to the regression suite to make sure further enhancements do not break this functionality. This change has a sizeable impact on the current regression tests in the following manner: 1. The horology test case crashes the server and has been commented out in serial_schedule. 2. In the money test case, the planner optimizer wrongly kicks in to optimize the query SELECT m = '$123.01' FROM money_data; to point to a single data node. 3. There were a few unnecessary EXPLAINs in the create_index test case. Since we have added support in EXPLAIN to show the number of data nodes targeted by a RemoteQuery, this test case was producing output dependent on the cluster configuration. 4. In the guc test case, DROP ROLE temp_reset_user; results in ERROR: permission denied to drop role
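The two examine_conditions_walker fixes amount to a normalization step before the single-node optimization is attempted. A sketch, with tuple encodings invented purely for illustration:

```python
def normalize_eq(lhs, rhs):
    """Normalize an equality condition so the column is on the left,
    handling '23 = a' like 'a = 23', and stripping a RelabelType-style
    no-op coercion wrapper. Expressions are sketched as ('var', name),
    ('const', value) and ('relabel', inner) tuples (hypothetical)."""
    def strip(expr):
        # RelabelType is a dummy coercion between binary-compatible
        # types (e.g. varchar -> text); look through it.
        return strip(expr[1]) if expr[0] == "relabel" else expr
    lhs, rhs = strip(lhs), strip(rhs)
    if lhs[0] == "const" and rhs[0] == "var":
        lhs, rhs = rhs, lhs  # put the variable on the LHS
    return lhs, rhs
```

Only after this normalization can the walker compare the constant against the distribution column and pick a single target node.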
2011-05-20Merge commit '1084f317702e1a039696ab8a37caf900e55ec8f2' into int-pgxcPavan Deolasee
Merge 9.0 PostgreSQL release into PGXC. Resolve conflicts thrown by git and fix some issues raised during compilation. We still don't compile cleanly at this point, but we should have resolved many conflicts to make further progress. Some of the changes in the regression tests are merged to reflect what's there in the 9.0 release. Those are easy to fix later when we run regressions. Conflicts: contrib/Makefile contrib/pgbench/pgbench.c src/Makefile src/backend/Makefile src/backend/access/transam/varsup.c src/backend/catalog/Makefile src/backend/catalog/dependency.c src/backend/catalog/genbki.sh src/backend/commands/dbcommands.c src/backend/commands/explain.c src/backend/commands/vacuum.c src/backend/executor/execMain.c src/backend/executor/execProcnode.c src/backend/executor/execTuples.c src/backend/parser/analyze.c src/backend/parser/gram.y src/backend/parser/parse_utilcmd.c src/backend/postmaster/postmaster.c src/backend/rewrite/rewriteHandler.c src/backend/storage/ipc/procarray.c src/backend/storage/lmgr/proc.c src/backend/tcop/postgres.c src/backend/tcop/utility.c src/backend/utils/cache/relcache.c src/backend/utils/init/postinit.c src/backend/utils/misc/guc.c src/bin/pg_ctl/pg_ctl.c src/include/Makefile src/include/access/twophase.h src/include/bootstrap/bootstrap.h src/include/catalog/catversion.h src/include/catalog/dependency.h src/include/catalog/indexing.h src/include/catalog/pg_proc.h src/include/nodes/nodes.h src/include/storage/lwlock.h src/include/storage/proc.h src/include/storage/procarray.h src/include/utils/lsyscache.h src/test/regress/expected/delete.out src/test/regress/expected/float4.out src/test/regress/expected/float8.out src/test/regress/expected/geometry.out src/test/regress/expected/join.out src/test/regress/expected/point.out src/test/regress/expected/rowtypes.out src/test/regress/expected/timestamp.out src/test/regress/expected/timestamptz.out src/test/regress/expected/tsearch.out src/test/regress/sql/numeric.sql src/test/regress/sql/point.sql
2011-05-19maintenance for 2011, change header filesMichael P
A little bit late, but change the headers for 2011.
2011-05-19The patch implements multiple insert syntax in PGXC.Michael P
Multiple insert means using a single insert statement to insert multiple rows into a table, using the syntax e.g. insert into students(rno, class, pos) values (1, 10, 5), (2, 10, 6), (3, 10, 7), (4, 10, 8); Without the patch, statements like these pass but do not actually insert anything into the table. The main code changes are in the rewriter. The patch checks to see if the insert statement has more than one set in the provided list of values (FOUR in the above example), and in that case rewrites the insert statement. The insert rewriter separates the sets in the provided list of values into independent lists depending on the distribution of the table, the distribution column and the value provided for the distribution column. The main rewriter is then split into two possible paths: one without a for loop, and one where, if we have a separated list of insert values, we run a for loop on the list and create an insert statement for each of the data nodes, providing it the sub-group of the original list that is supposed to run on that particular data node. The main work is done now; all that is left is to handle multiple command result tags from the data nodes. HandleCmdComplete does this: it simply keeps adding to the insert row count until all data nodes are done. With this patch, multi insert does not work for replicated tables. Additional comments are also necessary.
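The rewriter's splitting step can be sketched as follows, with Python's built-in hash standing in for the real distribution hash (function names are illustrative, not the patch's code):

```python
def split_values_by_node(rows, dist_col, n_nodes):
    """Partition a multi-row VALUES list into per-datanode lists by
    hashing each row's distribution column (sketch; hash() is a
    stand-in for the cluster's distribution hash)."""
    per_node = {}
    for row in rows:
        node = hash(row[dist_col]) % n_nodes
        per_node.setdefault(node, []).append(row)
    return per_node

def combine_row_counts(counts):
    """HandleCmdComplete-style tally: keep adding per-node INSERT row
    counts until all data nodes have reported."""
    return sum(counts)
```

Each per-node list then becomes its own INSERT statement sent to that datanode, and the summed command tags are what the client sees.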
2011-05-19Clean up of execRemote.cMichael P
There was some code used to clean up connections between Coordinator and Datanodes that was not really necessary. It had been introduced in version 0.9.2 to stabilize the code by consuming messages on connections where an error happened on a backend node. This patch also corrects a bug on the Datanode with a GXID that was not correctly set at initialization. This led to transactions being committed twice on backend nodes, crashing them with a FATAL error. Patch written by Andrei Martsinchyk
2011-05-19Support for CLEAN CONNECTIONMichael P
Utility to clean up Postgres-XC Pooler connections. This utility is launched to all the Coordinators of the cluster Use of CLEAN CONNECTION is limited to a super user. It is advised to clean connections before dropping a Database. SQL query synopsis is as follows: CLEAN CONNECTION TO (COORDINATOR num | DATANODE num | ALL {FORCE}) FOR DATABASE dbname Connection cleaning has to be made on a chosen database called dbname. It is also possible to clean connections of several Coordinators or Datanodes Ex: CLEAN CONNECTION TO DATANODE 1,5,7 FOR DATABASE template1 CLEAN CONNECTION TO COORDINATOR 2,4,6 FOR DATABASE template1 Or even to all Coordinators/Datanodes at the same time Ex: CLEAN CONNECTION TO DATANODE * FOR DATABASE template1 CLEAN CONNECTION TO COORDINATOR * FOR DATABASE template1 When FORCE is used, all the transactions using pooler connections are aborted, and pooler connections are cleaned up. Ex: CLEAN CONNECTION TO ALL FORCE FOR DATABASE template1; FORCE can only be used with TO ALL, as it takes a lock on pooler to stop requests asking for connections, aborts all the connections in the cluster, and cleans up pool connections
2011-05-19Added support for two new pieces of functionality.Michael P
1) Support for DDL and utility command synchronisation among Coordinators. DDL is now synchronized amongst multiple coordinators. Previously, after DDL it was required to use an extra utility to resync the nodes and restart other Coordinators. This is no longer necessary. DDL support works also with common BEGIN, COMMIT and ROLLBACK instructions in the cluster. DDL may be initiated at any node. Each Coordinator can connect to any other one. Just as Coordinators use pools for connecting to Data Nodes, Coordinators now use pools for connecting to the other Coordinators. 2) Support for PREPARE TRANSACTION, COMMIT PREPARED and ROLLBACK PREPARED. When a transaction is prepared or committed, based on the SQL, it will only execute on the involved nodes, including DDL on Coordinators. GTM is used to track which xid and nodes are involved in the transaction, identified by the user- or application-specified transaction identifier, when it is prepared. New GUCs -------- There are some new GUCs for handling Coordinator communication: num_coordinators coordinator_hosts coordinator_ports coordinator_users coordinator_passwords In addition, a new GUC replaces coordinator_id: pgxc_node_id Open Issues ----------- Implicit two-phase commit (client in autocommit mode, but distributed transaction required because of multiple nodes) does not first prepare on the originating coordinator before committing, if DDL is involved. We really should prepare here before committing on all nodes. We also need to add a bit of special handling for COMMIT PREPARED. If there is an error, and it got committed on some nodes, we still should force it to be committed on the originating coordinator, if involved, and still return an error of some sort that it was partially committed. (When the downed node recovers, in the future it will determine if any other node has committed the transaction, and if so, it, too, must commit.) It is a pretty rare case, but we should handle it. 
With this current configuration, DDL will fail if at least one Coordinator is down. In the future, we will make this more flexible. Written by Michael Paquier
2011-05-19Initial support for cursors (DECLARE, FETCH).Mason Sharp
This initial version implements support by creating them on the Coordinator only; they are not created on the data nodes. Not yet supported is UPDATE / DELETE WHERE CURRENT OF, but basic read-only cursor capability works, including SCROLL cursors. Written by Andrei Martsinchyk
2011-05-19Support for Global timestamp in Postgres-XC.Michael P
When a transaction is begun on a Coordinator, it sends a BEGIN message to GTM and receives back a timestamp along with the usual GXID. This timestamp is calculated from the clock of the GTM server. With that, nodes in the cluster can adjust their own timeline with GTM by calculating a delta value based on the GTM timestamp and their local clock. Like the GXID and snapshot, the timestamp is also sent down to Datanodes so as to keep timestamp values consistent between coordinator and datanodes. This commit supports global timestamp values for now(), statement_timestamp, transaction_timestamp, current_date, current_time, current_timestamp, localtime and localtimestamp. clock_timestamp and timeofday make their calculation based on the local server clock, so they get their results from the local node where they run. Their use could lead to inconsistencies if used in a transaction involving several Datanodes.
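The delta adjustment can be sketched with plain integers standing in for clock readings (illustrative only; function names are invented):

```python
def gtm_delta(gtm_timestamp, local_clock):
    """Delta computed when BEGIN returns: how far the local clock is
    from GTM's clock at that moment (a sketch)."""
    return gtm_timestamp - local_clock

def global_now(local_clock, delta):
    """now()/current_timestamp-style values: the local clock shifted by
    the stored delta, so every node reports a GTM-consistent time."""
    return local_clock + delta
```

Because clock_timestamp and timeofday skip this adjustment, they can disagree across datanodes, which is exactly the inconsistency the message warns about.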
2011-05-19Initial support for multi-step queries, including cross-node joins.M S
Note that this is a "version 1.0" implementation, borrowing some code from the SQL/MED patch. This means that all cross-node joins take place on a Coordinator by pulling up data from the data nodes. Some queries will therefore execute quite slowly, but they will at least execute. In this patch, all columns are SELECTed from the remote table, but at least simple WHERE clauses are pushed down to the remote nodes. We will optimize query processing in the future. Note that the same connections to remote nodes are used in multiple steps. To get around that problem, we just add a materialization node above each RemoteQuery node, and force all results to be fetched first on the Coordinator. This patch also allows UNION, EXCEPT and INTERSECT, and other more complex SELECT statements to run now. It includes a fix for single-step, multi-node LIMIT and OFFSET. It also includes EXPLAIN output from the Coordinator's point of view. Adding these changes introduced a problem with AVG(), which is currently not working.
2011-05-19Portal integration changes.M S
This integrates Postgres-XC code deeper into PostgreSQL. The Extended Query Protocol can now be used, which means that JDBC will now work. It also lays more groundwork for supporting multi-step queries (cross-node joins). Note that statements with parameters cannot yet be prepared and executed, only those without parameters will work. Note also that this patch introduces additional performance degradation because more processing occurs with each request. We will be working to address these issues in the coming weeks. Written by Andrei Martsinchyk
2011-05-19Added more handling to deal with data node connection failures.Mason S
This includes forcing the release of connections in an unexpected state and bug fixes. This was written by Andrei Martsinchyk, with some additional handling added by Mason.
2011-05-19In Postgres-XC, the error stack may overflow becauseMason S
AbortTransaction may be called multiple times, each time calling DataNodeRollback, which may fail again if a data node is down. Instead, if we are already in an abort state, we do not bother repeating abort actions.
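The guard described above amounts to making the abort actions idempotent; a sketch, with a plain dict standing in for the transaction state:

```python
def should_run_abort_actions(state):
    """Return True only for the first abort attempt: if we are already
    in an abort state, skip repeating rollback actions so a down data
    node cannot grow the error stack without bound (illustrative)."""
    if state.get("aborting"):
        return False
    state["aborting"] = True
    return True
```

Repeated AbortTransaction calls then become cheap no-ops instead of re-triggering DataNodeRollback failures.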
2011-05-19Add support for ORDER BY and DISTINCT.Mason S
This is handled on the Coordinator. It will push down the ORDER BY and merge-sort the sorted input streams from the nodes. It converts from DataRow to tuple format as needed. If one of the SELECT clause expressions is not in the ORDER BY, it appends it to the ORDER BY when pushing it down to the data nodes and leaves it off when returning to the client. With DISTINCT, an ORDER BY will be used and pushed down to the data nodes such that a merge-sort can be done and de-duplication can occur. By Andrei Martsinchyk
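The Coordinator-side merge-sort and the ORDER BY extension can be sketched as follows (heapq.merge stands in for the executor's merge node; function names are illustrative):

```python
import heapq

def coordinator_merge(streams):
    """Merge already-sorted per-datanode result streams into one sorted
    output, as the Coordinator does for a pushed-down ORDER BY."""
    return list(heapq.merge(*streams))

def push_down_order_by(select_exprs, order_by):
    """Append SELECT expressions missing from ORDER BY so the datanode
    sort key covers them; the appended columns are dropped again before
    rows are returned to the client (a sketch of the rule above)."""
    return order_by + [e for e in select_exprs if e not in order_by]
```

DISTINCT reuses the same machinery: with a full sort key pushed down, de-duplication is a streaming comparison of adjacent merged rows.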
2011-05-19Minor change that updates COPY so that it knows aheadMason S
of time whether or not it should only execute on the Coordinator (pg_catalog tables). Written by Michael Paquier
2011-05-19Fixed a bug where if many errors occur we run out of on_proc_exit slots.Mason S
Moved the call up, above the sigsetjmp setup.
2011-05-19Added support for COPY TO a file or STDOUT.Mason S
It currently only supports from a single table, copy with SELECT is not yet supported. This was written by Michael Paquier.
2011-05-19Added support for COPY FROM, for loading tables.Mason S
Some additional work was done related to the combiner and error handling to make this code a little cleaner. This was written by Andrei Martsinchyk.
2011-05-19Improved error handling.Mason S
It could still happen that we do not consume Z ReadyForQuery after an error. We introduce a new connection state to detect this. Also, previously it was possible that DDL may succeed on the coordinator and get committed but not on the datanodes. We now make sure it does not get committed on the coordinator.
2011-05-19For writes to replicated tables, use primary copy technique to reduceMason S
the likelihood of distributed deadlocks. That is, if all writes for a table first go through the same data node, if the same tuple is updated by multiple clients, we can at least ensure that the first session that obtains the lock can similarly obtain the lock for the same tuple on all of the nodes. (Usual deadlocks are still possible.) There is a new GUC parameter, primary_data_node. By default it is 1, the node number where to execute writes to replicated tables first, before doing the other ones. If it is set to 0, then the primary node technique is not used, and it will update all replicas simultaneously. Instead of the planner returning a list of nodes to execute on, it returns a pointer to Exec_Nodes, which contains the primary and secondary nodes to execute on. DataNodeExec() now uses this information. I also added a new check so that if a different number of rows were affected on replicated tables (an UPDATE, for example), an error occurs. This happens for COMBINE_TYPE_SAME. (I tested with the help of EXECUTE DIRECT, intentionally messing up the data.)
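The primary-copy ordering and the COMBINE_TYPE_SAME check can be sketched as follows (node numbers as in the GUC, with 0 meaning no primary; names are illustrative):

```python
def write_order(exec_nodes, primary_data_node):
    """Order replicated-table writes so the primary node (if configured,
    i.e. primary_data_node != 0) is written first; conflicting sessions
    then serialize on the primary's row lock (a sketch)."""
    if primary_data_node and primary_data_node in exec_nodes:
        return [primary_data_node] + [n for n in exec_nodes
                                      if n != primary_data_node]
    return list(exec_nodes)  # no primary: write all replicas as-is

def check_combine_same(row_counts):
    """COMBINE_TYPE_SAME: replicas must report identical affected-row
    counts; a mismatch means the replicas diverged, so raise an error."""
    if len(set(row_counts)) > 1:
        raise ValueError("replicas reported different row counts")
    return row_counts[0]
```

This is why an intentionally desynchronized replica (e.g. via EXECUTE DIRECT) now produces an error on UPDATE instead of silently diverging further.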
2011-05-19Postgres-XC version 0.9Michael P
Application of patch PGXC-PG_REL8_4_3.patch.gz on PostgreSQL version 8.4.3
2011-04-25Add postmaster/postgres undocumented -b option for binary upgrades.Bruce Momjian
This option turns off autovacuum, prevents non-super-user connections, and enables oid setting hooks in the backend. The code continues to use the old autovacuum disable settings for servers with earlier catalog versions. This includes a catalog version bump to identify servers that support the -b option.
2011-04-13On IA64 architecture, we check the depth of the register stack in additionHeikki Linnakangas
to the regular stack. The code to do that is platform- and compiler-specific; add support for the HP-UX native compiler.
2011-04-10pgindent run before PG 9.1 beta 1.Bruce Momjian
2011-04-07Revise the API for GUC variable assign hooks.Tom Lane
The previous functions of assign hooks are now split between check hooks and assign hooks, where the former can fail but the latter shouldn't. Aside from being conceptually clearer, this approach exposes the "canonicalized" form of the variable value to guc.c without having to do an actual assignment.

And that lets us fix the problem recently noted by Bernd Helmle that the auto-tune patch for wal_buffers resulted in bogus log messages about "parameter "wal_buffers" cannot be changed without restarting the server".

There may be some speed advantage too, because this design lets hook functions avoid re-parsing variable values when restoring a previous state after a rollback (they can store a pre-parsed representation of the value instead).

This patch also resolves a longstanding annoyance about custom error messages from variable assign hooks: they should modify, not appear separately from, guc.c's own message about "invalid parameter value".
2011-03-17Fix various possible problems with synchronous replication.Robert Haas
1. Don't ignore query cancel interrupts. Instead, if the user asks to cancel the query after we've already committed it, but before it's on the standby, just emit a warning and let the COMMIT finish.

2. Don't ignore die interrupts (pg_terminate_backend or fast shutdown). Instead, emit a warning message and close the connection without acknowledging the commit. Other backends will still see the effect of the commit, but there's no getting around that; it's too late to abort at this point, and ignoring die interrupts altogether doesn't seem like a good idea.

3. If synchronous_standby_names becomes empty, wake up all backends waiting for synchronous replication to complete. Without this, someone attempting to shut synchronous replication off could easily wedge the entire system instead.

4. Avoid depending on the assumption that if a walsender updates MyProc->syncRepState, we'll see the change even if we read it without holding the lock. The window for this appears to be quite narrow (and probably doesn't exist at all on machines with strong memory ordering), but protecting against it is practically free, so do that.

5. Remove useless state SYNC_REP_MUST_DISCONNECT, which isn't needed and doesn't actually do anything.

There's still some further work needed here to make the behavior of fast shutdown plausible, but that looks complex, so I'm leaving it for a separate commit. Review by Fujii Masao.
2011-03-01Rearrange snapshot handling to make rule expansion more consistent.Tom Lane
With this patch, portals, SQL functions, and SPI all agree that there should be only a CommandCounterIncrement between the queries that are generated from a single SQL command by rule expansion. Fetching a whole new snapshot now happens only between original queries.

This is equivalent to the existing behavior of EXPLAIN ANALYZE, and it was judged to be the best choice since it eliminates one source of concurrency hazards for rules. The patch should also make things marginally faster by reducing the number of snapshot push/pop operations.

The patch removes pg_parse_and_rewrite(), which is no longer used anywhere. There was considerable discussion about more aggressive refactoring of the query-processing functions exported by postgres.c, but for the moment nothing more has been done there.

I also took the opportunity to refactor snapmgr.c's API slightly: the former PushUpdatedSnapshot() has been split into two functions.

Marko Tiikkaja, reviewed by Steve Singer and Tom Lane
2011-02-01Re-classify ERRCODE_DATABASE_DROPPED to 57P04Simon Riggs
2011-02-01Create new errcode for recovery conflict caused by db drop on master.Simon Riggs
Previously reported as ERRCODE_ADMIN_SHUTDOWN, this case is now reported as ERRCODE_T_R_DATABASE_DROPPED. No message text change. Unlikely to happen on most servers, so this is a low-impact change that allows session poolers to correctly handle this situation. Tatsuo Ishii, edits by me, review by Robert Haas