|
When a PORTAL_ONE_SELECT query is executed, we can opportunistically
reuse the parse/plan snapshot for the execution phase. This cuts down the
number of snapshots per simple query from 2 to 1 for the simple
protocol, and 3 to 2 for the extended protocol. Since we are only
reusing a snapshot taken early in the processing of the same protocol
message, the change shouldn't be user-visible, except that the remote
possibility of the planning and execution snapshots being different is
eliminated.
Note that this change does not make it safe to assume that the parse/plan
snapshot will certainly be reused; that will currently only happen if
PortalStart() decides to use the PORTAL_ONE_SELECT strategy. It might
be worth trying to provide some stronger guarantees here in the future,
but for now we don't.
Patch by me; review by Dimitri Fontaine.
|
|
statements. We start by fixing the INSERT INTO support. For every result
relation, we now build a corresponding RemoteQuery node so that the
inserts can be carried out at the remote datanodes. Subsequently, at
the coordinator at execution time, instead of inserting the resulting tuples
in a local heap, we invoke remote execution and insert the rows in the
remote datanodes. This works nicely even for prepared queries, multi-row
VALUES clauses in INSERT, and any other mechanism of generating tuples.
We use this infrastructure to then support CREATE TABLE AS SELECT (CTAS).
The query is transformed into a CREATE TABLE statement followed by
INSERT INTO statement and then run through normal planning/execution.
There are many regression cases that need fixing because these statements
now work correctly. This patch fixes many of them. A few might still be
failing, but they seem unrelated to the work itself and might be a
side effect; we will fix them once this patch gets in.
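For illustration, a hypothetical sketch of the CTAS transformation described
above (table and column names are invented):
  -- Statement submitted by the client
  CREATE TABLE new_emp AS SELECT id, name FROM emp WHERE active;
  -- Internally rewritten into a CREATE TABLE followed by INSERT ... SELECT,
  -- which then goes through normal planning/execution and remote insertion
  CREATE TABLE new_emp (id integer, name text);
  INSERT INTO new_emp SELECT id, name FROM emp WHERE active;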
|
|
An additional check is made when a session starts up to see if remote
node information is consistent between the pooler and the catalogs.
If it is not, non-superusers are not allowed access to the cluster,
as this could result in the creation of inconsistent data.
A superuser is still authorized to connect, but receives a warning
message informing him that the remote node information inconsistency
has to be resolved.
|
|
lost. The only way we detect that at the moment is when write() fails when
we try to write to the socket.
Florian Pflug with small changes by me, reviewed by Greg Jaskiewicz.
|
|
A new system function called pgxc_pool_reload has been added.
When called, this function reloads connection information to remote nodes
in a consistent way, using the following process:
1) A lock is taken on the pooler, forbidding new connection requests.
2) Database pools (user- and database-dependent pools) are reloaded
depending on the node information located in the catalog pgxc_node.
The following rules are applied depending on how node connection
information has been modified:
- a node whose host or port value has changed has its connections
dropped and its node pool deleted from each database pool
- a deleted node is removed from each database pool
- an unchanged node is kept as is, although its index value is updated
to reflect the new cluster configuration
- a newly created node is added to each database pool
3) The lock is released.
4) The session that invoked pgxc_pool_reload signals all the other
server sessions to reconnect to the pooler, so that each agent picks up
the newest connection information and reloads its session information
related to remote node handles. This has the effect of aborting current
transactions and removing all temporary and prepared objects in those
sessions. A WARNING message is then sent back to the client to inform it
of the cluster configuration modification.
5) The session that invoked pgxc_pool_reload reconnects to the pooler
itself and reloads its session information related to remote node
handles. No WARNING message is sent back to the client in this case.
This operation is limited to the local Coordinator and returns a boolean
indicating whether the operation succeeded. If the pooler data is already
consistent with the catalog information when pgxc_pool_reload is invoked,
nothing is done and a success result is returned.
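A minimal usage sketch, run on the local Coordinator (the psql-style output
below assumes the reload succeeds):
  SELECT pgxc_pool_reload();
   pgxc_pool_reload
  ------------------
   t
  (1 row)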
This brings the following simplifications for cluster settings:
- cluster_nodes.sql is deleted.
- a new mandatory option, --nodename, is used to specify the name of the
node being initialized. This allows the pgxc_node catalog to be set up
with the node itself. pgxc_node_name in postgresql.conf is also
set automatically.
- CREATE/ALTER/DROP NODE are launched on the local Coordinator only, meaning
that when a cluster is set up, node information must be created on each
Coordinator and then uploaded to the pooler and sessions by invoking
pgxc_pool_reload.
This optimization avoids having to restart a Coordinator when the cluster
configuration changes, and solves security problems related to
cluster_nodes.sql, which could be edited with all types of SQL even though
its original target was only NODE DDL.
|
|
This crash was happening at the execution of a RemoteQuery node
when trying to allocate memory for fresh handles taken from pool.
In this case allocation was made in TopTransactionContext.
However, in the case of Sqoop, a Hadoop module that provides an
interface to a database backend, it happened that TopTransactionContext
was NULL, leading to a crash of the node.
This commit switches the memory context to CurrentMemoryContext
and contains fixes for possible memory leaks related to barrier and
COPY.
Fix from Pavan Deolasee
|
|
Node information is no longer identified by node number through
GUC parameters, but by node name.
Node connection information is taken from a new catalog table
called pgxc_node. Node group information can be found in pgxc_group.
Node connection information is read from the catalog when a user session
begins and stays fixed for the duration of the session. This brings
more flexibility to the cluster settings. Cluster node information can
now be set when a node is initialized with initdb, using cluster_nodes.sql
located in the share directory.
This commit adds support for the following new DDL:
- CREATE NODE
- ALTER NODE
- DROP NODE
- CREATE NODE GROUP
- DROP NODE GROUP
The following parameters are deleted from postgresql.conf:
- num_data_nodes
- preferred_data_nodes
- data_node_hosts
- data_node_ports
- primary_data_node
- num_coordinators
- coordinator_hosts
- coordinator_ports
pgxc_node_id is replaced by pgxc_node_name to identify the node-self.
Documentation is added for the new queries. Functionality such as
EXECUTE DIRECT and CLEAN CONNECTION now uses node names instead of node numbers.
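For illustration, a hypothetical use of the new DDL; the option names and
values here are assumptions for the sketch, not taken from this commit:
  CREATE NODE datanode_1 WITH (TYPE = 'datanode', HOST = 'localhost', PORT = 15432);
  ALTER NODE datanode_1 WITH (PORT = 15433);
  DROP NODE datanode_1;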
|
|
When a btree index contains all columns required by the query, and the
visibility map shows that all tuples on a target heap page are
visible-to-all, we don't need to fetch that heap page. This patch depends
on the previous patches that made the visibility map reliable.
There's a fair amount left to do here, notably trying to figure out a less
chintzy way of estimating the cost of an index-only scan, but the core
functionality seems ready to commit.
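A quick illustration (hypothetical table and index; the plan shape depends
on statistics and visibility-map state):
  CREATE TABLE tenk (id int, payload text);
  CREATE INDEX tenk_id_idx ON tenk (id);
  VACUUM tenk;  -- sets visibility-map bits so heap fetches can be skipped
  EXPLAIN SELECT id FROM tenk WHERE id < 100;
  -- may show an "Index Only Scan using tenk_id_idx" node, since every
  -- referenced column is available from the index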
Robert Haas and Ibrar Ahmed, with some previous work by Heikki Linnakangas.
|
|
pg_ctl use that to query the data directory for config-only installs.
This fixes awkward or impossible pg_ctl operation for config-only
installs.
|
|
Rewrite plancache.c so that a "cached plan" (which is rather a misnomer
at this point) can support generation of custom, parameter-value-dependent
plans, and can make an intelligent choice between using custom plans and
the traditional generic-plan approach. The specific choice algorithm
implemented here can probably be improved in future, but this commit is
all about getting the mechanism in place, not the policy.
In addition, restructure the API to greatly reduce the amount of extraneous
data copying needed. The main compromise needed to make that possible was
to split the initial creation of a CachedPlanSource into two steps. It's
worth noting in particular that SPI_saveplan is now deprecated in favor of
SPI_keepplan, which accomplishes the same end result with zero data
copying, and no need to then spend even more cycles throwing away the
original SPIPlan. The risk of long-term memory leaks while manipulating
SPIPlans has also been greatly reduced. Most of this improvement is based
on use of the recently-added MemoryContextSetParent primitive.
|
|
We were doing some amazingly complicated things in order to avoid running
the very expensive identify_system_timezone() procedure during GUC
initialization. But there is an obvious fix for that, which is to do it
once during initdb and have initdb install the system-specific default into
postgresql.conf, as it already does for most other GUC variables that need
system-environment-dependent defaults. This means that the timezone (and
log_timezone) settings no longer have any magic behavior in the server.
Per discussion.
|
|
As per my recent proposal, this refactors things so that these typedefs and
macros are available in a header that can be included in frontend-ish code.
I also changed various headers that were undesirably including
utils/timestamp.h to include datatype/timestamp.h instead. Unsurprisingly,
this showed that half the system was getting utils/timestamp.h by way of
xlog.h.
No actual code changes here, just header refactoring.
|
|
1. The parameter values are stored in RemoteQueryState rather than RemoteQuery
node, since they are execution time entities.
2. When the plan is generated for a parameterised query, the parameter types are
set in all the RemoteQuery nodes in the plan.
3. At the time of execution, the parameter type names are sent to the datanodes,
in the Parse message. Changes are done to send and receive parameter type names
instead of OIDs.
4. The GROUP BY optimizations are now applied even in the case when there are
bound parameters in the query.
|
|
In pursuit of this (and with the expectation that WaitLatch will be needed
in more places), convert the latch field that was already added to PGPROC
for sync rep into a generic latch that is activated for all PGPROC-owning
processes, and change many of the standard backend signal handlers to set
that latch when a signal happens. This will allow WaitLatch callers to be
wakened properly by these signals.
In passing, fix a whole bunch of signal handlers that had been hacked to do
things that might change errno, without adding the necessary save/restore
logic for errno. Also make some minor fixes in unix_latch.c, and clean
up bizarre and unsafe scheme for disowning the process's latch. Much of
this has to be back-patched into 9.1.
Peter Geoghegan, with additional work by Tom
|
|
90% of compilation warnings are cleaned up by this commit.
There are still warnings remaining due to the strong dependence
between GTM and the PGXC main code.
|
|
There may be some other places where we should use errdetail_internal,
but they'll have to be evaluated case-by-case. This commit just hits
a bunch of places where invoking gettext is obviously a waste of cycles.
|
|
This is the commit merge of Postgres-XC with the intersection of
PostgreSQL REL9_1_STABLE and master branches.
Conflicts:
COPYRIGHT
contrib/pgbench/pgbench.c
src/Makefile
src/backend/access/transam/recovery.conf.sample
src/backend/access/transam/varsup.c
src/backend/access/transam/xlog.c
src/backend/catalog/Makefile
src/backend/catalog/dependency.c
src/backend/catalog/system_views.sql
src/backend/commands/copy.c
src/backend/commands/explain.c
src/backend/commands/sequence.c
src/backend/commands/tablecmds.c
src/backend/commands/vacuum.c
src/backend/executor/nodeAgg.c
src/backend/nodes/copyfuncs.c
src/backend/nodes/equalfuncs.c
src/backend/nodes/outfuncs.c
src/backend/nodes/readfuncs.c
src/backend/optimizer/path/allpaths.c
src/backend/optimizer/plan/createplan.c
src/backend/optimizer/plan/setrefs.c
src/backend/parser/gram.y
src/backend/parser/parse_utilcmd.c
src/backend/postmaster/postmaster.c
src/backend/rewrite/rewriteHandler.c
src/backend/storage/lmgr/proc.c
src/backend/tcop/postgres.c
src/backend/utils/adt/ruleutils.c
src/backend/utils/init/postinit.c
src/backend/utils/misc/guc.c
src/backend/utils/misc/postgresql.conf.sample
src/backend/utils/sort/tuplesort.c
src/bin/initdb/initdb.c
src/bin/pg_ctl/pg_ctl.c
src/bin/pg_dump/pg_dump.c
src/include/access/xlog.h
src/include/catalog/catversion.h
src/include/catalog/indexing.h
src/include/catalog/pg_aggregate.h
src/include/catalog/pg_proc.h
src/include/commands/copy.h
src/include/nodes/parsenodes.h
src/include/nodes/primnodes.h
src/include/optimizer/pathnode.h
src/include/parser/kwlist.h
src/include/storage/procarray.h
src/test/regress/expected/.gitignore
src/test/regress/expected/aggregates.out
src/test/regress/expected/alter_table.out
src/test/regress/expected/bit.out
src/test/regress/expected/box.out
src/test/regress/expected/delete.out
src/test/regress/expected/float4.out
src/test/regress/expected/float8.out
src/test/regress/expected/int2.out
src/test/regress/expected/int8.out
src/test/regress/expected/interval.out
src/test/regress/expected/numeric.out
src/test/regress/expected/point.out
src/test/regress/expected/polygon.out
src/test/regress/expected/sequence.out
src/test/regress/expected/timestamp.out
src/test/regress/expected/timestamptz.out
src/test/regress/expected/transactions.out
src/test/regress/expected/window.out
src/test/regress/input/misc.source
src/test/regress/output/create_misc_1.source
src/test/regress/output/misc.source
src/test/regress/sql/aggregates.sql
src/test/regress/sql/alter_table.sql
src/test/regress/sql/bit.sql
src/test/regress/sql/box.sql
src/test/regress/sql/delete.sql
src/test/regress/sql/domain.sql
src/test/regress/sql/float4.sql
src/test/regress/sql/float8.sql
src/test/regress/sql/int2.sql
src/test/regress/sql/int8.sql
src/test/regress/sql/interval.sql
src/test/regress/sql/lseg.sql
src/test/regress/sql/numeric.sql
src/test/regress/sql/path.sql
src/test/regress/sql/point.sql
src/test/regress/sql/polygon.sql
src/test/regress/sql/portals.sql
src/test/regress/sql/sequence.sql
src/test/regress/sql/timestamp.sql
src/test/regress/sql/timestamptz.sql
src/test/regress/sql/transactions.sql
src/test/regress/sql/window.sql
src/test/regress/sql/with.sql
|
|
We had previously (af26857a2775e7ceb0916155e931008c2116632f)
established the U.S. spellings as standard.
|
|
|
|
This fixes a problem with autovacuum workers/launchers that tended to use
the connection allocated for the postmaster to connect to GTM.
When multiple vacuums ran at the same time, this tended to mess up the
way autovacuum received GXIDs and snapshots from GTM.
This commit also adds some debug messages to observe connection activity to GTM,
and stricter control of autovacuum backends' connections to GTM.
|
|
This fixes issues when JDBC was used with multi INSERT such as:
INSERT INTO table_name VALUES (1),(2);
|
|
distribution key
INT8, INT2, OID, INT4, BOOL, INT2VECTOR, OIDVECTOR
CHAR, NAME, TEXT, BPCHAR, BYTEA, VARCHAR
FLOAT4, FLOAT8, NUMERIC, CASH
ABSTIME, RELTIME, DATE, TIME, TIMESTAMP, TIMESTAMPTZ, INTERVAL, TIMETZ
A new function compute_hash is added in the system which is used to
compute the hash of any of the supported data types.
The computed hash is used in the function GetRelationNodes to
find the targeted data node.
EXPLAIN for RemoteQuery has been modified to show the number of
data nodes targeted for a certain query. This is essential
to spot bugs in the optimizer in case it is targeting all nodes
by mistake.
In the optimization where a comparison with a constant leads
the optimizer to target a single data node, there were a couple
of mistakes in examine_conditions_walker.
First, it did not support RelabelType, which represents a "dummy"
type coercion between two binary-compatible datatypes.
As a result, the optimization did not work for the varchar
type, for example.
Second, it did not cater for the case where the user writes the
condition with the constant expression on the LHS and the
variable on the RHS of the = operator,
e.g. 23 = a
A number of test cases have been added in regression to make sure
further enhancements do not break this functionality.
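A hypothetical illustration of the cases now handled (the table definition
and DISTRIBUTE BY clause are assumptions for the sketch):
  CREATE TABLE t (a int, b varchar(20)) DISTRIBUTE BY HASH (a);
  -- Each of the following should now be recognized and target a single
  -- data node, including the constant-on-the-left form and the varchar
  -- (RelabelType) case:
  EXPLAIN SELECT * FROM t WHERE a = 23;
  EXPLAIN SELECT * FROM t WHERE 23 = a;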
This change has a sizeable impact on the current regression tests, as follows.
1. The horology test case crashes the server and has been commented out in serial_schedule.
2. In the money test case, the planner optimization wrongly kicks in for this query
SELECT m = '$123.01' FROM money_data;
and points it to a single data node.
3. There were a few unnecessary EXPLAINs in the create_index test case.
Since we have added support in EXPLAIN to show the number of
data nodes targeted for RemoteQuery, this test case was producing
output dependent on the cluster configuration.
4. In the guc test case,
DROP ROLE temp_reset_user;
results in
ERROR: permission denied to drop role
|
|
Merge the PostgreSQL 9.0 release into PGXC. Resolve conflicts thrown by git
and fix some issues raised during compilation. The tree still does not compile
cleanly at this point, but we should have resolved many conflicts to make
further progress.
Some of the changes in the regression tests are merged to reflect what's
in the 9.0 release. Those are easy to fix later when we run the regressions.
Conflicts:
contrib/Makefile
contrib/pgbench/pgbench.c
src/Makefile
src/backend/Makefile
src/backend/access/transam/varsup.c
src/backend/catalog/Makefile
src/backend/catalog/dependency.c
src/backend/catalog/genbki.sh
src/backend/commands/dbcommands.c
src/backend/commands/explain.c
src/backend/commands/vacuum.c
src/backend/executor/execMain.c
src/backend/executor/execProcnode.c
src/backend/executor/execTuples.c
src/backend/parser/analyze.c
src/backend/parser/gram.y
src/backend/parser/parse_utilcmd.c
src/backend/postmaster/postmaster.c
src/backend/rewrite/rewriteHandler.c
src/backend/storage/ipc/procarray.c
src/backend/storage/lmgr/proc.c
src/backend/tcop/postgres.c
src/backend/tcop/utility.c
src/backend/utils/cache/relcache.c
src/backend/utils/init/postinit.c
src/backend/utils/misc/guc.c
src/bin/pg_ctl/pg_ctl.c
src/include/Makefile
src/include/access/twophase.h
src/include/bootstrap/bootstrap.h
src/include/catalog/catversion.h
src/include/catalog/dependency.h
src/include/catalog/indexing.h
src/include/catalog/pg_proc.h
src/include/nodes/nodes.h
src/include/storage/lwlock.h
src/include/storage/proc.h
src/include/storage/procarray.h
src/include/utils/lsyscache.h
src/test/regress/expected/delete.out
src/test/regress/expected/float4.out
src/test/regress/expected/float8.out
src/test/regress/expected/geometry.out
src/test/regress/expected/join.out
src/test/regress/expected/point.out
src/test/regress/expected/rowtypes.out
src/test/regress/expected/timestamp.out
src/test/regress/expected/timestamptz.out
src/test/regress/expected/tsearch.out
src/test/regress/sql/numeric.sql
src/test/regress/sql/point.sql
|
|
A little bit late, but change the headers for 2011.
|
|
Multiple insert means using a single insert statement to insert
multiple rows into a table using the syntax e.g.
insert into students(rno, class, pos) values
(1, 10, 5), (2, 10, 6), (3, 10, 7), (4, 10, 8);
Without the patch, statements like these appear to pass,
but do not actually insert anything into the table.
The main code changes are in the rewriter.
The patch checks whether the insert statement
has more than one set in the provided list of values
(four in the above example), and in that case rewrites the insert statement.
The insert rewriter separates the sets in the provided
list of values into independent lists depending on the
distribution of the table, the distribution column, and
the value provided for the distribution column.
Next, the main rewriter is split into two possible
paths: one without a loop, and one that, given a separated
list of insert values, runs a loop over the list
and creates an insert statement for each of the data nodes,
providing it the sub-group of the original list
that is supposed to run on that particular data node.
That covers the main work; all that is left is to handle
multiple command result tags from the data nodes.
HandleCmdComplete does this: it simply keeps adding
to the insert row count until all data nodes are done.
With this patch, multi-row insert does not yet work for replicated tables,
and additional comments are still needed.
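For illustration, a hypothetical split of the example above, assuming rno is
the hash distribution column and the rows hash to two different data nodes
(the actual assignment depends on the hash values and the number of nodes):
  -- Original statement
  insert into students(rno, class, pos) values (1, 10, 5), (2, 10, 6), (3, 10, 7), (4, 10, 8);
  -- Possible rewritten statements, one per targeted data node
  insert into students(rno, class, pos) values (1, 10, 5), (3, 10, 7);
  insert into students(rno, class, pos) values (2, 10, 6), (4, 10, 8);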
|
|
There was some code used to clean up the connection threads between Coordinator and Datanodes
that was not really necessary.
It had been introduced in version 0.9.2 to stabilize the code by consuming messages on connections
where an error had happened on a backend node.
This patch also corrects a bug on the Datanode where the GXID was not correctly set
at initialization.
This led to transactions being committed twice on backend nodes, crashing them with a FATAL error.
Patch written by Andrei Martsinchyk
|
|
Utility to clean up Postgres-XC Pooler connections.
This utility is launched on all the Coordinators of the cluster.
Use of CLEAN CONNECTION is limited to a superuser.
It is advised to clean connections before dropping a database.
SQL query synopsis is as follows:
CLEAN CONNECTION TO
(COORDINATOR num | DATANODE num | ALL {FORCE})
FOR DATABASE dbname
Connection cleaning is performed for a chosen database, dbname.
It is also possible to clean the connections of several Coordinators or Datanodes:
Ex: CLEAN CONNECTION TO DATANODE 1,5,7 FOR DATABASE template1
CLEAN CONNECTION TO COORDINATOR 2,4,6 FOR DATABASE template1
Or even to all Coordinators/Datanodes at the same time
Ex: CLEAN CONNECTION TO DATANODE * FOR DATABASE template1
CLEAN CONNECTION TO COORDINATOR * FOR DATABASE template1
When FORCE is used, all the transactions using pooler connections are aborted,
and pooler connections are cleaned up.
Ex: CLEAN CONNECTION TO ALL FORCE FOR DATABASE template1;
FORCE can only be used with TO ALL, as it takes a lock on the pooler to stop requests
asking for connections, aborts all the connections in the cluster, and cleans up
the pool connections.
|
|
1) Support for DDL and utility command synchronisation among Coordinators.
DDL is now synchronized amongst multiple coordinators. Previously, after
DDL it was required to use an extra utility to resync the nodes and
restart other Coordinators. This is no longer necessary.
DDL support works also with common BEGIN, COMMIT and ROLLBACK instructions
in the cluster.
DDL may be initiated at any node. Each Coordinator can connect to any
other one.
Just as Coordinators use pools for connecting to Data Nodes, Coordinators
now use pools for connecting to the other Coordinators.
2) Support for PREPARE TRANSACTION and COMMIT TRANSACTION, ROLLBACK PREPARED.
When a transaction is prepared or committed, based on the SQL, it will only
execute on the involved nodes, including DDL on Coordinators.
GTM is used to track which XID and nodes are involved in the transaction,
identified by the user- or application-specified transaction identifier,
when it is prepared.
New GUCs
--------
There are some new GUCs for handling Coordinator communication
(example values are sketched after this list):
num_coordinators
coordinator_hosts
coordinator_ports
coordinator_users
coordinator_passwords
In addition, a new GUC replaces coordinator_id:
pgxc_node_id
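For illustration, hypothetical postgresql.conf entries for a two-Coordinator
cluster; the value formats shown here are assumptions, not taken from the commit:
  num_coordinators = 2
  coordinator_hosts = 'coord1,coord2'
  coordinator_ports = '5432,5432'
  coordinator_users = 'pgxc,pgxc'
  coordinator_passwords = 'secret1,secret2'
  pgxc_node_id = 1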
Open Issues
-----------
Implicit two phase commit (client in autocommit mode, but distributed
transaction required because of multiple nodes) does not first prepare
on the originating coordinator before committing, if DDL is involved.
We really should prepare here before committing on all nodes.
We also need to add a bit of special handling for COMMIT PREPARED.
If there is an error, and it got committed on some nodes, we still
should force it to be committed on the originating coordinator, if
involved, and still return an error of some sort that it was partially
committed. (When the downed node recovers, in the future it will determine
if any other node has committed the transaction, and if so, it, too, must
commit.) It is a pretty rare case, but we should handle it.
With this current configuration, DDL will fail if at least one
Coordinator is down. In the future, we will make this more flexible.
Written by Michael Paquier
|
|
This initial version implements support by creating them on the Coordinator
only; they are not created on the data nodes.
Not yet supported is UPDATE / DELETE WHERE CURRENT OF, but basic read-only
cursor capability works, including SCROLL cursors.
Written by Andrei Martsinchyk
|
|
When a transaction is begun on a Coordinator,
the transaction sends a BEGIN message to GTM and receives back
a timestamp along with the usual GXID.
This timestamp is calculated from the clock of the GTM server.
With that, nodes in the cluster can adjust their own timeline
to GTM by calculating a delta value based
on the GTM timestamp and their local clock.
Like the GXID and snapshot, the timestamp is also sent down to Datanodes
when needed, so as to keep timestamp values consistent between coordinator and datanodes.
This commit supports global timestamp values for now(), statement_timestamp,
transaction_timestamp, current_date, current_time, current_timestamp,
localtime and localtimestamp.
clock_timestamp and timeofday base their calculation
on the local server clock, so they get their results from the local node where they run.
Their use could lead to inconsistencies in a transaction involving several Datanodes.
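A small illustration of which functions are GTM-consistent versus local
(hypothetical session):
  BEGIN;
  SELECT now(), transaction_timestamp(), statement_timestamp();
  -- based on the GTM timestamp, consistent across coordinator and datanodes
  SELECT clock_timestamp(), timeofday();
  -- based on the local server clock, may differ from node to node
  COMMIT;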
|
|
Note that this is a "version 1.0" implementation, borrowing some code
from the SQL/MED patch. This means that all cross-node joins
take place on a Coordinator by pulling up data from the data nodes.
Some queries will therefore execute quite slowly, but they will
at least execute. In this patch, all columns are SELECTed from
the remote table, but at least simple WHERE clauses are pushed
down to the remote nodes. We will optimize query processing
in the future.
Note that the same connections to remote nodes are used in
multiple steps. To get around that problem, we just
add a materialization node above each RemoteQuery node,
and force all results to be fetched first on the
Coordinator.
This patch also allows UNION, EXCEPT and INTERSECT, and other
more complex SELECT statements to run now.
It includes a fix for single-step, multi-node LIMIT and OFFSET.
It also includes EXPLAIN output from the Coordinator's
point of view.
Adding these changes introduced a problem with AVG(),
which is currently not working.
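For illustration, hypothetical statements of the kind that now execute
(tables and columns are invented; the data is pulled up and joined or
combined on the Coordinator):
  SELECT e.name, d.dept_name
    FROM emp e JOIN dept d ON e.dept_id = d.dept_id;
  SELECT id FROM t1 UNION SELECT id FROM t2;
  SELECT id FROM t1 EXCEPT SELECT id FROM t2;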
|
|
This integrates Postgres-XC code deeper into PostgreSQL.
The Extended Query Protocol can now be used, which means that
JDBC will now work. It also lays more groundwork for
supporting multi-step queries (cross-node joins).
Note that statements with parameters cannot yet
be prepared and executed, only those without parameters
will work.
Note also that this patch introduces additional performance
degradation because more processing occurs with
each request. We will be working to address these
issues in the coming weeks.
Written by Andrei Martsinchyk
|
|
This includes forcing the release of connections in an unexpected
state and bug fixes.
This was written by Andrei Martsinchyk, with some additional
handling added by Mason.
|
|
AbortTransaction may be called multiple times, each
time calling DataNodeRollback, which may fail again
if a data node is down.
Instead, if we are already in an abort state, we do not
bother repeating abort actions.
|
|
This is handled on the Coordinator. It will push down the ORDER BY
and merge-sort the sorted input streams from the nodes.
It converts from DataRow to tuple format as needed.
If one of the SELECT clause expressions is not in the ORDER BY,
it appends it to the ORDER BY when pushing it down to the data nodes
and leaves it off when returning to the client.
With DISTINCT, an ORDER BY will be used and pushed down to the data
nodes such that a merge-sort can be done and de-duplication can
occur.
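For illustration, a hypothetical example of the DISTINCT case described above
(table and columns are invented):
  -- Statement received by the Coordinator
  SELECT DISTINCT a, b FROM t;
  -- Statement pushed down to each data node, with an ORDER BY added so the
  -- Coordinator can merge-sort the streams and de-duplicate; the added
  -- ordering is not part of what is returned to the client
  SELECT DISTINCT a, b FROM t ORDER BY a, b;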
By Andrei Martsinchyk
|
|
of time whether or not it should only execute on the
Coordinator (pg_catalog tables).
Written by Michael Paquier
|
|
Moved up the call to be above setting sigjmp
|
|
It currently only supports copying from a single table; COPY with SELECT is not
yet supported.
This was written by Michael Paquier.
|
|
Some additional work was done related to the combiner and
error handling to make this code a little cleaner.
This was written by Andrei Martsinchyk.
|
|
It could still happen that we do not consume the Z (ReadyForQuery) message after an error.
We introduce a new connection state to detect this.
Also, it was previously possible for DDL to succeed on the coordinator
and get committed there but not on the datanodes. We now make sure it does
not get committed on the coordinator in that case.
|
|
the likelihood of distributed deadlocks. That is, if all writes
for a table first go through the same data node, then when the same tuple is
updated by multiple clients, we can at least ensure that the first
session to obtain the lock can similarly obtain the lock for
the same tuple on all of the nodes. (Ordinary deadlocks are still
possible.)
There is a new GUC parameter, primary_data_node. By default it is 1,
the number of the node on which writes to replicated tables are executed first,
before the other nodes. If it is set to 0, then the primary
node technique is not used, and all replicas are updated
simultaneously.
Instead of the planner returning a list of nodes to execute on,
it returns a pointer to Exec_Nodes, which contains the primary and
secondary nodes to execute on. DataNodeExec() now uses this information.
I also added a new check so that if a different number of rows were
affected on replicated tables (an UPDATE, for example), an error occurs.
This happens for COMBINE_TYPE_SAME. (I tested with the help of EXECUTE
DIRECT, intentionally messing up the data.)
|
|
Application of patch PGXC-PG_REL8_4_3.patch.gz
on PostgreSQL version 8.4.3
|
|
This option turns off autovacuum, prevents non-super-user connections,
and enables oid setting hooks in the backend. The code continues to use
the old autovacuum disable settings for servers with earlier catalog
versions.
This includes a catalog version bump to identify servers that support
the -b option.
|
|
to the regular stack. The code to do that is platform- and compiler-specific;
add support for the HP-UX native compiler.
|
|
|
|
The previous functions of assign hooks are now split between check hooks
and assign hooks, where the former can fail but the latter shouldn't.
Aside from being conceptually clearer, this approach exposes the
"canonicalized" form of the variable value to guc.c without having to do
an actual assignment. And that lets us fix the problem recently noted by
Bernd Helmle that the auto-tune patch for wal_buffers resulted in bogus
log messages about "parameter "wal_buffers" cannot be changed without
restarting the server". There may be some speed advantage too, because
this design lets hook functions avoid re-parsing variable values when
restoring a previous state after a rollback (they can store a pre-parsed
representation of the value instead). This patch also resolves a
longstanding annoyance about custom error messages from variable assign
hooks: they should modify, not appear separately from, guc.c's own message
about "invalid parameter value".
|
|
1. Don't ignore query cancel interrupts. Instead, if the user asks to
cancel the query after we've already committed it, but before it's on
the standby, just emit a warning and let the COMMIT finish.
2. Don't ignore die interrupts (pg_terminate_backend or fast shutdown).
Instead, emit a warning message and close the connection without
acknowledging the commit. Other backends will still see the effect of
the commit, but there's no getting around that; it's too late to abort
at this point, and ignoring die interrupts altogether doesn't seem like
a good idea.
3. If synchronous_standby_names becomes empty, wake up all backends
waiting for synchronous replication to complete. Without this, someone
attempting to shut synchronous replication off could easily wedge the
entire system instead.
4. Avoid depending on the assumption that if a walsender updates
MyProc->syncRepState, we'll see the change even if we read it without
holding the lock. The window for this appears to be quite narrow (and
probably doesn't exist at all on machines with strong memory ordering)
but protecting against it is practically free, so do that.
5. Remove useless state SYNC_REP_MUST_DISCONNECT, which isn't needed and
doesn't actually do anything.
There's still some further work needed here to make the behavior of fast
shutdown plausible, but that looks complex, so I'm leaving it for a
separate commit. Review by Fujii Masao.
|
|
With this patch, portals, SQL functions, and SPI all agree that there
should be only a CommandCounterIncrement between the queries that are
generated from a single SQL command by rule expansion. Fetching a whole
new snapshot now happens only between original queries. This is equivalent
to the existing behavior of EXPLAIN ANALYZE, and it was judged to be the
best choice since it eliminates one source of concurrency hazards for
rules. The patch should also make things marginally faster by reducing the
number of snapshot push/pop operations.
The patch removes pg_parse_and_rewrite(), which is no longer used anywhere.
There was considerable discussion about more aggressive refactoring of the
query-processing functions exported by postgres.c, but for the moment
nothing more has been done there.
I also took the opportunity to refactor snapmgr.c's API slightly: the
former PushUpdatedSnapshot() has been split into two functions.
Marko Tiikkaja, reviewed by Steve Singer and Tom Lane
|
|
|
|
Previously reported as ERRCODE_ADMIN_SHUTDOWN, this case is now
reported as ERRCODE_T_R_DATABASE_DROPPED. No message text change.
Unlikely to happen on most servers, so low impact change to allow
session poolers to correctly handle this situation.
Tatsuo Ishii, edits by me, review by Robert Haas
|