From: Michael P. <mic...@us...> - 2010-10-28 01:46:25
Project "website". The branch, master has been updated via e7e3e50f49fc4e8aad9b69364b263bb0b8c4b18c (commit) via d365e89c8dcd4231fe15b634df6e0b6cb62f73d4 (commit) via eaef610bc5a85efb93ce617e0bf8e19ad70ffc9f (commit) via 3dbebc282d6facdde1589cbf97c4c580a52185dd (commit) from d56fffa616e25f0b5c35e1e517855113a7abbad3 (commit) - Log ----------------------------------------------------------------- commit e7e3e50f49fc4e8aad9b69364b263bb0b8c4b18c Author: Michael P <mic...@us...> Date: Thu Oct 28 10:46:24 2010 +0900 Reformulation in Roadmap diff --git a/roadmap.html b/roadmap.html index 7fb6ab4..50aa661 100755 --- a/roadmap.html +++ b/roadmap.html @@ -53,7 +53,7 @@ SQL Limitations </a> document for further details. There is no support yet for <code>SELECT</code> in <code>FROM</code> clause. </p> -<p>We will be expanding the coverage of supported SQL and High-Availability (HA) features as main guidelines in the coming months.</p> +<p>We will be expanding the coverage of supported SQL and as well High-Availability (HA) features in the coming months.</p> <!-- ==== Planned feature === --> <h3> Upcoming Releases and Features commit d365e89c8dcd4231fe15b634df6e0b6cb62f73d4 Author: Michael P <mic...@us...> Date: Thu Oct 28 10:45:31 2010 +0900 Code cleaning in Roadmap diff --git a/roadmap.html b/roadmap.html index 2ccad3b..7fb6ab4 100755 --- a/roadmap.html +++ b/roadmap.html @@ -32,10 +32,10 @@ At present, Postgres-XC provides major transaction management features similar to PostgreSQL, except for savepoints. </p> <p> -On the other hand, Postgres-XC needs to enhance support for general statements.<br> +On the other hand, Postgres-XC needs to enhance support for general statements.<br /> As of Version 0.9.3, Postgres-XC supports statements which can be executed -on a single data node, or on multiple nodes for single and multi step.<br> -This new version adds support for: +on a single data node, or on multiple nodes for single and multi step.<br /> +This new version adds support for:<br /> - Cursor Support<br /> - Basic cross-node operation<br /> - Global timestamp<br /> @@ -89,20 +89,20 @@ Version 1.0 (Late in December, 2010) </h4> <p class="inner"> -Physical backup/restore incl. PITR<br> -Cross-node oepration optimization<br> -More variety of statements such as <code>SELECT</code> in <code>INSERT</code><br> -Full support Prepared statements and cluster-wide recovery<br> +Physical backup/restore incl. 
PITR<br /> +Cross-node oepration optimization<br /> +More variety of statements such as <code>SELECT</code> in <code>INSERT</code><br /> +Full support Prepared statements and cluster-wide recovery<br /> HA Capability<br /> -General aggregate functions<br> -Savepoint<br> -Session Parameters<br> -Forward cursor with <code>ORDER BY</code><br> -Backward cursor<br> -Batch, statement pushdown<br> -Global constraints<br> -Tuple relocation (distrubute key update)<br> -Performance improvement <br> +General aggregate functions<br /> +Savepoint<br /> +Session Parameters<br /> +Forward cursor with <code>ORDER BY</code><br /> +Backward cursor<br /> +Batch, statement pushdown<br /> +Global constraints<br /> +Tuple relocation (distrubute key update)<br /> +Performance improvement <br /> Regression tests </p> commit eaef610bc5a85efb93ce617e0bf8e19ad70ffc9f Author: Michael P <mic...@us...> Date: Thu Oct 28 10:42:44 2010 +0900 Version release roadmap updated diff --git a/roadmap.html b/roadmap.html index 4f8802e..2ccad3b 100755 --- a/roadmap.html +++ b/roadmap.html @@ -24,35 +24,36 @@ Postgres-XC Roadmap <!-- ==== Current Limintation ==== --> <h3> -Current Limitation of Postgres-XC +Current Limitations of Postgres-XC </h3> <p> At present, Postgres-XC provides major transaction management features -similar to PostgreSQL, except for two phase commit (2PC) and savepoints. -(XC uses 2PC for internal use). +similar to PostgreSQL, except for savepoints. </p> <p> On the other hand, Postgres-XC needs to enhance support for general statements.<br> -As of Version 0.9.2, Postgres-XC supports statements which can be executed -on a single data node, or on multiple nodes but as a single step.<br> +As of Version 0.9.3, Postgres-XC supports statements which can be executed +on a single data node, or on multiple nodes for single and multi step.<br> This new version adds support for: -- views<br> -- extra DDLs<br> -- ORDER BY/DISTINCT<br> -- pg_dump, pg_restore<br> -- sequence full support with GTM<br> -- basic stored function support.<br> -- Cold synchronization of Coordinator's Catalog files<br> -However there are some limitations please refer to <a href="https://sourceforge.net/projects/postgres-xc/files/Version_0.9.2/PG-XC_SQL_Limitations_v0_9_2.pdf/download" target="_blank"> +- Cursor Support<br /> +- Basic cross-node operation<br /> +- Global timestamp<br /> +- DDL synchronisation<br /> +- Cluster-wide installer<br /> +- Cluster-wide operation utilities<br /> +- Driver support (ECPG, JDBC, PHP, etc.)<br /> +- Extended Query Protocol (for JDBC)<br /> +- Support of external 2PC from application<br /> + +However there are some limitations please refer to <a href="https://sourceforge.net/projects/postgres-xc/files/Version_0.9.3/PG-XC_SQL_Limitations_v0_9_3.pdf/download" target="_blank"> SQL Limitations </a> document for further details. </p> <p> There is no support yet for <code>SELECT</code> in <code>FROM</code> clause. -Support for <code>CURSOR</code> is a future issue too. 
</p> -<p>We will be expanding the coverage of supported SQL as an item of particular focus in the coming months.</p> +<p>We will be expanding the coverage of supported SQL and High-Availability (HA) features as main guidelines in the coming months.</p> <!-- ==== Planned feature === --> <h3> Upcoming Releases and Features @@ -63,7 +64,7 @@ Current plan of future releases and features are as follows: </p> <!-- ==== For version 0.9.3 ==== --> -<h4> +<!-- <h4> Version 0.9.3 (Late in September, 2010) </h4> @@ -80,7 +81,7 @@ Global timestamp<br> Driver support (ECPG, JDBC, PHP, etc.)<br> Forward Cursor (w/o <code>ORDER BY</code>)<br> subqueries<br> -</p> +</p> --> <!-- ==== For Version 1.0 ==== --> <h4> @@ -91,17 +92,15 @@ Version 1.0 (Late in December, 2010) Physical backup/restore incl. PITR<br> Cross-node oepration optimization<br> More variety of statements such as <code>SELECT</code> in <code>INSERT</code><br> -Prepared statements<br> -General aggregate functions<br> +Full support Prepared statements and cluster-wide recovery<br> +HA Capability<br /> +General aggregate functions<br> Savepoint<br> Session Parameters<br> -2PC from Apps<br> Forward cursor with <code>ORDER BY</code><br> Backward cursor<br> Batch, statement pushdown<br> -Caralog synchronize with DDLs<br> -Trigger<br> -GLobal constraints<br> +Global constraints<br> Tuple relocation (distrubute key update)<br> Performance improvement <br> Regression tests @@ -113,8 +112,9 @@ Beyond Version 1.0 </h4> <p class="inner"> -HA Capability<br> -GTM-Standby<br> +HA Capability<br /> +GTM-Standby<br /> +Trigger<br /> </p> </body> commit 3dbebc282d6facdde1589cbf97c4c580a52185dd Author: Michael P <mic...@us...> Date: Thu Oct 28 10:32:16 2010 +0900 Event page updated with 2010 and 2011 upcoming events diff --git a/events.html b/events.html index 1cb57b5..cc3a412 100755 --- a/events.html +++ b/events.html @@ -13,7 +13,12 @@ --> <h2 class="plain">Events</h2> <p class="plain"> -Upcoming events to be decided soon! +A lot of opportunities to meet the Core developpers!! +<ul> +<li><a href="http://2010.pgday.eu/" target="_blank">PGDay-EU</a> in November 2010</li> +<li>PG-East in March 2011</li> +<li>PG-Con 2010 in May 2011</li> +</ul> </p> <!-- Event title --> @@ -30,10 +35,10 @@ Description of this event. UPDATES --> <h2 class="plain">Updates</h2> -<!-- Postgres-XC 0.9.2 download --> +<!-- Postgres-XC 0.9.3 download --> <p class="plain"> -Postgres-XC 0.9.2 is now available!! Download -<a href="https://sourceforge.net/projects/postgres-xc/files/Version_0.9.2/pgxc_v0_9_2.tar.gz/download" target="_blank"> +Postgres-XC 0.9.3 is now available!! Download +<a href="https://sourceforge.net/projects/postgres-xc/files/Version_0.9.3/pgxc_v0_9_3.tar.gz/download" target="_blank"> here. </a> </p> ----------------------------------------------------------------------- Summary of changes: events.html | 13 +++++++--- roadmap.html | 74 +++++++++++++++++++++++++++++----------------------------- 2 files changed, 46 insertions(+), 41 deletions(-) hooks/post-receive -- website |
From: Michael P. <mic...@us...> - 2010-10-28 01:16:50
Project "website". The branch, master has been updated via d56fffa616e25f0b5c35e1e517855113a7abbad3 (commit) from 8c2cfec1e2cf6263a2a1bbea33d09643fb6a942a (commit) - Log ----------------------------------------------------------------- commit d56fffa616e25f0b5c35e1e517855113a7abbad3 Author: Michael P <mic...@us...> Date: Thu Oct 28 10:17:07 2010 +0900 Update download list according to 0.9.3 release documents diff --git a/download.html b/download.html index 9bb550d..79dcd2f 100755 --- a/download.html +++ b/download.html @@ -38,32 +38,32 @@ Please also note tarball files do not include Postgres-XC documents. <!-- Documents of version 0.9.2 --> <h4> -Version 0.9.2 +Version 0.9.3 </h4> <p> <ul> -<!-- tarball of 0.9.2, main download--> +<!-- tarball of 0.9.3, main download--> <li> -<code>pgxc_v0.9.2.tar.gz</code>: <br> +<code>pgxc_v0.9.3.tar.gz</code>: <br> Latest version of Postgres-XC available.<br> Please note that Postgres-XC documentation is not included in this file. ⇒ -<a href="https://sourceforge.net/projects/postgres-xc/files/Version_0.9.2/pgxc_v0_9_2.tar.gz/download" target="_blank"> +<a href="https://sourceforge.net/projects/postgres-xc/files/Version_0.9.3/pgxc_v0_9_3.tar.gz/download" target="_blank"> (download) </a> </li> <!-- tarball (diff) --> <li> -<code>PGXC_v0_9_2-PG_REL8_4_3.patch.gz</code>: <br> +<code>PGXC_v0_9_3-PG_REL8_4_3.patch.gz</code>: <br> The same material as above, but this file includes only the patch to apply to the PostgreSQL 8.4.3 release source code.<br> It is useful if you would like to see just a difference between PostgreSQL and Postgres-XC.<br> No Postgres-XC documentation is included in this file either. ⇒ -<a href="https://sourceforge.net/projects/postgres-xc/files/Version_0.9.2/PGXC_v0_9_2-PG_REL8_4_3.patch.gz/download" target="_blank"> +<a href="https://sourceforge.net/projects/postgres-xc/files/Version_0.9.3/PGXC_v0_9_3-PG_REL8_4_3.patch.gz/download" target="_blank"> (download) </a> </li> @@ -73,7 +73,7 @@ No Postgres-XC documentation is included in this file either. <code>COPYING</code>: <br> License description. Postgres-XC is distributed under LGPL version 2.1 ⇒ -<a href="https://sourceforge.net/projects/postgres-xc/files/Version_0.9.2/COPYING/download" target="_blank"> +<a href="https://sourceforge.net/projects/postgres-xc/files/Version_0.9.3/COPYING/download" target="_blank"> (download) </a> </li> @@ -81,61 +81,61 @@ License description. Postgres-XC is distributed under LGPL version 2.1 <!-- Files --> <li> <code>FILES</code>: <br> -Description of files included in Postgres-XC 0.9.2 release. +Description of files included in Postgres-XC 0.9.3 release. ⇒ -<a href="https://sourceforge.net/projects/postgres-xc/files/Version_0.9.2/FILES/download" target="_blank"> +<a href="https://sourceforge.net/projects/postgres-xc/files/Version_0.9.3/FILES/download" target="_blank"> (download) </a> </li> <!-- Reference Manual --> <li> -<code>PG-XC_ReferenceManual_v0_9_2.pdf</code>: <br> +<code>PG-XC_ReferenceManual_v0_9_3.pdf</code>: <br> Reference of Postgres-XC extension. 
⇒ -<a href="https://sourceforge.net/projects/postgres-xc/files/Version_0.9.2/PG-XC_ReferenceManual_v0_9_2.pdf/download" target="_blank"> +<a href="https://sourceforge.net/projects/postgres-xc/files/Version_0.9.3/PG-XC_ReferenceManual_v0_9_3.pdf/download" target="_blank"> (download) </a> </li> <!-- pgbench Tutorial Manual --> <li> -<code>PG-XC_pgbench_Tutorial_v0_9_2.pdf</code>: <br> +<code>PG-XC_pgbench_Tutorial_v0_9_3.pdf</code>: <br> Step by step description how to build and configure pgbench to run with Postgres-XC. ⇒ -<a href="https://sourceforge.net/projects/postgres-xc/files/Version_0.9.2/PG-XC_pgbench_Tutorial_v0_9_2.pdf/download" target="_blank"> +<a href="https://sourceforge.net/projects/postgres-xc/files/Version_0.9.3/PG-XC_pgbench_Tutorial_v0_9_3.pdf/download" target="_blank"> (download) </a> </li> <!-- DBT-1 Tutorial Manual --> <li> -<code>PG-XC_DBT1_Tutorial_v0_9_2.pdf</code>: <br> +<code>PG-XC_DBT1_Tutorial_v0_9_3.pdf</code>: <br> Step by step description how to build and configure DBT-1 to run with Postgres-XC. ⇒ -<a href="https://sourceforge.net/projects/postgres-xc/files/Version_0.9.2/PG-XC_DBT1_Tutorial_v0_9_2.pdf/download" target="_blank"> +<a href="https://sourceforge.net/projects/postgres-xc/files/Version_0.9.3/PG-XC_DBT1_Tutorial_v0_9_3.pdf/download" target="_blank"> (download) </a> </li> <!-- Install Manual --> <li> -<code>PG-XC_InstallManual_v0_9_2.pdf</code>: <br> +<code>PG-XC_InstallManual_v0_9_3.pdf</code>: <br> Step by step description how to build, install and configure Postgres-XC. ⇒ -<a href="https://sourceforge.net/projects/postgres-xc/files/Version_0.9.2/PG-XC_InstallManual_v0_9_2.pdf/download" target="_blank"> +<a href="https://sourceforge.net/projects/postgres-xc/files/Version_0.9.3/PG-XC_InstallManual_v0_9_3.pdf/download" target="_blank"> (download) </a> </li> <!-- SQL limitation manual --> <li> -<code>PG-XC_SQL_Limitations_v0_9_2.pdf</code>: <br> -SQL restrictions available for Postgres-XC 0.9.2. +<code>PG-XC_SQL_Limitations_v0_9_3.pdf</code>: <br> +SQL restrictions available for Postgres-XC 0.9.3. ⇒ -<a href="https://sourceforge.net/projects/postgres-xc/files/Version_0.9.2/PG-XC_SQL_Limitations_v0_9_2.pdf/download" target="_blank"> +<a href="https://sourceforge.net/projects/postgres-xc/files/Version_0.9.3/PG-XC_SQL_Limitations_v0_9_3.pdf/download" target="_blank"> (download) </a> </li> diff --git a/members.html b/members.html index c230c54..4db05ef 100755 --- a/members.html +++ b/members.html @@ -55,7 +55,7 @@ He is also GridSQL developer and is now developping aggregate functions and other cross-node operation. </p> -<h4>Michael Paquier</h4> +<h4><a href="http://michaelpq.users.sourceforge.net/">Michael Paquier</a></h4> <p class="inner"> Coordinator feature developer.<br> diff --git a/prev_vers/version0_9.html b/prev_vers/version0_9.html index 4ed4cf1..487592e 100644 --- a/prev_vers/version0_9.html +++ b/prev_vers/version0_9.html @@ -238,3 +238,121 @@ Description of the outline of Postgres-XC internals. </body> </html> + + +<!-- Documents of version 0.9.2 --> +<h4> +Version 0.9.2 +</h4> + +<p> +<ul> +<!-- tarball of 0.9.2, main download--> +<li> +<code>pgxc_v0.9.2.tar.gz</code>: <br> +Latest version of Postgres-XC available.<br> +Please note that Postgres-XC documentation is not included in this file. 
+⇒ +<a href="https://sourceforge.net/projects/postgres-xc/files/Version_0.9.2/pgxc_v0_9_2.tar.gz/download" target="_blank"> +(download) +</a> +</li> + +<!-- tarball (diff) --> +<li> +<code>PGXC_v0_9_2-PG_REL8_4_3.patch.gz</code>: <br> +The same material as above, but this file includes only the patch to apply +to the PostgreSQL 8.4.3 release source code.<br> +It is useful if you would like to see just a difference between PostgreSQL +and Postgres-XC.<br> +No Postgres-XC documentation is included in this file either. +⇒ +<a href="https://sourceforge.net/projects/postgres-xc/files/Version_0.9.2/PGXC_v0_9_2-PG_REL8_4_3.patch.gz/download" target="_blank"> +(download) +</a> +</li> + +<!-- License --> +<li> +<code>COPYING</code>: <br> +License description. Postgres-XC is distributed under LGPL version 2.1 +⇒ +<a href="https://sourceforge.net/projects/postgres-xc/files/Version_0.9.2/COPYING/download" target="_blank"> +(download) +</a> +</li> + +<!-- Files --> +<li> +<code>FILES</code>: <br> +Description of files included in Postgres-XC 0.9.2 release. +⇒ +<a href="https://sourceforge.net/projects/postgres-xc/files/Version_0.9.2/FILES/download" target="_blank"> +(download) +</a> +</li> + +<!-- Reference Manual --> +<li> +<code>PG-XC_ReferenceManual_v0_9_2.pdf</code>: <br> +Reference of Postgres-XC extension. +⇒ +<a href="https://sourceforge.net/projects/postgres-xc/files/Version_0.9.2/PG-XC_ReferenceManual_v0_9_2.pdf/download" target="_blank"> +(download) +</a> +</li> + +<!-- pgbench Tutorial Manual --> +<li> +<code>PG-XC_pgbench_Tutorial_v0_9_2.pdf</code>: <br> +Step by step description how to build and configure pgbench to run with +Postgres-XC. +⇒ +<a href="https://sourceforge.net/projects/postgres-xc/files/Version_0.9.2/PG-XC_pgbench_Tutorial_v0_9_2.pdf/download" target="_blank"> +(download) +</a> +</li> + +<!-- DBT-1 Tutorial Manual --> +<li> +<code>PG-XC_DBT1_Tutorial_v0_9_2.pdf</code>: <br> +Step by step description how to build and configure DBT-1 to run with +Postgres-XC. +⇒ +<a href="https://sourceforge.net/projects/postgres-xc/files/Version_0.9.2/PG-XC_DBT1_Tutorial_v0_9_2.pdf/download" target="_blank"> +(download) +</a> +</li> + +<!-- Install Manual --> +<li> +<code>PG-XC_InstallManual_v0_9_2.pdf</code>: <br> +Step by step description how to build, install and configure Postgres-XC. +⇒ +<a href="https://sourceforge.net/projects/postgres-xc/files/Version_0.9.2/PG-XC_InstallManual_v0_9_2.pdf/download" target="_blank"> +(download) +</a> +</li> + +<!-- SQL limitation manual --> +<li> +<code>PG-XC_SQL_Limitations_v0_9_2.pdf</code>: <br> +SQL restrictions available for Postgres-XC 0.9.2. +⇒ +<a href="https://sourceforge.net/projects/postgres-xc/files/Version_0.9.2/PG-XC_SQL_Limitations_v0_9_2.pdf/download" target="_blank"> +(download) +</a> +</li> + +<!-- Architecture Document --> +<li> +<code>PG-XC_Architecture_v0_9.pdf</code>: <br> +Description of the outline of Postgres-XC internals. 
+⇒ +<a href="https://sourceforge.net/projects/postgres-xc/files/Version_0.9/PG-XC_Architecture.pdf/download" target="_blank"> +(download) +</a> +</li> + +</ul> +</p> ----------------------------------------------------------------------- Summary of changes: download.html | 40 ++++++++-------- members.html | 2 +- prev_vers/version0_9.html | 118 +++++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 139 insertions(+), 21 deletions(-) hooks/post-receive -- website |
From: Michael P. <mic...@us...> - 2010-10-28 00:30:45
Project "Postgres-XC". The branch, REL0_9_3_STABLE has been created at d7d492eaeca181add193b4705de58637f5ba7c58 (commit) - Log ----------------------------------------------------------------- ----------------------------------------------------------------------- hooks/post-receive -- Postgres-XC |
From: Michael P. <mic...@us...> - 2010-10-28 00:28:38
Project "Postgres-XC". The annotated tag, v0.9.3 has been created at 4fa8496749b9d27ca702facbc78df9a45f6cda9c (tag) tagging d7d492eaeca181add193b4705de58637f5ba7c58 (commit) replaces v0.9.2 tagged by Michael P on Thu Oct 28 09:26:23 2010 +0900 - Log ----------------------------------------------------------------- Postgres-XC version 0.9.3 tag M S (2): Portal integration changes. Initial support for multi-step queries, including cross-node joins. Mason S (2): Added more handling to deal with data node connection failures. There is a race condition that could lead to problems Mason Sharp (13): In Postgres-XC, when extedngin the clog the status assertion Fix a visibility warning due to not taking into account Fixed a bug in GTM introduced with timestamp piggybacking with GXID. Fix a bug with AVG() Improved error handling. Address performance issues that were introduced in the last Initial support for cursors (DECLARE, FETCH). Handle stored functions in queries. Fix a bug with EXPLAIN and EXPLAIN VERBOSE. Fixed bug where extra materialization nodes were being created. Fix bug with pooler. SourceForge Bug ID: 3076224 checkpoint command causes seg fault When there is a data node crash, sometimes we were trying to read Michael P (6): Correction of bugs in pgxc_ddl Support for Global timestamp in Postgres-XC. Implementation of 2PC from applications Added support for two new pieces of functionality. After a Commit of prepared transaction on GTM, Deletion of a DEBUG message in postmaster.c ----------------------------------------------------------------------- hooks/post-receive -- Postgres-XC |
From: Pavan D. <pa...@us...> - 2010-10-27 10:48:12
Project "Postgres-XC". The branch, PGXC-sqlmed has been updated via eb50a76cb929fbe4a31d093b43e1589382c892a0 (commit) from 69bb66c62f71b9be918475ea65931adb3bbfba20 (commit) - Log ----------------------------------------------------------------- commit eb50a76cb929fbe4a31d093b43e1589382c892a0 Author: Pavan Deolasee <pav...@gm...> Date: Wed Oct 27 16:09:28 2010 +0530 Set remote relation stats (pages, rows etc) to a lower value so that NestLoop joins are preferred over other join types. This is necessary until we can handle other join types for remote join reduction diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c index 957a515..b1c8bcb 100644 --- a/src/backend/optimizer/util/relnode.c +++ b/src/backend/optimizer/util/relnode.c @@ -102,6 +102,20 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptKind reloptkind) case RTE_RELATION: /* Table --- retrieve statistics from the system catalogs */ get_relation_info(root, rte->relid, rte->inh, rel); +#ifdef PGXC + /* + * This is a remote table... we have no idea how many pages/rows + * we may get from a scan of this table. However, we should set the + * costs in such a manner that cheapest paths should pick up the + * ones involving these remote rels + * + * These allow for maximum query shipping to the remote + * side later during the planning phase + */ + rel->pages = 1; + rel->tuples = 1; + rel->rows = 1; +#endif break; case RTE_SUBQUERY: case RTE_FUNCTION: ----------------------------------------------------------------------- Summary of changes: src/backend/optimizer/util/relnode.c | 14 ++++++++++++++ 1 files changed, 14 insertions(+), 0 deletions(-) hooks/post-receive -- Postgres-XC |
From: Michael P. <mic...@us...> - 2010-10-27 00:07:14
Project "Postgres-XC". The branch, master has been updated via d7d492eaeca181add193b4705de58637f5ba7c58 (commit) from fee989010d22b6ca6c47b72d2d9b0620e4ab42b8 (commit) - Log ----------------------------------------------------------------- commit d7d492eaeca181add193b4705de58637f5ba7c58 Author: Michael P <mic...@us...> Date: Wed Oct 27 09:06:15 2010 +0900 Deletion of a DEBUG message in postmaster.c When opening a child under postmaster, there was always a message written in log telling about the PID number. This was written for bug pruposes only. diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c index 56974b0..0add0e6 100644 --- a/src/backend/postmaster/postmaster.c +++ b/src/backend/postmaster/postmaster.c @@ -3181,10 +3181,6 @@ BackendStartup(Port *port) pid = fork_process(); if (pid == 0) /* child */ { - //// FOR DEBUG - printf("The session started: %d\n", getpid()); - //sleep(60); - //// FOR DEBUG free(bn); /* ----------------------------------------------------------------------- Summary of changes: src/backend/postmaster/postmaster.c | 4 ---- 1 files changed, 0 insertions(+), 4 deletions(-) hooks/post-receive -- Postgres-XC |
From: mason_s <ma...@us...> - 2010-10-26 00:30:31
Project "Postgres-XC". The branch, master has been updated via fee989010d22b6ca6c47b72d2d9b0620e4ab42b8 (commit) from e11ab021fa203b8902c790e83e5bd78dbc4b2729 (commit) - Log ----------------------------------------------------------------- commit fee989010d22b6ca6c47b72d2d9b0620e4ab42b8 Author: Mason Sharp <ma...@us...> Date: Mon Oct 25 20:29:24 2010 -0400 When there is a data node crash, sometimes we were trying to read from a bad socket. diff --git a/src/backend/pgxc/pool/pgxcnode.c b/src/backend/pgxc/pool/pgxcnode.c index 5340a93..0d90273 100644 --- a/src/backend/pgxc/pool/pgxcnode.c +++ b/src/backend/pgxc/pool/pgxcnode.c @@ -272,10 +272,16 @@ pgxc_node_receive(const int conn_count, continue; /* prepare select params */ - if (nfds < connections[i]->sock) + if (connections[i]->sock > 0) + { + FD_SET(connections[i]->sock, &readfds); nfds = connections[i]->sock; - - FD_SET(connections[i]->sock, &readfds); + } + else + { + /* flag as bad, it will be removed from the list */ + connections[i]->state == DN_CONNECTION_STATE_ERROR_NOT_READY; + } } /* ----------------------------------------------------------------------- Summary of changes: src/backend/pgxc/pool/pgxcnode.c | 12 +++++++++--- 1 files changed, 9 insertions(+), 3 deletions(-) hooks/post-receive -- Postgres-XC |
From: Pavan D. <pa...@us...> - 2010-10-19 06:52:25
Project "Postgres-XC". The branch, PGXC-sqlmed has been updated via 69bb66c62f71b9be918475ea65931adb3bbfba20 (commit) from 2a313446f3e714ba36c9ccc5c5167309b7c89a95 (commit) - Log ----------------------------------------------------------------- commit 69bb66c62f71b9be918475ea65931adb3bbfba20 Author: Pavan Deolasee <pav...@gm...> Date: Tue Oct 19 12:20:44 2010 +0530 Set aliases properly for join reduction diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c index c3cb3b7..a753e95 100644 --- a/src/backend/optimizer/plan/createplan.c +++ b/src/backend/optimizer/plan/createplan.c @@ -827,6 +827,8 @@ create_remotejoin_plan(PlannerInfo *root, JoinPath *best_path, Plan *parent, Pla result->outer_alias = pstrdup(out_alias); result->inner_reduce_level = inner->reduce_level; result->outer_reduce_level = outer->reduce_level; + result->inner_relids = in_relids; + result->outer_relids = out_relids; appendStringInfo(&fromlist, " %s (%s) %s", pname, inner->sql_statement, quote_identifier(in_alias)); ----------------------------------------------------------------------- Summary of changes: src/backend/optimizer/plan/createplan.c | 2 ++ 1 files changed, 2 insertions(+), 0 deletions(-) hooks/post-receive -- Postgres-XC |
From: mason_s <ma...@us...> - 2010-10-18 20:21:32
Project "Postgres-XC". The branch, PGXC-sqlmed has been updated via 2a313446f3e714ba36c9ccc5c5167309b7c89a95 (commit) from f275fa535e9673af0964ecc7ca93ab1b49df2317 (commit) - Log ----------------------------------------------------------------- commit 2a313446f3e714ba36c9ccc5c5167309b7c89a95 Author: Mason Sharp <ma...@us...> Date: Mon Oct 18 16:15:16 2010 -0400 Added IsJoinReducible to determine if the two plan nodes can be joined. See comments for this function for more details. Basically, we use examine_conditions_walker to check if it is safe to join the two. Partitioned-partitioned joins are safe to collapse, and partitioned-replicated are safe iff one of the nodes does not already contain such a collapsed node. diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c index d9d5e4c..c3cb3b7 100644 --- a/src/backend/optimizer/plan/createplan.c +++ b/src/backend/optimizer/plan/createplan.c @@ -77,7 +77,7 @@ static WorkTableScan *create_worktablescan_plan(PlannerInfo *root, Path *best_pa #ifdef PGXC static RemoteQuery *create_remotequery_plan(PlannerInfo *root, Path *best_path, List *tlist, List *scan_clauses); -static Plan *create_remotejoin_plan(PlannerInfo *root, Path *best_path, +static Plan *create_remotejoin_plan(PlannerInfo *root, JoinPath *best_path, Plan *parent, Plan *outer_plan, Plan *inner_plan); static void create_remote_target_list(PlannerInfo *root, StringInfo targets, List *out_tlist, List *in_tlist, @@ -574,7 +574,7 @@ create_join_plan(PlannerInfo *root, JoinPath *best_path) #ifdef PGXC /* check if this join can be reduced to an equiv. remote scan node */ - plan = create_remotejoin_plan(root, (Path *)best_path, plan, outer_plan, inner_plan); + plan = create_remotejoin_plan(root, best_path, plan, outer_plan, inner_plan); #endif return plan; @@ -627,7 +627,7 @@ create_join_plan(PlannerInfo *root, JoinPath *best_path) * this code a lot much readable and easier. */ static Plan * -create_remotejoin_plan(PlannerInfo *root, Path *best_path, Plan *parent, Plan *outer_plan, Plan *inner_plan) +create_remotejoin_plan(PlannerInfo *root, JoinPath *best_path, Plan *parent, Plan *outer_plan, Plan *inner_plan) { NestLoop *nest_parent; @@ -662,19 +662,37 @@ create_remotejoin_plan(PlannerInfo *root, Path *best_path, Plan *parent, Plan *o IsA(inner_plan, Material) && IsA(((Material *) inner_plan)->plan.lefttree, RemoteQuery)) { + int i; + List *rtable_list = NIL; + bool partitioned_replicated_join = false; + Material *outer_mat = (Material *)outer_plan; Material *inner_mat = (Material *)inner_plan; RemoteQuery *outer = (RemoteQuery *)outer_mat->plan.lefttree; RemoteQuery *inner = (RemoteQuery *)inner_mat->plan.lefttree; + /* * Check if both these plans are from the same remote node. 
If yes, * replace this JOIN along with it's two children with one equivalent * remote node */ + /* + * Build up rtable for XC Walker + * (was not sure I could trust this, but it seems to work in various cases) + */ + for (i = 0; i < root->simple_rel_array_size; i++) + { + RangeTblEntry *rte = root->simple_rte_array[i]; + + /* Check for NULL first, sometimes it is NULL at position 0 */ + if (rte) + rtable_list = lappend(rtable_list, root->simple_rte_array[i]); + } + /* XXX Check if the join optimization is possible */ - if (true) + if (IsJoinReducible(inner, outer, rtable_list, best_path, &partitioned_replicated_join)) { RemoteQuery *result; Plan *result_plan; @@ -898,6 +916,8 @@ create_remotejoin_plan(PlannerInfo *root, Path *best_path, Plan *parent, Plan *o result->base_tlist = base_tlist; result->relname = "__FOREIGN_QUERY__"; + result->partitioned_replicated = partitioned_replicated_join; + /* * if there were any local scan clauses stick them up here. They * can come from the join node or from remote scan node themselves. diff --git a/src/backend/pgxc/plan/planner.c b/src/backend/pgxc/plan/planner.c index 29e4ee0..e678a14 100644 --- a/src/backend/pgxc/plan/planner.c +++ b/src/backend/pgxc/plan/planner.c @@ -87,7 +87,8 @@ typedef struct /* If two relations are joined based on special location information */ typedef enum PGXCJoinType { - JOIN_REPLICATED, + JOIN_REPLICATED_ONLY, + JOIN_REPLICATED_PARTITIONED, JOIN_COLOCATED_PARTITIONED, JOIN_OTHER } PGXCJoinType; @@ -144,6 +145,7 @@ static ExecNodes *get_plan_nodes(Query *query, bool isRead); static bool get_plan_nodes_walker(Node *query_node, XCWalkerContext *context); static bool examine_conditions_walker(Node *expr_node, XCWalkerContext *context); static int handle_limit_offset(RemoteQuery *query_step, Query *query, PlannedStmt *plan_stmt); +static void InitXCWalkerContext(XCWalkerContext *context); /* * True if both lists contain only one node and are the same @@ -693,15 +695,20 @@ examine_conditions_walker(Node *expr_node, XCWalkerContext *context) if (rel_loc_info1->locatorType == LOCATOR_TYPE_REPLICATED) { + /* add to replicated join conditions */ context->conditions->replicated_joins = - lappend(context->conditions->replicated_joins, opexpr); + lappend(context->conditions->replicated_joins, pgxc_join); if (colvar->varlevelsup != colvar2->varlevelsup) context->multilevel_join = true; - if (rel_loc_info2->locatorType != LOCATOR_TYPE_REPLICATED) + if (rel_loc_info2->locatorType == LOCATOR_TYPE_REPLICATED) + pgxc_join->join_type = JOIN_REPLICATED_ONLY; + else { + pgxc_join->join_type = JOIN_REPLICATED_PARTITIONED; + /* Note other relation, saves us work later. 
*/ context->conditions->base_rel_name = column_base2->relname; context->conditions->base_rel_loc_info = rel_loc_info2; @@ -717,23 +724,21 @@ examine_conditions_walker(Node *expr_node, XCWalkerContext *context) FreeRelationLocInfo(rel_loc_info2); } - /* note nature of join between the two relations */ - pgxc_join->join_type = JOIN_REPLICATED; return false; } else if (rel_loc_info2->locatorType == LOCATOR_TYPE_REPLICATED) { + /* note nature of join between the two relations */ + pgxc_join->join_type = JOIN_REPLICATED_PARTITIONED; + /* add to replicated join conditions */ context->conditions->replicated_joins = - lappend(context->conditions->replicated_joins, opexpr); + lappend(context->conditions->replicated_joins, pgxc_join); /* other relation not replicated, note it for later */ context->conditions->base_rel_name = column_base->relname; context->conditions->base_rel_loc_info = rel_loc_info1; - /* note nature of join between the two relations */ - pgxc_join->join_type = JOIN_REPLICATED; - if (rel_loc_info2) FreeRelationLocInfo(rel_loc_info2); @@ -1259,6 +1264,23 @@ get_plan_nodes_walker(Node *query_node, XCWalkerContext *context) return false; } +/* + * Set initial values for expression walker + */ +static void +InitXCWalkerContext(XCWalkerContext *context) +{ + context->isRead = true; + context->exec_nodes = NULL; + context->conditions = (Special_Conditions *) palloc0(sizeof(Special_Conditions)); + context->rtables = NIL; + context->multilevel_join = false; + context->varno = 0; + context->within_or = false; + context->within_not = false; + context->exec_on_coord = false; + context->join_list = NIL; +} /* * Top level entry point before walking query to determine plan nodes @@ -1271,18 +1293,9 @@ get_plan_nodes(Query *query, bool isRead) XCWalkerContext context; - context.query = query; + InitXCWalkerContext(&context); context.isRead = isRead; - context.exec_nodes = NULL; - context.conditions = (Special_Conditions *) palloc0(sizeof(Special_Conditions)); - context.rtables = NIL; context.rtables = lappend(context.rtables, query->rtable); - context.multilevel_join = false; - context.varno = 0; - context.within_or = false; - context.within_not = false; - context.exec_on_coord = false; - context.join_list = NIL; if (!get_plan_nodes_walker((Node *) query, &context)) result_nodes = context.exec_nodes; @@ -2315,3 +2328,148 @@ free_query_step(RemoteQuery *query_step) list_free_deep(query_step->simple_aggregates); pfree(query_step); } + + +/* + * See if we can reduce the passed in RemoteQuery nodes to a single step. + * + * We need to check when we can further collapse already collapsed nodes. + * We cannot always collapse- we do not want to allow a replicated table + * to be used twice. That is if we have + * + * partitioned_1 -- replicated -- partitioned_2 + * + * partitioned_1 and partitioned_2 cannot (usually) be safely joined only + * locally. + * We can do this by checking (may need tracking) what type it is, + * and looking at context->conditions->replicated_joins + * + * The following cases are possible, and whether or not it is ok + * to reduce. + * + * If the join between the two RemoteQuery nodes is replicated + * + * Node 1 Node 2 + * rep-part folded rep-part folded ok to reduce? + * 0 0 0 1 1 + * 0 0 1 1 1 + * 0 1 0 1 1 + * 0 1 1 1 1 + * 1 1 1 1 0 + * + * + * If the join between the two RemoteQuery nodes is replicated - partitioned + * + * Node 1 Node 2 + * rep-part folded rep-part folded ok to reduce? 
+ * 0 0 0 1 1 + * 0 0 1 1 0 + * 0 1 0 1 1 + * 0 1 1 1 0 + * 1 1 1 1 0 + * + * + * If the join between the two RemoteQuery nodes is partitioned - partitioned + * it is always reducibile safely, + * + * RemoteQuery *innernode - the inner node + * RemoteQuery *outernode - the outer node + * bool *partitioned_replicated - set to true if we have a partitioned-replicated + * join. We want to use replicated tables with non-replicated + * tables ony once. Only use this value if this function + * returns true. + */ +bool +IsJoinReducible(RemoteQuery *innernode, RemoteQuery *outernode, + List *rtable_list, JoinPath *join_path, bool *partitioned_replicated) +{ + XCWalkerContext context; + ListCell *cell; + bool maybe_reducible = false; + bool result = false; + + + *partitioned_replicated = false; + + InitXCWalkerContext(&context); + context.isRead = true; /* PGXCTODO - determine */ + context.rtables = NIL; + context.rtables = lappend(context.rtables, rtable_list); /* add to list of lists */ + + + + foreach(cell, join_path->joinrestrictinfo) + { + RestrictInfo *node = (RestrictInfo *) lfirst(cell); + + /* + * Check if we can fold these safely. + * + * If examine_conditions_walker() returns true, + * then it definitely is not collapsable. + * If it returns false, it may or may not be, we have to check + * context.conditions at the end. + * We keep trying, because another condition may fulfill the criteria. + */ + maybe_reducible = !examine_conditions_walker((Node *) node->clause, &context); + + if (!maybe_reducible) + break; + + } + + /* check to see if we found any partitioned or replicated joins */ + if (maybe_reducible && + (context.conditions->partitioned_parent_child + || context.conditions->replicated_joins)) + { + /* + * If we get here, we think that we can fold the + * RemoteQuery nodes into a single one. + */ + result = true; + + /* Check replicated-replicated and replicated-partitioned joins */ + if (context.conditions->replicated_joins) + { + ListCell *cell; + + /* if we already reduced with replicated tables already, we + * cannot here. + * PGXCTODO - handle more cases and use outer_relids and inner_relids + * For now we just give up. + */ + if ((innernode->remotejoin && innernode->partitioned_replicated) && + (outernode->remotejoin && outernode->partitioned_replicated)) + { + /* not reducible after all */ + return false; + } + + foreach(cell, context.conditions->replicated_joins) + { + PGXC_Join *pgxc_join = (PGXC_Join *) lfirst(cell); + + if (pgxc_join->join_type == JOIN_REPLICATED_PARTITIONED) + { + *partitioned_replicated = true; + + /* + * If either of these already have such a join, we do not + * want to add it a second time. 
+ */ + if ((innernode->remotejoin && innernode->partitioned_replicated) || + (outernode->remotejoin && outernode->partitioned_replicated)) + { + /* not reducible after all */ + return false; + } + } + } + } + } + + return result; +} + + diff --git a/src/include/pgxc/planner.h b/src/include/pgxc/planner.h index ef00f27..8aae356 100644 --- a/src/include/pgxc/planner.h +++ b/src/include/pgxc/planner.h @@ -89,6 +89,7 @@ typedef struct char *relname; bool remotejoin; /* True if this is a reduced remote join */ + bool partitioned_replicated; /* True if reduced and contains replicated-partitioned join */ int reduce_level; /* in case of reduced JOIN, it's level */ List *base_tlist; /* in case of isReduced, the base tlist */ char *outer_alias; @@ -177,4 +178,8 @@ extern PlannedStmt *pgxc_planner(Query *query, int cursorOptions, extern bool IsHashDistributable(Oid col_type); extern bool is_immutable_func(Oid funcid); + +extern bool IsJoinReducible(RemoteQuery *innernode, RemoteQuery *outernode, + List *rtable_list, JoinPath *join_path, bool *partitioned_replicated); + #endif /* PGXCPLANNER_H */ ----------------------------------------------------------------------- Summary of changes: src/backend/optimizer/plan/createplan.c | 28 ++++- src/backend/pgxc/plan/planner.c | 196 ++++++++++++++++++++++++++++--- src/include/pgxc/planner.h | 5 + 3 files changed, 206 insertions(+), 23 deletions(-) hooks/post-receive -- Postgres-XC |
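The two decision tables in the comment above condense to a small predicate. The sketch below restates the rule only; it is not the actual IsJoinReducible() implementation, whose real check also walks the join clauses with examine_conditions_walker():

    #include <stdbool.h>

    /*
     * 'folded' here means the node is itself a reduced join that already
     * consumed a replicated-partitioned pairing (remotejoin &&
     * partitioned_replicated in the RemoteQuery node).
     */
    typedef enum { JOIN_PART_PART, JOIN_REP_ONLY, JOIN_REP_PART } XCJoinKind;

    static bool
    join_is_reducible(XCJoinKind kind, bool inner_folded, bool outer_folded)
    {
        switch (kind)
        {
            case JOIN_PART_PART:    /* partitioned-partitioned: always safe */
                return true;
            case JOIN_REP_ONLY:     /* replicated join: unsafe only when both
                                     * sides already folded such a pairing */
                return !(inner_folded && outer_folded);
            case JOIN_REP_PART:     /* replicated-partitioned: a replicated
                                     * table may be consumed this way once */
                return !(inner_folded || outer_folded);
        }
        return false;
    }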
From: mason_s <ma...@us...> - 2010-10-18 18:57:34
Project "Postgres-XC". The branch, master has been updated via e11ab021fa203b8902c790e83e5bd78dbc4b2729 (commit) from ca4fb6103add2b4560b8efe142f24d94ed03d56e (commit) - Log ----------------------------------------------------------------- commit e11ab021fa203b8902c790e83e5bd78dbc4b2729 Author: Mason Sharp <ma...@us...> Date: Mon Oct 18 14:52:45 2010 -0400 SourceForge Bug ID: 3076224 checkpoint command causes seg fault Prevent a manual checkpoint from crashing nodes. Note, this does not mean that there is a cluster-wide coordinated checkpoint; it just passes it down to the nodes. Written by Benny Mei Le diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c index 053751c..eb704ce 100644 --- a/src/backend/tcop/pquery.c +++ b/src/backend/tcop/pquery.c @@ -21,6 +21,9 @@ #include "executor/tstoreReceiver.h" #include "miscadmin.h" #include "pg_trace.h" +#ifdef PGXC +#include "pgxc/pgxc.h" +#endif #include "tcop/pquery.h" #include "tcop/tcopprot.h" #include "tcop/utility.h" @@ -1192,7 +1195,11 @@ PortalRunUtility(Portal portal, Node *utilityStmt, bool isTopLevel, IsA(utilityStmt, ListenStmt) || IsA(utilityStmt, NotifyStmt) || IsA(utilityStmt, UnlistenStmt) || +#ifdef PGXC + (IsA(utilityStmt, CheckPointStmt) && IS_PGXC_DATANODE))) +#else IsA(utilityStmt, CheckPointStmt))) +#endif { PushActiveSnapshot(GetTransactionSnapshot()); active_snapshot_set = true; ----------------------------------------------------------------------- Summary of changes: src/backend/tcop/pquery.c | 7 +++++++ 1 files changed, 7 insertions(+), 0 deletions(-) hooks/post-receive -- Postgres-XC |
From: Pavan D. <pa...@us...> - 2010-10-18 06:25:02
Project "Postgres-XC". The branch, PGXC-sqlmed has been updated via f275fa535e9673af0964ecc7ca93ab1b49df2317 (commit) from 6af07721357944af801a384ed1eb54e363839403 (commit) - Log ----------------------------------------------------------------- commit f275fa535e9673af0964ecc7ca93ab1b49df2317 Author: Pavan Deolasee <pav...@gm...> Date: Mon Oct 18 11:53:54 2010 +0530 Fix a bug where rte/alias were not getting set up properly diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c index e8134d1..d9d5e4c 100644 --- a/src/backend/optimizer/plan/createplan.c +++ b/src/backend/optimizer/plan/createplan.c @@ -1159,8 +1159,9 @@ create_remote_expr(PlannerInfo *root, Plan *parent, StringInfo expr, Assert(cell != NULL); rte->eref = lfirst(cell); - rte->alias = lfirst(lnext(cell)); + cell = lnext(cell); + rte->alias = lfirst(cell); cell = lnext(cell); } bms_free(tmprelids); ----------------------------------------------------------------------- Summary of changes: src/backend/optimizer/plan/createplan.c | 3 ++- 1 files changed, 2 insertions(+), 1 deletions(-) hooks/post-receive -- Postgres-XC |
From: Pavan D. <pa...@us...> - 2010-10-18 06:08:18
Project "Postgres-XC". The branch, PGXC-sqlmed has been created at 6af07721357944af801a384ed1eb54e363839403 (commit) - Log ----------------------------------------------------------------- commit 6af07721357944af801a384ed1eb54e363839403 Author: Pavan Deolasee <pav...@gm...> Date: Mon Oct 18 11:35:43 2010 +0530 Update some missing copy/out/read functions diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c index 1d3155b..d4ae006 100644 --- a/src/backend/nodes/copyfuncs.c +++ b/src/backend/nodes/copyfuncs.c @@ -1849,6 +1849,7 @@ _copyRangeTblEntry(RangeTblEntry *from) COPY_SCALAR_FIELD(rtekind); #ifdef PGXC + COPY_STRING_FIELD(relname); if (from->reltupdesc) newnode->reltupdesc = CreateTupleDescCopy(from->reltupdesc); #endif diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c index 9fcbe4c..85cfaca 100644 --- a/src/backend/nodes/outfuncs.c +++ b/src/backend/nodes/outfuncs.c @@ -2028,6 +2028,9 @@ _outRangeTblEntry(StringInfo str, RangeTblEntry *node) WRITE_NODE_FIELD(alias); WRITE_NODE_FIELD(eref); WRITE_ENUM_FIELD(rtekind, RTEKind); +#ifdef PGXC + WRITE_STRING_FIELD(relname); +#endif switch (node->rtekind) { diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c index 72a2156..c928bf8 100644 --- a/src/backend/nodes/readfuncs.c +++ b/src/backend/nodes/readfuncs.c @@ -1118,6 +1118,9 @@ _readRangeTblEntry(void) READ_NODE_FIELD(alias); READ_NODE_FIELD(eref); READ_ENUM_FIELD(rtekind, RTEKind); +#ifdef PGXC + READ_STRING_FIELD(relname); +#endif switch (local_node->rtekind) { commit 7bcb490dc50eeb1ad1569d90cc5eb759b766aa91 Author: Pavan Deolasee <pav...@gm...> Date: Mon Oct 18 11:33:54 2010 +0530 Initial implementation of remote join reduction. We still don't have the logic to determine whether its safe to reduce two join trees or not diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c index aa92917..5099162 100644 --- a/src/backend/commands/explain.c +++ b/src/backend/commands/explain.c @@ -686,7 +686,11 @@ explain_outNode(StringInfo str, Assert(rte->rtekind == RTE_RELATION); /* We only show the rel name, not schema name */ +#ifdef PGXC + relname = rte->relname; +#else relname = get_rel_name(rte->relid); +#endif appendStringInfo(str, " on %s", quote_identifier(relname)); diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c index 134b9e1..1d3155b 100644 --- a/src/backend/nodes/copyfuncs.c +++ b/src/backend/nodes/copyfuncs.c @@ -840,6 +840,17 @@ _copyRemoteQuery(RemoteQuery *from) COPY_SCALAR_FIELD(read_only); COPY_SCALAR_FIELD(force_autocommit); + COPY_STRING_FIELD(relname); + COPY_SCALAR_FIELD(remotejoin); + COPY_SCALAR_FIELD(reduce_level); + COPY_NODE_FIELD(base_tlist); + COPY_STRING_FIELD(outer_alias); + COPY_STRING_FIELD(inner_alias); + COPY_SCALAR_FIELD(outer_reduce_level); + COPY_SCALAR_FIELD(inner_reduce_level); + COPY_BITMAPSET_FIELD(outer_relids); + COPY_BITMAPSET_FIELD(inner_relids); + return newnode; } diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c index be80f18..9fcbe4c 100644 --- a/src/backend/nodes/outfuncs.c +++ b/src/backend/nodes/outfuncs.c @@ -1502,6 +1502,9 @@ _outPlannerInfo(StringInfo str, PlannerInfo *node) WRITE_BOOL_FIELD(hasHavingQual); WRITE_BOOL_FIELD(hasPseudoConstantQuals); WRITE_BOOL_FIELD(hasRecursion); +#ifdef PGXC + WRITE_INT_FIELD(rs_alias_index); +#endif WRITE_INT_FIELD(wt_param_id); } diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c index fcbb8ca..8bb9057 100644 --- 
a/src/backend/optimizer/path/costsize.c +++ b/src/backend/optimizer/path/costsize.c @@ -109,6 +109,9 @@ bool enable_hashagg = true; bool enable_nestloop = true; bool enable_mergejoin = true; bool enable_hashjoin = true; +#ifdef PGXC +bool enable_remotejoin = true; +#endif typedef struct { diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c index 4f3a7c6..e8134d1 100644 --- a/src/backend/optimizer/plan/createplan.c +++ b/src/backend/optimizer/plan/createplan.c @@ -38,6 +38,7 @@ #include "utils/builtins.h" #include "utils/syscache.h" #include "catalog/pg_proc.h" +#include "executor/executor.h" #endif #include "utils/lsyscache.h" @@ -76,6 +77,14 @@ static WorkTableScan *create_worktablescan_plan(PlannerInfo *root, Path *best_pa #ifdef PGXC static RemoteQuery *create_remotequery_plan(PlannerInfo *root, Path *best_path, List *tlist, List *scan_clauses); +static Plan *create_remotejoin_plan(PlannerInfo *root, Path *best_path, + Plan *parent, Plan *outer_plan, Plan *inner_plan); +static void create_remote_target_list(PlannerInfo *root, + StringInfo targets, List *out_tlist, List *in_tlist, + char *out_alias, int out_index, + char *in_alias, int in_index); +static Alias *generate_remote_rte_alias(RangeTblEntry *rte, int varno, + char *aliasname, int reduce_level); #endif static NestLoop *create_nestloop_plan(PlannerInfo *root, NestPath *best_path, Plan *outer_plan, Plan *inner_plan); @@ -146,7 +155,12 @@ static Sort *make_sort(PlannerInfo *root, Plan *lefttree, int numCols, static Material *make_material(Plan *lefttree); #ifdef PGXC +static void findReferencedVars(List *parent_vars, Plan *plan, List **out_tlist, Relids *out_relids); extern bool is_foreign_qual(Node *clause); +static void create_remote_clause_expr(PlannerInfo *root, Plan *parent, StringInfo clauses, + List *qual, RemoteQuery *scan); +static void create_remote_expr(PlannerInfo *root, Plan *parent, StringInfo expr, + Node *node, RemoteQuery *scan); #endif /* @@ -228,9 +242,6 @@ create_scan_plan(PlannerInfo *root, Path *best_path) List *tlist; List *scan_clauses; Plan *plan; -#ifdef PGXC - Plan *matplan; -#endif /* * For table scans, rather than using the relation targetlist (which is @@ -561,9 +572,604 @@ create_join_plan(PlannerInfo *root, JoinPath *best_path) get_actual_clauses(get_loc_restrictinfo(best_path)))); #endif +#ifdef PGXC + /* check if this join can be reduced to an equiv. remote scan node */ + plan = create_remotejoin_plan(root, (Path *)best_path, plan, outer_plan, inner_plan); +#endif + return plan; } +#ifdef PGXC +/* + * create_remotejoin_plan + * check if the children plans involve remote entities from the same remote + * node. If so, this join can be reduced to an equivalent remote scan plan + * node + * + * RULES: + * + * * provide unique aliases to both inner and outer nodes to represent their + * corresponding subqueries + * + * * identify target entries from both inner and outer that appear in the join + * targetlist, only those need to be selected from these aliased subqueries + * + * * a join node has a joinqual list which represents the join condition. E.g. + * SELECT * from emp e LEFT JOIN emp2 d ON e.x = d.x + * Here the joinqual contains "e.x = d.x". If the joinqual itself has a local + * dependency, e.g "e.x = localfunc(d.x)", then this join cannot be reduced + * + * * other than the joinqual, the join node can contain additional quals. 
Even + * if they have any local dependencies, we can reduce the join and just + * append these quals into the reduced remote scan node. We DO do a pass to + * identify remote quals and ship those in the squery though + * + * * these quals (both joinqual and normal quals with no local dependencies) + * need to be converted into expressions referring to the aliases assigned to + * the nodes. These expressions will eventually become part of the squery of + * the reduced remote scan node + * + * * the children remote scan nodes themselves can have local dependencies in + * their quals (the remote ones are already part of the squery). We can still + * reduce the join and just append these quals into the reduced remote scan + * node + * + * * if we reached successfully so far, generate a new remote scan node with + * this new squery generated using the aliased references + * + * One important point to note here about targetlists is that this function + * does not set any DUMMY var references in the Var nodes appearing in it. It + * follows the standard mechanism as is followed by other nodes. Similar to the + * existing nodes, the references which point to DUMMY vars is done in + * set_remote_references() function in set_plan_references phase at the fag + * end. Avoiding such DUMMY references manipulations till the end also makes + * this code a lot much readable and easier. + */ +static Plan * +create_remotejoin_plan(PlannerInfo *root, Path *best_path, Plan *parent, Plan *outer_plan, Plan *inner_plan) +{ + NestLoop *nest_parent; + + if (!enable_remotejoin) + return parent; + + /* meh, what are these for :( */ + if (root->hasPseudoConstantQuals) + return parent; + + /* Works only for SELECT commands right now */ + if (root->parse->commandType != CMD_SELECT) + return parent; + + /* do not optimize CURSOR based select statements */ + if (root->parse->rowMarks != NIL) + return parent; + + /* + * optimize only simple NestLoop joins for now. Other joins like Merge and + * Hash can be reduced too. But they involve additional intermediate nodes + * and we need to understand them a bit more as yet + */ + if (!IsA(parent, NestLoop)) + return parent; + else + nest_parent = (NestLoop *)parent; + + /* check if both the nodes qualify for reduction */ + if (IsA(outer_plan, Material) && + IsA(((Material *) outer_plan)->plan.lefttree, RemoteQuery) && + IsA(inner_plan, Material) && + IsA(((Material *) inner_plan)->plan.lefttree, RemoteQuery)) + { + Material *outer_mat = (Material *)outer_plan; + Material *inner_mat = (Material *)inner_plan; + + RemoteQuery *outer = (RemoteQuery *)outer_mat->plan.lefttree; + RemoteQuery *inner = (RemoteQuery *)inner_mat->plan.lefttree; + /* + * Check if both these plans are from the same remote node. If yes, + * replace this JOIN along with it's two children with one equivalent + * remote node + */ + + /* XXX Check if the join optimization is possible */ + if (true) + { + RemoteQuery *result; + Plan *result_plan; + StringInfoData targets, clauses, scan_clauses, fromlist; + StringInfoData squery; + List *parent_vars, *out_tlist = NIL, *in_tlist = NIL, *base_tlist; + ListCell *l; + char in_alias[15], out_alias[15]; + Relids out_relids = NULL, in_relids = NULL; + bool use_where = false; + Index dummy_rtindex; + RangeTblEntry *dummy_rte; + List *local_scan_clauses = NIL, *remote_scan_clauses = NIL; + char *pname; + + + /* KISS! As long as distinct aliases are provided for all the objects in + * involved in query, remote server should not crib! 
*/ + sprintf(in_alias, "out_%d", root->rs_alias_index); + sprintf(out_alias, "in_%d", root->rs_alias_index); + + /* + * Walk the left, right trees and identify which vars appear in the + * parent targetlist, only those need to be selected. Note that + * depending on whether the parent targetlist is top-level or + * intermediate, the children vars may or may not be referenced + * multiple times in it. + */ + parent_vars = pull_var_clause((Node *)parent->targetlist, PVC_REJECT_PLACEHOLDERS); + + findReferencedVars(parent_vars, outer_plan, &out_tlist, &out_relids); + findReferencedVars(parent_vars, inner_plan, &in_tlist, &in_relids); + + /* + * If the JOIN ON clause has a local dependency then we cannot ship + * the join to the remote side at all, bail out immediately. + */ + if (!is_foreign_qual((Node *)nest_parent->join.joinqual)) + { + elog(DEBUG1, "cannot reduce: local dependencies in the joinqual"); + return parent; + } + + /* + * If the normal plan qual has local dependencies, the join can + * still be shipped. Try harder to ship remote clauses out of the + * entire list. These local quals will become part of the quals + * list of the reduced remote scan node down later. + */ + if (!is_foreign_qual((Node *)nest_parent->join.plan.qual)) + { + elog(DEBUG1, "local dependencies in the join plan qual"); + + /* + * trawl through each entry and come up with remote and local + * clauses... sigh + */ + foreach(l, nest_parent->join.plan.qual) + { + Node *clause = lfirst(l); + + /* + * if the currentof in the above call to + * clause_is_local_bound is set, somewhere in the list there + * is currentof clause, so keep that information intact and + * pass a dummy argument here. + */ + if (!is_foreign_qual((Node *)clause)) + local_scan_clauses = lappend(local_scan_clauses, clause); + else + remote_scan_clauses = lappend(remote_scan_clauses, clause); + } + } + else + { + /* + * there is no local bound clause, all the clauses are remote + * scan clauses + */ + remote_scan_clauses = nest_parent->join.plan.qual; + } + + /* generate the tlist for the new RemoteScan node using out_tlist, in_tlist */ + initStringInfo(&targets); + create_remote_target_list(root, &targets, out_tlist, in_tlist, + out_alias, outer->reduce_level, in_alias, inner->reduce_level); + + /* + * generate the fromlist now. The code has to appropriately mention + * the JOIN type in the string being generated. + */ + initStringInfo(&fromlist); + appendStringInfo(&fromlist, " (%s) %s ", + outer->sql_statement, quote_identifier(out_alias)); + + use_where = false; + switch (nest_parent->join.jointype) + { + case JOIN_INNER: + pname = ", "; + use_where = true; + break; + case JOIN_LEFT: + pname = "LEFT JOIN"; + break; + case JOIN_FULL: + pname = "FULL JOIN"; + break; + case JOIN_RIGHT: + pname = "RIGHT JOIN"; + break; + case JOIN_SEMI: + case JOIN_ANTI: + default: + return parent; + } + + /* + * splendid! we can actually replace this join hierarchy with a + * single RemoteScan node now. Start off by constructing the + * appropriate new tlist and tupdescriptor + */ + result = makeNode(RemoteQuery); + + /* + * Save various information about the inner and the outer plans. 
We + * may need this information later if more entries are added to it + * as part of the remote expression optimization + */ + result->remotejoin = true; + result->inner_alias = pstrdup(in_alias); + result->outer_alias = pstrdup(out_alias); + result->inner_reduce_level = inner->reduce_level; + result->outer_reduce_level = outer->reduce_level; + + appendStringInfo(&fromlist, " %s (%s) %s", + pname, inner->sql_statement, quote_identifier(in_alias)); + + /* generate join.joinqual remote clause string representation */ + initStringInfo(&clauses); + if (nest_parent->join.joinqual != NIL) + { + create_remote_clause_expr(root, parent, &clauses, + nest_parent->join.joinqual, result); + } + + /* generate join.plan.qual remote clause string representation */ + initStringInfo(&scan_clauses); + if (remote_scan_clauses != NIL) + { + create_remote_clause_expr(root, parent, &scan_clauses, + remote_scan_clauses, result); + } + + /* + * set the base tlist of the involved base relations, useful in + * set_plan_refs later. Additionally the tupledescs should be + * generated using this base_tlist and not the parent targetlist. + * This is because we want to take into account any additional + * column references from the scan clauses too + */ + base_tlist = add_to_flat_tlist(NIL, list_concat(out_tlist, in_tlist)); + + /* cook up the reltupdesc using this base_tlist */ + dummy_rte = makeNode(RangeTblEntry); + dummy_rte->reltupdesc = ExecTypeFromTL(base_tlist, false); + dummy_rte->rtekind = RTE_RELATION; + + /* use a dummy relname... */ + dummy_rte->relname = "__FOREIGN_QUERY__"; + dummy_rte->eref = makeAlias("__FOREIGN_QUERY__", NIL); + /* not sure if we need to set the below explicitly.. */ + dummy_rte->inh = false; + dummy_rte->inFromCl = false; + dummy_rte->requiredPerms = 0; + dummy_rte->checkAsUser = 0; + dummy_rte->selectedCols = NULL; + dummy_rte->modifiedCols = NULL; + + /* + * Append the dummy range table entry to the range table. + * Note that this modifies the master copy the caller passed us, otherwise + * e.g EXPLAIN VERBOSE will fail to find the rte the Vars built below refer + * to. + */ + root->parse->rtable = lappend(root->parse->rtable, dummy_rte); + dummy_rtindex = list_length(root->parse->rtable); + + result_plan = &result->scan.plan; + + /* the join targetlist becomes this node's tlist */ + result_plan->targetlist = parent->targetlist; + result_plan->lefttree = NULL; + result_plan->righttree = NULL; + result->scan.scanrelid = dummy_rtindex; + + /* generate the squery for this node */ + + /* NOTE: it's assumed that the remote_paramNums array is + * filled in the same order as we create the query here. + * + * TODO: we need some way to ensure that the remote_paramNums + * is filled in the same order as the order in which the clauses + * are added in the query below. + */ + initStringInfo(&squery); + appendStringInfo(&squery, "SELECT %s FROM %s", targets.data, fromlist.data); + + if (clauses.data[0] != '\0') + appendStringInfo(&squery, " %s %s", use_where? " WHERE " : " ON ", clauses.data); + + if (scan_clauses.data[0] != '\0') + appendStringInfo(&squery, " %s %s", use_where? " AND " : " WHERE ", scan_clauses.data); + + result->sql_statement = squery.data; + /* don't forget to increment the index for the next time around! */ + result->reduce_level = root->rs_alias_index++; + + + /* set_plan_refs needs this later */ + result->base_tlist = base_tlist; + result->relname = "__FOREIGN_QUERY__"; + + /* + * if there were any local scan clauses stick them up here. 
They + * can come from the join node or from remote scan node themselves. + * Because of the processing being done earlier in + * create_remotescan_plan, all of the clauses if present will be + * local ones and hence can be stuck without checking for + * remoteness again here into result_plan->qual + */ + result_plan->qual = list_concat(result_plan->qual, outer_plan->qual); + result_plan->qual = list_concat(result_plan->qual, inner_plan->qual); + result_plan->qual = list_concat(result_plan->qual, local_scan_clauses); + + /* we actually need not worry about costs since this is the final plan */ + result_plan->startup_cost = outer_plan->startup_cost; + result_plan->total_cost = outer_plan->total_cost; + result_plan->plan_rows = outer_plan->plan_rows; + result_plan->plan_width = outer_plan->plan_width; + + return (Plan *)make_material(result_plan); + } + } + + return parent; +} + +/* + * Generate aliases for columns of remote tables using the + * colname_varno_varattno_reduce_level nomenclature + */ +static Alias * +generate_remote_rte_alias(RangeTblEntry *rte, int varno, char *aliasname, int reduce_level) +{ + TupleDesc tupdesc; + int maxattrs; + int varattno; + List *colnames = NIL; + StringInfo attr = makeStringInfo(); + + if (rte->rtekind != RTE_RELATION) + elog(ERROR, "called in improper context"); + + if (reduce_level == 0) + return makeAlias(aliasname, NIL); + + tupdesc = rte->reltupdesc; + maxattrs = tupdesc->natts; + + for (varattno = 0; varattno < maxattrs; varattno++) + { + Form_pg_attribute att = tupdesc->attrs[varattno]; + Value *attrname; + + resetStringInfo(attr); + appendStringInfo(attr, "%s_%d_%d_%d", + NameStr(att->attname), varno, varattno + 1, reduce_level); + + attrname = makeString(pstrdup(attr->data)); + + colnames = lappend(colnames, attrname); + } + + return makeAlias(aliasname, colnames); +} + +/* create_remote_target_list + * generate a targetlist using out_alias and in_alias appropriately. It is + * possible that in case of multiple-hierarchy reduction, both sides can have + * columns with the same name. E.g. consider the following: + * + * select * from emp e join emp f on e.x = f.x, emp g; + * + * So if we just use new_alias.columnname it can + * very easily clash with other columnname from the same side of an already + * reduced join. To avoid this, we generate unique column aliases using the + * following convention: + * colname_varno_varattno_reduce_level_index + * + * Each RemoteScan node carries it's reduce_level index to indicate the + * convention that should be adopted while referring to it's columns. If the + * level is 0, then normal column names can be used because they will never + * clash at the join level + */ +static void +create_remote_target_list(PlannerInfo *root, StringInfo targets, List *out_tlist, List *in_tlist, + char *out_alias, int out_index, char *in_alias, int in_index) +{ + int i = 0; + ListCell *l; + StringInfo attrname = makeStringInfo(); + bool add_null_target = true; + + foreach(l, out_tlist) + { + Var *var = (Var *) lfirst(l); + RangeTblEntry *rte = planner_rt_fetch(var->varno, root); + char *attname; + + + if (i++ > 0) + appendStringInfo(targets, ", "); + + attname = get_rte_attribute_name(rte, var->varattno); + + if (out_index) + { + resetStringInfo(attrname); + /* varattno can be negative for sys attributes, hence the abs! 
*/ + appendStringInfo(attrname, "%s_%d_%d_%d", + attname, var->varno, abs(var->varattno), out_index); + appendStringInfo(targets, "%s.%s", + quote_identifier(out_alias), quote_identifier(attrname->data)); + } + else + appendStringInfo(targets, "%s.%s", + quote_identifier(out_alias), quote_identifier(attname)); + + /* generate the new alias now using root->rs_alias_index */ + resetStringInfo(attrname); + appendStringInfo(attrname, "%s_%d_%d_%d", + attname, var->varno, abs(var->varattno), root->rs_alias_index); + appendStringInfo(targets, " AS %s", quote_identifier(attrname->data)); + add_null_target = false; + } + + foreach(l, in_tlist) + { + Var *var = (Var *) lfirst(l); + RangeTblEntry *rte = planner_rt_fetch(var->varno, root); + char *attname; + + if (i++ > 0) + appendStringInfo(targets, ", "); + + attname = get_rte_attribute_name(rte, var->varattno); + + if (in_index) + { + resetStringInfo(attrname); + /* varattno can be negative for sys attributes, hence the abs! */ + appendStringInfo(attrname, "%s_%d_%d_%d", + attname, var->varno, abs(var->varattno), in_index); + appendStringInfo(targets, "%s.%s", + quote_identifier(in_alias), quote_identifier(attrname->data)); + } + else + appendStringInfo(targets, "%s.%s", + quote_identifier(in_alias), quote_identifier(attname)); + + /* generate the new alias now using root->rs_alias_index */ + resetStringInfo(attrname); + appendStringInfo(attrname, "%s_%d_%d_%d", + attname, var->varno, abs(var->varattno), root->rs_alias_index); + appendStringInfo(targets, " AS %s", quote_identifier(attrname->data)); + add_null_target = false; + } + + /* + * It's possible that in some cases, the targetlist might not refer to any + * vars from the joined relations, eg. + * select count(*) from t1, t2; select const from t1, t2; etc + * For such cases just add a NULL selection into this targetlist + */ + if (add_null_target) + appendStringInfo(targets, " NULL "); +} + +/* + * create_remote_clause_expr + * generate a string to represent the clause list expression using out_alias + * and in_alias references. This function does a cute hack by temporarily + * modifying the rte->eref entries of the involved relations to point to + * out_alias and in_alias appropriately. The deparse_expression call then + * generates a string using these erefs which is exactly what is desired here. + * + * Additionally it creates aliases for the column references based on the + * reduce_level values too. This handles the case when both sides have same + * named columns.. 
+ * + * Obviously this function restores the eref, alias values to their former selves + * appropriately too, after use + */ +static void +create_remote_clause_expr(PlannerInfo *root, Plan *parent, StringInfo clauses, + List *qual, RemoteQuery *scan) +{ + Node *node = (Node *) make_ands_explicit(qual); + + return create_remote_expr(root, parent, clauses, node, scan); +} + +static void +create_remote_expr(PlannerInfo *root, Plan *parent, StringInfo expr, + Node *node, RemoteQuery *scan) +{ + List *context; + List *leref = NIL; + ListCell *cell; + char *exprstr; + int rtindex; + Relids tmprelids, relids; + + relids = pull_varnos((Node *)node); + + tmprelids = bms_copy(relids); + + while ((rtindex = bms_first_member(tmprelids)) >= 0) + { + RangeTblEntry *rte = planner_rt_fetch(rtindex, root); + + /* + * This rtindex should be a member of either out_relids or + * in_relids and never both + */ + if (bms_is_member(rtindex, scan->outer_relids) && + bms_is_member(rtindex, scan->inner_relids)) + elog(ERROR, "improper relid references in the join clause list"); + + /* + * save the current rte->eref and rte->alias values and stick in a new + * one in the rte with the proper inner or outer alias + */ + leref = lappend(leref, rte->eref); + leref = lappend(leref, rte->alias); + + if (bms_is_member(rtindex, scan->outer_relids)) + { + rte->eref = makeAlias(scan->outer_alias, NIL); + + /* attach proper column aliases.. */ + rte->alias = generate_remote_rte_alias(rte, rtindex, + scan->outer_alias, scan->outer_reduce_level); + } + if (bms_is_member(rtindex, scan->inner_relids)) + { + rte->eref = makeAlias(scan->inner_alias, NIL); + + /* attach proper column aliases.. */ + rte->alias = generate_remote_rte_alias(rte, rtindex, + scan->inner_alias, scan->inner_reduce_level); + } + } + bms_free(tmprelids); + + /* Set up deparsing context */ + context = deparse_context_for_plan((Node *) parent, + NULL, + root->parse->rtable, + NULL); + + exprstr = deparse_expression(node, context, true, false); + + /* revert back the saved eref entries in the same order now! */ + cell = list_head(leref); + tmprelids = bms_copy(relids); + while ((rtindex = bms_first_member(tmprelids)) >= 0) + { + RangeTblEntry *rte = planner_rt_fetch(rtindex, root); + + Assert(cell != NULL); + + rte->eref = lfirst(cell); + rte->alias = lfirst(lnext(cell)); + + cell = lnext(cell); + } + bms_free(tmprelids); + + appendStringInfo(expr, " %s", exprstr); + return; +} +#endif + /* * create_append_plan * Create an Append plan for 'best_path' and (recursively) plans @@ -3980,3 +4586,56 @@ is_projection_capable_plan(Plan *plan) } return true; } + +#ifdef PGXC +/* + * findReferencedVars() + * + * Constructs a list of those Vars in targetlist which are found in + * parent_vars (in other words, the intersection of targetlist and + * parent_vars). Returns a new list in *out_tlist and a bitmap of + * those relids found in the result. + * + * Additionally do look at the qual references to other vars! They + * also need to be selected.. 
+ */ +static void +findReferencedVars(List *parent_vars, Plan *plan, List **out_tlist, Relids *out_relids) +{ + List *vars; + Relids relids = NULL; + List *tlist = NIL; + ListCell *l; + + /* Pull vars from both the targetlist and the clauses attached to this plan */ + vars = pull_var_clause((Node *)plan->targetlist, PVC_REJECT_PLACEHOLDERS); + + foreach(l, vars) + { + Var *var = lfirst(l); + + if (search_tlist_for_var(var, parent_vars)) + tlist = lappend(tlist, var); + + if (!bms_is_member(var->varno, relids)) + relids = bms_add_member(relids, var->varno); + } + + /* now consider the local quals */ + vars = pull_var_clause((Node *)plan->qual, PVC_REJECT_PLACEHOLDERS); + + foreach(l, vars) + { + Var *var = lfirst(l); + + if (search_tlist_for_var(var, tlist) == NULL) + tlist = lappend(tlist, var); + + if (!bms_is_member(var->varno, relids)) + relids = bms_add_member(relids, var->varno); + } + + *out_tlist = tlist; + *out_relids = relids; +} +#endif diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c index 2c95815..dc6ff35 100644 --- a/src/backend/optimizer/plan/planner.c +++ b/src/backend/optimizer/plan/planner.c @@ -301,6 +301,9 @@ subquery_planner(PlannerGlobal *glob, Query *parse, root->eq_classes = NIL; root->append_rel_list = NIL; +#ifdef PGXC + root->rs_alias_index = 1; +#endif root->hasRecursion = hasRecursion; if (hasRecursion) root->wt_param_id = SS_assign_worktable_param(root); diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c index cab7fb4..950e388 100644 --- a/src/backend/optimizer/plan/setrefs.c +++ b/src/backend/optimizer/plan/setrefs.c @@ -1401,6 +1401,32 @@ search_indexed_tlist_for_non_var(Node *node, return NULL; /* no match */ } +#ifdef PGXC +/* + * search_tlist_for_var --- find a Var in the provided tlist. This does a + * basic scan through the list. So not very efficient... + * + * If no match, return NULL. + * + */ +Var * +search_tlist_for_var(Var *var, List *jtlist) +{ + Index varno = var->varno; + AttrNumber varattno = var->varattno; + ListCell *l; + + foreach(l, jtlist) + { + Var *listvar = (Var *) lfirst(l); + + if (listvar->varno == varno && listvar->varattno == varattno) + return var; + } + return NULL; /* no match */ +} +#endif + /* * search_indexed_tlist_for_sortgroupref --- find a sort/group expression * (which is assumed not to be just a Var) diff --git a/src/backend/parser/parse_relation.c b/src/backend/parser/parse_relation.c index d63e504..229b16d 100644 --- a/src/backend/parser/parse_relation.c +++ b/src/backend/parser/parse_relation.c @@ -925,6 +925,7 @@ addRangeTableEntry(ParseState *pstate, #ifdef PGXC rte->reltupdesc = CreateTupleDescCopyConstr(rel->rd_att); + rte->relname = RelationGetRelationName(rel); #endif /* @@ -991,6 +992,7 @@ addRangeTableEntryForRelation(ParseState *pstate, #ifdef PGXC rte->reltupdesc = CreateTupleDescCopyConstr(rel->rd_att); + rte->relname = RelationGetRelationName(rel); #endif /* diff --git a/src/backend/pgxc/pool/execRemote.c b/src/backend/pgxc/pool/execRemote.c index c493eb3..7fe08be 100644 --- a/src/backend/pgxc/pool/execRemote.c +++ b/src/backend/pgxc/pool/execRemote.c @@ -2388,20 +2388,6 @@ ExecInitRemoteQuery(RemoteQuery *node, EState *estate, int eflags) ExecInitScanTupleSlot(estate, &remotestate->ss); - /* - * Initialize scan relation. get the relation object id from the - * relid'th entry in the range table, open that relation and acquire - * appropriate lock on it. 
- * This is needed for deparseSQL - * We should remove these lines once we plan and deparse earlier. - */ - if (!node->is_single_step) - { - currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid); - remotestate->ss.ss_currentRelation = currentRelation; - ExecAssignScanType(&remotestate->ss, RelationGetDescr(currentRelation)); - } - remotestate->ss.ps.ps_TupFromTlist = false; /* diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c index 3e8077f..684396c 100644 --- a/src/backend/utils/misc/guc.c +++ b/src/backend/utils/misc/guc.c @@ -687,6 +687,16 @@ static struct config_bool ConfigureNamesBool[] = &enable_hashjoin, true, NULL, NULL }, +#ifdef PGXC + { + {"enable_remotejoin", PGC_USERSET, QUERY_TUNING_METHOD, + gettext_noop("Enables the planner's use of remote join plans."), + NULL + }, + &enable_remotejoin, + true, NULL, NULL + }, +#endif { {"geqo", PGC_USERSET, QUERY_TUNING_GEQO, gettext_noop("Enables genetic query optimization."), diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h index 9423175..4d90052 100644 --- a/src/include/nodes/parsenodes.h +++ b/src/include/nodes/parsenodes.h @@ -663,6 +663,7 @@ typedef struct RangeTblEntry */ #ifdef PGXC + char *relname; TupleDesc reltupdesc; #endif diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h index 581ce0a..d537855 100644 --- a/src/include/nodes/relation.h +++ b/src/include/nodes/relation.h @@ -189,6 +189,11 @@ typedef struct PlannerInfo * pseudoconstant = true */ bool hasRecursion; /* true if planning a recursive WITH item */ +#ifdef PGXC + /* This field is used only when RemoteScan nodes are involved */ + int rs_alias_index; /* used to build the alias reference */ +#endif + /* These fields are used only when hasRecursion is true: */ int wt_param_id; /* PARAM_EXEC ID for the work table */ struct Plan *non_recursive_plan; /* plan for non-recursive term */ diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h index 29aed38..876542a 100644 --- a/src/include/optimizer/cost.h +++ b/src/include/optimizer/cost.h @@ -59,6 +59,9 @@ extern bool enable_hashagg; extern bool enable_nestloop; extern bool enable_mergejoin; extern bool enable_hashjoin; +#ifdef PGXC +extern bool enable_remotejoin; +#endif extern int constraint_exclusion; extern double clamp_row_est(double nrows); diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h index 0dd2bcc..c1d191e 100644 --- a/src/include/optimizer/planmain.h +++ b/src/include/optimizer/planmain.h @@ -119,4 +119,7 @@ extern void extract_query_dependencies(List *queries, List **relationOids, List **invalItems); +#ifdef PGXC +extern Var *search_tlist_for_var(Var *var, List *jtlist); +#endif #endif /* PLANMAIN_H */ diff --git a/src/include/pgxc/planner.h b/src/include/pgxc/planner.h index 1e31fa3..ef00f27 100644 --- a/src/include/pgxc/planner.h +++ b/src/include/pgxc/planner.h @@ -23,6 +23,7 @@ #include "nodes/primnodes.h" #include "pgxc/locator.h" #include "tcop/dest.h" +#include "nodes/relation.h" typedef enum @@ -85,6 +86,17 @@ typedef struct bool read_only; /* do not use 2PC when committing read only steps */ bool force_autocommit; /* some commands like VACUUM require autocommit mode */ RemoteQueryExecType exec_type; + + char *relname; + bool remotejoin; /* True if this is a reduced remote join */ + int reduce_level; /* in case of reduced JOIN, it's level */ + List *base_tlist; /* in case of isReduced, the base tlist */ + char *outer_alias; + char *inner_alias; + int outer_reduce_level; + 
int inner_reduce_level; + Relids outer_relids; + Relids inner_relids; } RemoteQuery; commit aefc06e7bd90c657fb093a923f7b66177687561d Author: Pavan Deolasee <pav...@gm...> Date: Mon Oct 18 11:29:21 2010 +0530 First step to SQL-med integration. Moving query generation to planning stage diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c index c58e2a0..134b9e1 100644 --- a/src/backend/nodes/copyfuncs.c +++ b/src/backend/nodes/copyfuncs.c @@ -1836,6 +1836,12 @@ _copyRangeTblEntry(RangeTblEntry *from) RangeTblEntry *newnode = makeNode(RangeTblEntry); COPY_SCALAR_FIELD(rtekind); + +#ifdef PGXC + if (from->reltupdesc) + newnode->reltupdesc = CreateTupleDescCopy(from->reltupdesc); +#endif + COPY_SCALAR_FIELD(relid); COPY_NODE_FIELD(subquery); COPY_SCALAR_FIELD(jointype); diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c index 1c8691a..be80f18 100644 --- a/src/backend/nodes/outfuncs.c +++ b/src/backend/nodes/outfuncs.c @@ -2015,6 +2015,10 @@ _outSetOperationStmt(StringInfo str, SetOperationStmt *node) static void _outRangeTblEntry(StringInfo str, RangeTblEntry *node) { +#ifdef PGXC + int i; +#endif + WRITE_NODE_TYPE("RTE"); /* put alias + eref first to make dump more legible */ @@ -2025,6 +2029,22 @@ _outRangeTblEntry(StringInfo str, RangeTblEntry *node) switch (node->rtekind) { case RTE_RELATION: +#ifdef PGXC + /* write tuple descriptor */ + appendStringInfo(str, " :tupdesc_natts %d (", node->reltupdesc->natts); + + for (i = 0 ; i < node->reltupdesc->natts ; i++) + { + appendStringInfo(str, ":colname "); + _outToken(str, NameStr(node->reltupdesc->attrs[i]->attname)); + appendStringInfo(str, " :coltypid %u ", + node->reltupdesc->attrs[i]->atttypid); + appendStringInfo(str, ":coltypmod %d ", + node->reltupdesc->attrs[i]->atttypmod); + } + + appendStringInfo(str, ") "); + #endif case RTE_SPECIAL: WRITE_OID_FIELD(relid); break; diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c index 1f562d7..72a2156 100644 --- a/src/backend/nodes/readfuncs.c +++ b/src/backend/nodes/readfuncs.c @@ -31,7 +31,9 @@ #include "nodes/parsenodes.h" #include "nodes/readfuncs.h" - +#ifdef PGXC +#include "access/htup.h" +#endif /* * Macros to simplify reading of different kinds of fields. 
Use these @@ -1104,6 +1106,12 @@ _readFromExpr(void) static RangeTblEntry * _readRangeTblEntry(void) { +#ifdef PGXC + int natts, i; + char *colname; + Oid typid, typmod; +#endif + READ_LOCALS(RangeTblEntry); /* put alias + eref first to make dump more legible */ @@ -1114,6 +1122,52 @@ _readRangeTblEntry(void) switch (local_node->rtekind) { case RTE_RELATION: +#ifdef PGXC + /* read tuple descriptor */ + token = pg_strtok(&length); /* skip :tupdesc_natts */ + token = pg_strtok(&length); /* get field value */ + + natts = atoi(token); + + if (natts > 0 && natts <= MaxTupleAttributeNumber) + local_node->reltupdesc = CreateTemplateTupleDesc(natts, false); + else + elog(ERROR, "invalid node field to read"); + + token = pg_strtok(&length); /* skip '(' */ + + if (length == 1 && pg_strncasecmp(token, "(", length) == 0) + { + for (i = 0 ; i < natts ; i++) + { + token = pg_strtok(&length); /* skip :colname */ + token = pg_strtok(&length); /* get colname */ + colname = nullable_string(token, length); + + if (colname == NULL) + elog(ERROR, "invalid node field to read"); + + token = pg_strtok(&length); /* skip :coltypid */ + token = pg_strtok(&length); /* get typid */ + typid = atooid(token); + + token = pg_strtok(&length); /* skip :coltypmod */ + token = pg_strtok(&length); /* get typmod */ + typmod = atoi(token); + + TupleDescInitEntry(local_node->reltupdesc, + (i + 1), colname, typid, typmod, 0); + } + } + else + elog(ERROR, "invalid node field to read"); + + token = pg_strtok(&length); /* skip '(' */ + + if (!(length == 1 && pg_strncasecmp(token, ")", length) == 0)) + elog(ERROR, "invalid node field to read"); +#endif + case RTE_SPECIAL: READ_OID_FIELD(relid); break; diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c index 818ea1b..4f3a7c6 100644 --- a/src/backend/optimizer/plan/createplan.c +++ b/src/backend/optimizer/plan/createplan.c @@ -34,6 +34,10 @@ #include "parser/parsetree.h" #ifdef PGXC #include "pgxc/planner.h" +#include "access/sysattr.h" +#include "utils/builtins.h" +#include "utils/syscache.h" +#include "catalog/pg_proc.h" #endif #include "utils/lsyscache.h" @@ -141,6 +145,9 @@ static Sort *make_sort(PlannerInfo *root, Plan *lefttree, int numCols, double limit_tuples); static Material *make_material(Plan *lefttree); +#ifdef PGXC +extern bool is_foreign_qual(Node *clause); +#endif /* * create_plan @@ -445,9 +452,6 @@ disuse_physical_tlist(Plan *plan, Path *path) case T_ValuesScan: case T_CteScan: case T_WorkTableScan: -#ifdef PGXC - case T_RemoteQuery: -#endif plan->targetlist = build_relation_tlist(path->parent); break; default: @@ -1583,9 +1587,23 @@ create_remotequery_plan(PlannerInfo *root, Path *best_path, List *tlist, List *scan_clauses) { RemoteQuery *scan_plan; + bool prefix; Index scan_relid = best_path->parent->relid; RangeTblEntry *rte; - + char *wherestr = NULL; + Bitmapset *varattnos = NULL; + List *remote_scan_clauses = NIL; + List *local_scan_clauses = NIL; + Oid nspid; + char *nspname; + char *relname; + const char *nspname_q; + const char *relname_q; + const char *aliasname_q; + int i; + TupleDesc tupdesc; + bool first; + StringInfoData sql; Assert(scan_relid > 0); rte = planner_rt_fetch(scan_relid, root); @@ -1598,16 +1616,159 @@ create_remotequery_plan(PlannerInfo *root, Path *best_path, /* Reduce RestrictInfo list to bare expressions; ignore pseudoconstants */ scan_clauses = extract_actual_clauses(scan_clauses, false); + if (scan_clauses) + { + ListCell *l; + + foreach(l, (List *)scan_clauses) + { + Node *clause = lfirst(l); 
+ + if (is_foreign_qual(clause)) + remote_scan_clauses = lappend(remote_scan_clauses, clause); + else + local_scan_clauses = lappend(local_scan_clauses, clause); + } + } + + /* + * Incorporate any remote_scan_clauses into the WHERE clause that + * we intend to push to the remote server. + */ + if (remote_scan_clauses) + { + char *sep = ""; + ListCell *l; + StringInfoData buf; + List *deparse_context; + + initStringInfo(&buf); + + deparse_context = deparse_context_for_remotequery( + get_rel_name(rte->relid), rte->relid); + + /* + * remote_scan_clauses is a list of scan clauses (restrictions) that we + * can push to the remote server. We want to deparse each of those + * expressions (that is, each member of the List) and AND them together + * into a WHERE clause. + */ + + foreach(l, (List *)remote_scan_clauses) + { + Node *clause = lfirst(l); + + appendStringInfo(&buf, "%s", sep ); + appendStringInfo(&buf, "%s", deparse_expression(clause, deparse_context, false, false)); + sep = " AND "; + } + + wherestr = buf.data; + } + + /* + * Now walk through the target list and the scan clauses to get the + * interesting attributes. Only those attributes will be fetched from the + * remote side. + */ + varattnos = pull_varattnos_varno((Node *) best_path->parent->reltargetlist, best_path->parent->relid, + varattnos); + varattnos = pull_varattnos_varno((Node *) local_scan_clauses, + best_path->parent->relid, varattnos); + /* + * Scanning multiple relations in a RemoteQuery node is not supported. + */ + prefix = false; +#if 0 + prefix = list_length(estate->es_range_table) > 1; +#endif + + /* Get quoted names of schema, table and alias */ + nspid = get_rel_namespace(rte->relid); + nspname = get_namespace_name(nspid); + relname = get_rel_name(rte->relid); + nspname_q = quote_identifier(nspname); + relname_q = quote_identifier(relname); + aliasname_q = quote_identifier(rte->eref->aliasname); + + initStringInfo(&sql); + + /* deparse SELECT clause */ + appendStringInfo(&sql, "SELECT "); + + /* + * TODO: omit (deparse to "NULL") columns which are not used in the + * original SQL. + * + * We must parse nodes parents of this RemoteQuery node to determine unused + * columns because some columns may be used only in parent Sort/Agg/Limit + * nodes. + */ + tupdesc = best_path->parent->reltupdesc; + first = true; + for (i = 0; i < tupdesc->natts; i++) + { + /* skip dropped attributes */ + if (tupdesc->attrs[i]->attisdropped) + continue; + + if (!first) + appendStringInfoString(&sql, ", "); + + if (bms_is_member(i + 1 - FirstLowInvalidHeapAttributeNumber, varattnos)) + { + if (prefix) + appendStringInfo(&sql, "%s.%s", + aliasname_q, tupdesc->attrs[i]->attname.data); + else + appendStringInfo(&sql, "%s", tupdesc->attrs[i]->attname.data); + } + else + appendStringInfo(&sql, "%s", "NULL"); + first = false; + } + + /* if target list is composed only of system attributes, add dummy column */ + if (first) + appendStringInfo(&sql, "NULL"); + + /* deparse FROM clause */ + appendStringInfo(&sql, " FROM "); + /* + * XXX: should use GENERIC OPTIONS like 'foreign_relname' or something for + * the foreign table name instead of the local name ? 
+ */ + appendStringInfo(&sql, "%s.%s %s", nspname_q, relname_q, aliasname_q); + pfree(nspname); + pfree(relname); + if (nspname_q != nspname_q) + pfree((char *) nspname_q); + if (relname_q != relname_q) + pfree((char *) relname_q); + if (aliasname_q != rte->eref->aliasname) + pfree((char *) aliasname_q); + + if (wherestr) + { + appendStringInfo(&sql, " WHERE "); + appendStringInfo(&sql, "%s", wherestr); + pfree(wherestr); + } + + bms_free(varattnos); + scan_plan = make_remotequery(tlist, rte, - scan_clauses, + local_scan_clauses, scan_relid); + scan_plan->sql_statement = sql.data; + copy_path_costsize(&scan_plan->scan.plan, best_path); /* PGXCTODO - get better estimates */ scan_plan->scan.plan.plan_rows = 1000; - + return scan_plan; } #endif diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c index 1d93203..957a515 100644 --- a/src/backend/optimizer/util/relnode.c +++ b/src/backend/optimizer/util/relnode.c @@ -92,6 +92,10 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptKind reloptkind) rel->index_outer_relids = NULL; rel->index_inner_paths = NIL; +#ifdef PGXC + rel->reltupdesc = rte->reltupdesc; +#endif + /* Check type of rtable entry */ switch (rte->rtekind) { diff --git a/src/backend/optimizer/util/var.c b/src/backend/optimizer/util/var.c index 1a6826f..a574278 100644 --- a/src/backend/optimizer/util/var.c +++ b/src/backend/optimizer/util/var.c @@ -34,6 +34,14 @@ typedef struct int sublevels_up; } pull_varnos_context; +#ifdef PGXC +typedef struct +{ + Index varno; + Bitmapset *varattnos; +} pull_varattnos_context; +#endif + typedef struct { int var_location; @@ -68,6 +76,10 @@ typedef struct static bool pull_varnos_walker(Node *node, pull_varnos_context *context); static bool pull_varattnos_walker(Node *node, Bitmapset **varattnos); +#ifdef PGXC +static bool pull_varattnos_varno_walker(Node *node, + pull_varattnos_context *context); +#endif static bool contain_var_clause_walker(Node *node, void *context); static bool contain_vars_of_level_walker(Node *node, int *sublevels_up); static bool locate_var_of_level_walker(Node *node, @@ -228,6 +240,54 @@ contain_var_clause(Node *node) return contain_var_clause_walker(node, NULL); } +#ifdef PGXC +/* + * pull_varattnos_varno + * Find all the distinct attribute numbers present in an expression tree, + * and add them to the initial contents of *varattnos. + * + * Attribute numbers are offset by FirstLowInvalidHeapAttributeNumber so that + * we can include system attributes (e.g., OID) in the bitmap representation. 
+ * + * This is same as pull_varattnos except for the fact that it gets attributes + * for the given varno + */ +Bitmapset * +pull_varattnos_varno(Node *node, Index varno, Bitmapset *varattnos) +{ + pull_varattnos_context context; + + context.varno = varno; + context.varattnos = varattnos; + + (void) pull_varattnos_varno_walker(node, &context); + + return context.varattnos; +} + +static bool +pull_varattnos_varno_walker(Node *node, pull_varattnos_context *context) +{ + if (node == NULL) + return false; + + Assert(context != NULL); + + if (IsA(node, Var)) + { + Var *var = (Var *) node; + + if (var->varno == context->varno) + context->varattnos = bms_add_member(context->varattnos, + var->varattno - FirstLowInvalidHeapAttributeNumber); + return false; + } + + return expression_tree_walker(node, pull_varattnos_varno_walker, + (void *) context); +} +#endif + static bool contain_var_clause_walker(Node *node, void *context) { diff --git a/src/backend/parser/parse_relation.c b/src/backend/parser/parse_relation.c index 5a42451..d63e504 100644 --- a/src/backend/parser/parse_relation.c +++ b/src/backend/parser/parse_relation.c @@ -923,6 +923,10 @@ addRangeTableEntry(ParseState *pstate, rel = parserOpenTable(pstate, relation, lockmode); rte->relid = RelationGetRelid(rel); +#ifdef PGXC + rte->reltupdesc = CreateTupleDescCopyConstr(rel->rd_att); +#endif + /* * Build the list of effective column names using user-supplied aliases * and/or actual column names. @@ -985,6 +989,10 @@ addRangeTableEntryForRelation(ParseState *pstate, rte->alias = alias; rte->relid = RelationGetRelid(rel); +#ifdef PGXC + rte->reltupdesc = CreateTupleDescCopyConstr(rel->rd_att); +#endif + /* * Build the list of effective column names using user-supplied aliases * and/or actual column names. diff --git a/src/backend/pgxc/pool/execRemote.c b/src/backend/pgxc/pool/execRemote.c index 14dce33..c493eb3 100644 --- a/src/backend/pgxc/pool/execRemote.c +++ b/src/backend/pgxc/pool/execRemote.c @@ -2723,11 +2723,6 @@ ExecRemoteQuery(RemoteQueryState *node) errmsg("Could not begin transaction on data nodes."))); } - /* Get the SQL string */ - /* only do if not single step */ - if (!step->is_single_step) - step->sql_statement = deparseSql(node); - /* See if we have a primary node, execute on it first before the others */ if (primaryconnection) { diff --git a/src/backend/pgxc/pool/postgresql_fdw.c b/src/backend/pgxc/pool/postgresql_fdw.c index dabf5da..14c0ddb 100644 --- a/src/backend/pgxc/pool/postgresql_fdw.c +++ b/src/backend/pgxc/pool/postgresql_fdw.c @@ -45,7 +45,7 @@ /* deparse SQL from the request */ bool is_immutable_func(Oid funcid); -static bool is_foreign_qual(ExprState *state); +bool is_foreign_qual(Node *node); static bool foreign_qual_walker(Node *node, void *context); char *deparseSql(RemoteQueryState *scanstate); @@ -103,10 +103,10 @@ is_immutable_func(Oid funcid) * local server in the foreign server. 
* - scalar array operator (ANY/ALL) */ -static bool -is_foreign_qual(ExprState *state) +bool +is_foreign_qual(Node *node) { - return !foreign_qual_walker((Node *) state->expr, NULL); + return !foreign_qual_walker(node, NULL); } /* @@ -120,6 +120,9 @@ foreign_qual_walker(Node *node, void *context) switch (nodeTag(node)) { + case T_ExprState: + return foreign_qual_walker((Node *) ((ExprState *) node)->expr, NULL); + case T_Param: /* TODO: pass internal parameters to the foreign server */ if (((Param *) node)->paramkind != PARAM_EXTERN) @@ -286,7 +289,7 @@ elog(DEBUG2, "%s(%u) called", __FUNCTION__, __LINE__); { ExprState *state = lfirst(lc); - if (is_foreign_qual(state)) + if (is_foreign_qual((Node *) state)) { elog(DEBUG1, "foreign qual: %s", nodeToString(state->expr)); foreign_qual = lappend(foreign_qual, state); @@ -317,7 +320,7 @@ elog(DEBUG2, "%s(%u) called", __FUNCTION__, __LINE__); Node *node; node = (Node *) make_ands_explicit(foreign_expr); appendStringInfo(&sql, " WHERE "); - appendStringInfo(&sql, + appendStringInfo(&sql, "%s", deparse_expression(node, context, prefix, false)); /* * The contents of the list MUST NOT be free-ed because they are diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c index c930701..130dff3 100644 --- a/src/backend/utils/adt/ruleutils.c +++ b/src/backend/utils/adt/ruleutils.c @@ -114,6 +114,8 @@ typedef struct List *subplans; /* List of subplans, in plan-tree case */ Plan *outer_plan; /* OUTER subplan, or NULL if none */ Plan *inner_plan; /* INNER subplan, or NULL if none */ + + bool remotequery; /* deparse context for remote query */ } deparse_namespace; @@ -1936,10 +1938,42 @@ deparse_context_for(const char *aliasname, Oid relid) dpns->ctes = NIL; dpns->subplans = NIL; dpns->outer_plan = dpns->inner_plan = NULL; +#ifdef PGXC + dpns->remotequery = false; +#endif + + /* Return a one-deep namespace stack */ + return list_make1(dpns); +} + +#ifdef PGXC +List * +deparse_context_for_remotequery(const char *aliasname, Oid relid) +{ + deparse_namespace *dpns; + RangeTblEntry *rte; + + dpns = (deparse_namespace *) palloc(sizeof(deparse_namespace)); + + /* Build a minimal RTE for the rel */ + rte = makeNode(RangeTblEntry); + rte->rtekind = RTE_RELATION; + rte->relid = relid; + rte->eref = makeAlias(aliasname, NIL); + rte->inh = false; + rte->inFromCl = true; + + /* Build one-element rtable */ + dpns->rtable = list_make1(rte); + dpns->ctes = NIL; + dpns->subplans = NIL; + dpns->outer_plan = dpns->inner_plan = NULL; + dpns->remotequery = true; /* Return a one-deep namespace stack */ return list_make1(dpns); } +#endif /* * deparse_context_for_plan - Build deparse context for a plan node @@ -1974,7 +2008,9 @@ deparse_context_for_plan(Node *plan, Node *outer_plan, dpns->rtable = rtable; dpns->ctes = NIL; dpns->subplans = subplans; - +#ifdef PGXC + dpns->remotequery = false; +#endif /* * Set up outer_plan and inner_plan from the Plan node (this includes * various special cases for particular Plan types). 
@@ -2138,7 +2174,9 @@ make_ruledef(StringInfo buf, HeapTuple ruletup, TupleDesc rulettc, dpns.ctes = query->cteList; dpns.subplans = NIL; dpns.outer_plan = dpns.inner_plan = NULL; - +#ifdef PGXC + dpns.remotequery = false; +#endif get_rule_expr(qual, &context, false); } @@ -2285,7 +2323,9 @@ get_query_def(Query *query, StringInfo buf, List *parentnamespace, dpns.ctes = query->cteList; dpns.subplans = NIL; dpns.outer_plan = dpns.inner_plan = NULL; - +#ifdef PGXC + dpns.remotequery = false; +#endif switch (query->commandType) { case CMD_SELECT: @@ -3379,6 +3419,14 @@ get_variable(Var *var, int levelsup, bool showstar, deparse_context *context) * likely that varno is OUTER or INNER, in which case we must dig down * into the subplans. */ +#ifdef PGXC + if (dpns->remotequery) + { + rte = rt_fetch(1, dpns->rtable); + attnum = var->varattno; + } + else +#endif if (var->varno >= 1 && var->varno <= list_length(dpns->rtable)) { rte = rt_fetch(var->varno, dpns->rtable); @@ -3705,6 +3753,9 @@ get_name_for_var_field(Var *var, int fieldno, mydpns.ctes = rte->subquery->cteList; mydpns.subplans = NIL; mydpns.outer_plan = mydpns.inner_plan = NULL; +#ifdef PGXC + mydpns.remotequery = false; +#endif context->namespaces = lcons(&mydpns, context->namespaces); @@ -3828,7 +3879,9 @@ get_name_for_var_field(Var *var, int fieldno, mydpns.ctes = ctequery->cteList; mydpns.subplans = NIL; mydpns.outer_plan = mydpns.inner_plan = NULL; - +#ifdef PGXC + mydpns.remotequery = false; +#endif new_nslist = list_copy_tail(context->namespaces, ctelevelsup); context->namespaces = lcons(&mydpns, new_nslist); diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h index 5fb2a2b..9423175 100644 --- a/src/include/nodes/parsenodes.h +++ b/src/include/nodes/parsenodes.h @@ -24,6 +24,9 @@ #include "nodes/bitmapset.h" #include "nodes/primnodes.h" #include "nodes/value.h" +#ifdef PGXC +#include "access/tupdesc.h" +#endif /* Possible sources of a Query */ typedef enum QuerySource @@ -659,6 +662,10 @@ typedef struct RangeTblEntry * code that is being actively worked on. FIXME someday. */ +#ifdef PGXC + TupleDesc reltupdesc; +#endif + /* * Fields valid for a plain relation RTE (else zero): */ diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h index ea48889..581ce0a 100644 --- a/src/include/nodes/relation.h +++ b/src/include/nodes/relation.h @@ -377,6 +377,10 @@ typedef struct RelOptInfo * clauses */ List *index_inner_paths; /* InnerIndexscanInfo nodes */ +#ifdef PGXC + TupleDesc reltupdesc; +#endif + /* * Inner indexscans are not in the main pathlist because they are not * usable except in specific join contexts. 
We use the index_inner_paths diff --git a/src/include/optimizer/var.h b/src/include/optimizer/var.h index 08e885b..966e827 100644 --- a/src/include/optimizer/var.h +++ b/src/include/optimizer/var.h @@ -25,6 +25,9 @@ typedef enum extern Relids pull_varnos(Node *node); extern void pull_varattnos(Node *node, Bitmapset **varattnos); +#ifdef PGXC +extern Bitmapset * pull_varattnos_varno(Node *node, Index varno, Bitmapset *varattnos); +#endif extern bool contain_var_clause(Node *node); extern bool contain_vars_of_level(Node *node, int levelsup); extern int locate_var_of_level(Node *node, int levelsup); diff --git a/src/include/utils/builtins.h b/src/include/utils/builtins.h index 50b9ab2..85384b5 100644 --- a/src/include/utils/builtins.h +++ b/src/include/utils/builtins.h @@ -595,7 +595,10 @@ extern Datum pg_get_function_identity_arguments(PG_FUNCTION_ARGS); extern Datum pg_get_function_result(PG_FUNCTION_ARGS); extern char *deparse_expression(Node *expr, List *dpcontext, bool forceprefix, bool showimplicit); +extern List *deparse_context_for_remotequery(const char *aliasname, Oid relid); +#ifdef PGXC extern List *deparse_context_for(const char *aliasname, Oid relid); +#endif extern List *deparse_context_for_plan(Node *plan, Node *outer_plan, List *rtable, List *subplans); extern const char *quote_identifier(const char *ident); ----------------------------------------------------------------------- hooks/post-receive -- Postgres-XC |
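To make the join reduction above concrete, here is a hypothetical SQL-level sketch. The table names, column numbering and alias spellings are invented for illustration; the real reduced statement is assembled by create_remotejoin_plan from the children's sql_statement fields, and the reduction only fires for plain SELECT plans shaped as a NestLoop over two RemoteQuery children (cursors, SEMI/ANTI joins and pseudoconstant quals all bail out, as the guards at the top of the function show).

    -- enable_remotejoin is registered as a USERSET GUC (default on),
    -- so it can be toggled per session:
    SET enable_remotejoin = on;

    -- An inner join of two remote scans such as
    SELECT e.empno, d.dname
    FROM emp e JOIN dept d ON e.deptno = d.deptno;

    -- may then be collapsed into a single RemoteQuery node whose
    -- generated statement looks roughly like the following; for
    -- JOIN_INNER the from-list uses ", " and the join clause lands
    -- in a WHERE clause rather than an ON clause:
    --
    --   SELECT out_1.empno AS empno_1_1_1, in_1.dname AS dname_2_1_1
    --   FROM (SELECT empno, deptno FROM emp e) out_1,
    --        (SELECT dname, deptno FROM dept d) in_1
    --   WHERE (out_1.deptno = in_1.deptno);

The colname_varno_varattno_reduce_level aliases exist so that same-named columns from the two sides cannot clash when an already-reduced join is joined again, which is exactly the case the comment on create_remote_target_list describes.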
From: Michael P. <mic...@us...> - 2010-10-13 05:46:42
Project "Postgres-XC". The branch, master has been updated via ca4fb6103add2b4560b8efe142f24d94ed03d56e (commit) from 52af07a890baeb608b5ea59211eb4a080511e8c7 (commit) - Log ----------------------------------------------------------------- commit ca4fb6103add2b4560b8efe142f24d94ed03d56e Author: Michael P <mic...@us...> Date: Wed Oct 13 14:41:13 2010 +0900 After a Commit of prepared transaction on GTM, Connection from PGXC Node to GTM was always reinitialized even if process went correctly on GTM. Now if Commit Prepared at GTM runs without error, connection is not reinitialized. Bug found by Benny Mei Le diff --git a/src/gtm/client/gtm_client.c b/src/gtm/client/gtm_client.c index 984aee1..53ab3f3 100644 --- a/src/gtm/client/gtm_client.c +++ b/src/gtm/client/gtm_client.c @@ -215,6 +215,8 @@ commit_prepared_transaction(GTM_Conn *conn, GlobalTransactionId gxid, GlobalTran Assert(res->gr_resdata.grd_gxid == gxid); } + return res->gr_status; + send_failed: receive_failed: return -1; ----------------------------------------------------------------------- Summary of changes: src/gtm/client/gtm_client.c | 2 ++ 1 files changed, 2 insertions(+), 0 deletions(-) hooks/post-receive -- Postgres-XC |
From: mason_s <ma...@us...> - 2010-10-13 03:03:38
Project "Postgres-XC". The branch, master has been updated via 52af07a890baeb608b5ea59211eb4a080511e8c7 (commit) from 88162bcb5cb3dabf8cef3717ad1837182fb5f5dc (commit) - Log ----------------------------------------------------------------- commit 52af07a890baeb608b5ea59211eb4a080511e8c7 Author: Mason Sharp <ma...@us...> Date: Tue Oct 12 23:00:12 2010 -0400 Fix bug with pooler. Make sure socket removed when signal to stop is trapped. diff --git a/src/backend/pgxc/pool/poolcomm.c b/src/backend/pgxc/pool/poolcomm.c index 7e4771c..853b385 100644 --- a/src/backend/pgxc/pool/poolcomm.c +++ b/src/backend/pgxc/pool/poolcomm.c @@ -5,7 +5,7 @@ * Communication functions between the pool manager and session * * - * Portions Copyright (c) 1996-2009, PostgreSQL Global Development Group + * Portions Copyright (c) 1996-2009, PostgreSQL Global Development Group * Portions Copyright (c) 2010 Nippon Telegraph and Telephone Corporation * *------------------------------------------------------------------------- @@ -41,6 +41,8 @@ static int pool_discardbytes(PoolPort *port, size_t len); static char sock_path[MAXPGPATH]; +static void StreamDoUnlink(int code, Datum arg); + static int Lock_AF_UNIX(unsigned short port, const char *unixSocketName); #endif @@ -77,6 +79,9 @@ pool_listen(unsigned short port, const char *unixSocketName) if (listen(fd, 5) < 0) return -1; + /* Arrange to unlink the socket file at exit */ + on_proc_exit(StreamDoUnlink, 0); + return fd; #else /* TODO support for non-unix platform */ @@ -87,6 +92,19 @@ pool_listen(unsigned short port, const char *unixSocketName) #endif } +/* StreamDoUnlink() + * Shutdown routine for pooler connection + * If a Unix socket is used for communication, explicitly close it. + */ +#ifdef HAVE_UNIX_SOCKETS +static void +StreamDoUnlink(int code, Datum arg) +{ + Assert(sock_path[0]); + unlink(sock_path); +} +#endif /* HAVE_UNIX_SOCKETS */ + #ifdef HAVE_UNIX_SOCKETS static int Lock_AF_UNIX(unsigned short port, const char *unixSocketName) @@ -411,8 +429,8 @@ pool_flush(PoolPort *port) { last_reported_send_errno = errno; - /* - * Handle a seg fault that may later occur in proc array + /* + * Handle a seg fault that may later occur in proc array * when this fails when we are already shutting down * If shutting down already, do not call. */ ----------------------------------------------------------------------- Summary of changes: src/backend/pgxc/pool/poolcomm.c | 24 +++++++++++++++++++++--- 1 files changed, 21 insertions(+), 3 deletions(-) hooks/post-receive -- Postgres-XC |
From: Michael P. <mic...@us...> - 2010-10-13 01:48:50
Project "Postgres-XC". The branch, master has been updated via 88162bcb5cb3dabf8cef3717ad1837182fb5f5dc (commit) from ea13b66f4beaeb13db9741fb5a1347f976b9ebab (commit) - Log ----------------------------------------------------------------- commit 88162bcb5cb3dabf8cef3717ad1837182fb5f5dc Author: Michael P <mic...@us...> Date: Wed Oct 13 10:45:16 2010 +0900 Added support for two new pieces of functionality. 1) Support for DDL and utility command synchronisation among Coordinators. DDL is now synchronized amongst multiple coordinators. Previously, after DDL it was required to use an extra utility to resync the nodes and restart other Coordinators. This is no longer necessary. DDL support works also with common BEGIN, COMMIT and ROLLBACK instructions in the cluster. DDL may be initiated at any node. Each Coordinator can connect to any other one. Just as Coordinators use pools for connecting to Data Nodes, Coordinators now use pools for connecting to the other Coordinators. 2) Support for PREPARE TRANSACTION and COMMIT TRANSACTION, ROLLBACK PREPARED. When a transaction is prepared or committed, based on the SQL, it will only execute on the involved nodes, including DDL on Coordinators. GTM is used track which xid and nodes are involved in the transaction, identified by the user or application specified transaction identifier, when it is prepared. New GUCs -------- There are some new GUCs for handling Coordinator communication num_coordinators coordinator_hosts coordinator_ports coordinator_users coordinator_passwords In addition, a new GUC replaces coordinator_id: pgxc_node_id Open Issues ----------- Implicit two phase commit (client in autocommit mode, but distributed transaction required because of multiple nodes) does not first prepare on the originating coordinator before committing, if DDL is involved. We really should prepare here before committing on all nodes. We also need to add a bit of special handling for COMMIT PREPARED. If there is an error, and it got committed on some nodes, we still should force it to be committed on the originating coordinator, if involved, and still return an error of some sort that it was partially committed. (When the downed node recovers, in the future it will determine if any other node has committed the transaction, and if so, it, too, must commit.) It is a pretty rare case, but we should handle it. With this current configuration, DDL will fail if at least one Coordinator is down. In the future, we will make this more flexible. 
Written by Michael Paquier diff --git a/src/backend/access/transam/gtm.c b/src/backend/access/transam/gtm.c index 08ed2c9..64437e7 100644 --- a/src/backend/access/transam/gtm.c +++ b/src/backend/access/transam/gtm.c @@ -20,7 +20,7 @@ /* Configuration variables */ char *GtmHost = "localhost"; int GtmPort = 6666; -int GtmCoordinatorId = 1; +int PGXCNodeId = 1; extern bool FirstSnapshotSet; @@ -42,7 +42,7 @@ InitGTM() /* 256 bytes should be enough */ char conn_str[256]; - sprintf(conn_str, "host=%s port=%d coordinator_id=%d", GtmHost, GtmPort, GtmCoordinatorId); + sprintf(conn_str, "host=%s port=%d coordinator_id=%d", GtmHost, GtmPort, PGXCNodeId); conn = PQconnectGTM(conn_str); if (GTMPQstatus(conn) != CONNECTION_OK) @@ -187,7 +187,7 @@ RollbackTranGTM(GlobalTransactionId gxid) } int -BeingPreparedTranGTM(GlobalTransactionId gxid, +StartPreparedTranGTM(GlobalTransactionId gxid, char *gid, int datanodecnt, PGXC_NodeId datanodes[], @@ -200,7 +200,7 @@ BeingPreparedTranGTM(GlobalTransactionId gxid, return 0; CheckConnection(); - ret = being_prepared_transaction(conn, gxid, gid, datanodecnt, datanodes, coordcnt, coordinators); + ret = start_prepared_transaction(conn, gxid, gid, datanodecnt, datanodes, coordcnt, coordinators); /* * If something went wrong (timeout), try and reset GTM connection. diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c index d881078..97f6c76 100644 --- a/src/backend/access/transam/twophase.c +++ b/src/backend/access/transam/twophase.c @@ -892,8 +892,13 @@ StartPrepare(GlobalTransaction gxact) * * Calculates CRC and writes state file to WAL and in pg_twophase directory. */ +#ifdef PGXC +void +EndPrepare(GlobalTransaction gxact, bool write_2pc_file) +#else void EndPrepare(GlobalTransaction gxact) +#endif { TransactionId xid = gxact->proc.xid; TwoPhaseFileHeader *hdr; @@ -929,9 +934,10 @@ EndPrepare(GlobalTransaction gxact) * critical section, though, it doesn't matter since any failure causes * PANIC anyway. */ + #ifdef PGXC - /* Do not write 2PC state file on Coordinator side */ - if (IS_PGXC_DATANODE) + /* Write 2PC state file on Coordinator side if a DDL is involved in transaction */ + if (write_2pc_file) { #endif TwoPhaseFilePath(path, xid); @@ -1009,6 +1015,7 @@ EndPrepare(GlobalTransaction gxact) #ifdef PGXC } #endif + START_CRIT_SECTION(); MyProc->inCommit = true; @@ -1020,8 +1027,11 @@ EndPrepare(GlobalTransaction gxact) /* If we crash now, we have prepared: WAL replay will fix things */ #ifdef PGXC - /* Just write 2PC state file on Datanodes */ - if (IS_PGXC_DATANODE) + /* + * Just write 2PC state file on Datanodes + * or on Coordinators if DDL queries are involved. 
+ */ + if (write_2pc_file) { #endif @@ -1038,6 +1048,7 @@ EndPrepare(GlobalTransaction gxact) ereport(ERROR, (errcode_for_file_access(), errmsg("could not close two-phase state file: %m"))); + #ifdef PGXC } #endif @@ -1893,15 +1904,16 @@ RecordTransactionAbortPrepared(TransactionId xid, END_CRIT_SECTION(); } + #ifdef PGXC /* * Remove a gxact on a Coordinator, * this is used to be able to prepare a commit transaction on another coordinator than the one - * who prepared the transaction + * who prepared the transaction, for a transaction that does not include DDLs */ void RemoveGXactCoord(GlobalTransaction gxact) { - RemoveGXact(gxact); + RemoveGXact(gxact); } #endif diff --git a/src/backend/access/transam/varsup.c b/src/backend/access/transam/varsup.c index 5176e85..03e6d90 100644 --- a/src/backend/access/transam/varsup.c +++ b/src/backend/access/transam/varsup.c @@ -22,7 +22,7 @@ #include "storage/pmsignal.h" #include "storage/proc.h" #include "utils/builtins.h" -#ifdef PGXC +#ifdef PGXC #include "pgxc/pgxc.h" #include "access/gtm.h" #endif @@ -99,25 +99,27 @@ GetNewTransactionId(bool isSubXact) return BootstrapTransactionId; } -#ifdef PGXC - if (IS_PGXC_COORDINATOR) +#ifdef PGXC + if (IS_PGXC_COORDINATOR && !IsConnFromCoord()) { - /* Get XID from GTM before acquiring the lock. + /* + * Get XID from GTM before acquiring the lock. * The rest of the code will handle it if after obtaining XIDs, * the lock is acquired in a different order. * This will help with GTM connection issues- we will not * block all other processes. + * GXID can just be obtained from a remote Coordinator */ xid = (TransactionId) BeginTranGTM(timestamp); - *timestamp_received = true; + *timestamp_received = true; } - #endif LWLockAcquire(XidGenLock, LW_EXCLUSIVE); -#ifdef PGXC - if (IS_PGXC_COORDINATOR) +#ifdef PGXC + /* Only remote Coordinator can go a GXID */ + if (IS_PGXC_COORDINATOR && !IsConnFromCoord()) { if (TransactionIdIsValid(xid)) { @@ -140,7 +142,8 @@ GetNewTransactionId(bool isSubXact) LWLockRelease(XidGenLock); return xid; } - } else if(IS_PGXC_DATANODE) + } + else if(IS_PGXC_DATANODE || IsConnFromCoord()) { if (IsAutoVacuumWorkerProcess()) { @@ -159,7 +162,8 @@ GetNewTransactionId(bool isSubXact) /* try and get gxid directly from GTM */ next_xid = (TransactionId) BeginTranGTM(NULL); } - } else if (GetForceXidFromGTM()) + } + else if (GetForceXidFromGTM()) { elog (DEBUG1, "Force get XID from GTM"); /* try and get gxid directly from GTM */ diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c index 458068c..f51672e 100644 --- a/src/backend/access/transam/xact.c +++ b/src/backend/access/transam/xact.c @@ -26,7 +26,7 @@ #include "access/gtm.h" /* PGXC_COORD */ #include "gtm/gtm_c.h" -#include "pgxc/datanode.h" +#include "pgxc/pgxcnode.h" /* PGXC_DATANODE */ #include "postmaster/autovacuum.h" #endif @@ -116,7 +116,10 @@ typedef enum TBlockState TBLOCK_ABORT_END, /* failed xact, ROLLBACK received */ TBLOCK_ABORT_PENDING, /* live xact, ROLLBACK received */ TBLOCK_PREPARE, /* live xact, PREPARE received */ - +#ifdef PGXC + TBLOCK_PREPARE_NO_2PC_FILE, /* PREPARE receive but skip 2PC file creation + * and Commit gxact */ +#endif /* subtransaction states */ TBLOCK_SUBBEGIN, /* starting a subtransaction */ TBLOCK_SUBINPROGRESS, /* live subtransaction */ @@ -334,7 +337,7 @@ static GlobalTransactionId GetGlobalTransactionId(TransactionState s) { GTM_Timestamp gtm_timestamp; - bool received_tp; + bool received_tp = false; /* * Here we receive timestamp at the same time as gxid. 
@@ -495,7 +498,7 @@ AssignTransactionId(TransactionState s) * the Xid as "running". See GetNewTransactionId. */ #ifdef PGXC /* PGXC_COORD */ - if (IS_PGXC_COORDINATOR) + if (IS_PGXC_COORDINATOR && !IsConnFromCoord()) { s->transactionId = (TransactionId) GetGlobalTransactionId(s); elog(DEBUG1, "New transaction id assigned = %d, isSubXact = %s", @@ -1629,7 +1632,8 @@ StartTransaction(void) */ s->state = TRANS_START; #ifdef PGXC /* PGXC_COORD */ - if (IS_PGXC_COORDINATOR) + /* GXID is assigned already by a remote Coordinator */ + if (IS_PGXC_COORDINATOR && !IsConnFromCoord()) s->globalTransactionId = InvalidGlobalTransactionId; /* until assigned */ #endif s->transactionId = InvalidTransactionId; /* until assigned */ @@ -1797,7 +1801,7 @@ CommitTransaction(void) * There can be error on the data nodes. So go to data nodes before * changing transaction state and local clean up */ - DataNodeCommit(); + PGXCNodeCommit(); #endif /* Prevent cancel/die interrupt while cleaning up */ @@ -1818,14 +1822,15 @@ CommitTransaction(void) #ifdef PGXC /* - * Now we can let GTM know about transaction commit + * Now we can let GTM know about transaction commit. + * Only a Remote Coordinator is allowed to do that. */ - if (IS_PGXC_COORDINATOR) + if (IS_PGXC_COORDINATOR && !IsConnFromCoord()) { CommitTranGTM(s->globalTransactionId); latestXid = s->globalTransactionId; } - else if (IS_PGXC_DATANODE) + else if (IS_PGXC_DATANODE || IsConnFromCoord()) { /* If we are autovacuum, commit on GTM */ if ((IsAutoVacuumWorkerProcess() || GetForceXidFromGTM()) @@ -1930,9 +1935,9 @@ CommitTransaction(void) s->maxChildXids = 0; #ifdef PGXC - if (IS_PGXC_COORDINATOR) + if (IS_PGXC_COORDINATOR && !IsConnFromCoord()) s->globalTransactionId = InvalidGlobalTransactionId; - else if (IS_PGXC_DATANODE) + else if (IS_PGXC_DATANODE || IsConnFromCoord()) SetNextTransactionId(InvalidTransactionId); #endif @@ -1951,8 +1956,17 @@ CommitTransaction(void) * * NB: if you change this routine, better look at CommitTransaction too! */ +#ifdef PGXC +/* + * Only a Postgres-XC Coordinator that received a PREPARE Command from + * an application can use this special prepare. + */ +static void +PrepareTransaction(bool write_2pc_file) +#else static void PrepareTransaction(void) +#endif { TransactionState s = CurrentTransactionState; TransactionId xid = GetCurrentTransactionId(); @@ -2084,7 +2098,7 @@ PrepareTransaction(void) * updates, because the transaction manager might get confused if we lose * a global transaction. */ - EndPrepare(gxact); + EndPrepare(gxact, write_2pc_file); /* * Now we clean up backend-internal state and release internal resources. @@ -2138,7 +2152,7 @@ PrepareTransaction(void) * We want to be able to commit a prepared transaction from another coordinator, * so clean up the gxact in shared memory also. 
*/ - if (IS_PGXC_COORDINATOR) + if (!write_2pc_file) { RemoveGXactCoord(gxact); } @@ -2183,7 +2197,7 @@ PrepareTransaction(void) s->maxChildXids = 0; #ifdef PGXC /* PGXC_DATANODE */ - if (IS_PGXC_DATANODE) + if (IS_PGXC_DATANODE || IsConnFromCoord()) SetNextTransactionId(InvalidTransactionId); #endif /* @@ -2273,16 +2287,18 @@ AbortTransaction(void) TRACE_POSTGRESQL_TRANSACTION_ABORT(MyProc->lxid); #ifdef PGXC - if (IS_PGXC_COORDINATOR) + /* This is done by remote Coordinator */ + if (IS_PGXC_COORDINATOR && !IsConnFromCoord()) { - /* Make sure this is rolled back on the DataNodes, - * if so it will just return + /* + * Make sure this is rolled back on the DataNodes + * if so it will just return */ - DataNodeRollback(); + PGXCNodeRollback(); RollbackTranGTM(s->globalTransactionId); latestXid = s->globalTransactionId; } - else if (IS_PGXC_DATANODE) + else if (IS_PGXC_DATANODE || IsConnFromCoord()) { /* If we are autovacuum, commit on GTM */ if ((IsAutoVacuumWorkerProcess() || GetForceXidFromGTM()) @@ -2378,9 +2394,9 @@ CleanupTransaction(void) s->maxChildXids = 0; #ifdef PGXC /* PGXC_DATANODE */ - if (IS_PGXC_COORDINATOR) + if (IS_PGXC_COORDINATOR && !IsConnFromCoord()) s->globalTransactionId = InvalidGlobalTransactionId; - else if (IS_PGXC_DATANODE) + else if (IS_PGXC_DATANODE || IsConnFromCoord()) SetNextTransactionId(InvalidTransactionId); #endif @@ -2446,6 +2462,9 @@ StartTransactionCommand(void) case TBLOCK_SUBRESTART: case TBLOCK_SUBABORT_RESTART: case TBLOCK_PREPARE: +#ifdef PGXC + case TBLOCK_PREPARE_NO_2PC_FILE: +#endif elog(ERROR, "StartTransactionCommand: unexpected state %s", BlockStateAsString(s->blockState)); break; @@ -2552,9 +2571,20 @@ CommitTransactionCommand(void) * return to the idle state. */ case TBLOCK_PREPARE: - PrepareTransaction(); + PrepareTransaction(true); + s->blockState = TBLOCK_DEFAULT; + break; + +#ifdef PGXC + /* + * We are complieting a PREPARE TRANSACTION for a pgxc transaction + * that involved DDLs on a Coordinator. + */ + case TBLOCK_PREPARE_NO_2PC_FILE: + PrepareTransaction(false); s->blockState = TBLOCK_DEFAULT; break; +#endif /* * We were just issued a SAVEPOINT inside a transaction block. @@ -2586,10 +2616,15 @@ CommitTransactionCommand(void) CommitTransaction(); s->blockState = TBLOCK_DEFAULT; } +#ifdef PGXC + else if (s->blockState == TBLOCK_PREPARE || + s->blockState == TBLOCK_PREPARE_NO_2PC_FILE) +#else else if (s->blockState == TBLOCK_PREPARE) +#endif { Assert(s->parent == NULL); - PrepareTransaction(); + PrepareTransaction(true); s->blockState = TBLOCK_DEFAULT; } else @@ -2789,6 +2824,9 @@ AbortCurrentTransaction(void) * the transaction). */ case TBLOCK_PREPARE: +#ifdef PGXC + case TBLOCK_PREPARE_NO_2PC_FILE: +#endif AbortTransaction(); CleanupTransaction(); s->blockState = TBLOCK_DEFAULT; @@ -3140,6 +3178,9 @@ BeginTransactionBlock(void) case TBLOCK_SUBRESTART: case TBLOCK_SUBABORT_RESTART: case TBLOCK_PREPARE: +#ifdef PGXC + case TBLOCK_PREPARE_NO_2PC_FILE: +#endif elog(FATAL, "BeginTransactionBlock: unexpected state %s", BlockStateAsString(s->blockState)); break; @@ -3158,8 +3199,13 @@ BeginTransactionBlock(void) * We do it this way because it's not convenient to change memory context, * resource owner, etc while executing inside a Portal. 
*/ +#ifdef PGXC +bool +PrepareTransactionBlock(char *gid, bool write_2pc_file) +#else bool PrepareTransactionBlock(char *gid) +#endif { TransactionState s; bool result; @@ -3180,6 +3226,16 @@ PrepareTransactionBlock(char *gid) /* Save GID where PrepareTransaction can find it again */ prepareGID = MemoryContextStrdup(TopTransactionContext, gid); +#ifdef PGXC + /* + * For a Postgres-XC Coordinator, prepare is done for a transaction + * if and only if a DDL was involved in the transaction. + * If not, it is enough to prepare it on Datanodes involved only. + */ + if (!write_2pc_file) + s->blockState = TBLOCK_PREPARE_NO_2PC_FILE; + else +#endif s->blockState = TBLOCK_PREPARE; } else @@ -3308,6 +3364,9 @@ EndTransactionBlock(void) case TBLOCK_SUBRESTART: case TBLOCK_SUBABORT_RESTART: case TBLOCK_PREPARE: +#ifdef PGXC + case TBLOCK_PREPARE_NO_2PC_FILE: +#endif elog(FATAL, "EndTransactionBlock: unexpected state %s", BlockStateAsString(s->blockState)); break; @@ -3400,6 +3459,9 @@ UserAbortTransactionBlock(void) case TBLOCK_SUBRESTART: case TBLOCK_SUBABORT_RESTART: case TBLOCK_PREPARE: +#ifdef PGXC + case TBLOCK_PREPARE_NO_2PC_FILE: +#endif elog(FATAL, "UserAbortTransactionBlock: unexpected state %s", BlockStateAsString(s->blockState)); break; @@ -3447,6 +3509,9 @@ DefineSavepoint(char *name) case TBLOCK_SUBRESTART: case TBLOCK_SUBABORT_RESTART: case TBLOCK_PREPARE: +#ifdef PGXC + case TBLOCK_PREPARE_NO_2PC_FILE: +#endif elog(FATAL, "DefineSavepoint: unexpected state %s", BlockStateAsString(s->blockState)); break; @@ -3503,6 +3568,9 @@ ReleaseSavepoint(List *options) case TBLOCK_SUBRESTART: case TBLOCK_SUBABORT_RESTART: case TBLOCK_PREPARE: +#ifdef PGXC + case TBLOCK_PREPARE_NO_2PC_FILE: +#endif elog(FATAL, "ReleaseSavepoint: unexpected state %s", BlockStateAsString(s->blockState)); break; @@ -3601,6 +3669,9 @@ RollbackToSavepoint(List *options) case TBLOCK_SUBRESTART: case TBLOCK_SUBABORT_RESTART: case TBLOCK_PREPARE: +#ifdef PGXC + case TBLOCK_PREPARE_NO_2PC_FILE: +#endif elog(FATAL, "RollbackToSavepoint: unexpected state %s", BlockStateAsString(s->blockState)); break; @@ -3684,6 +3755,9 @@ BeginInternalSubTransaction(char *name) case TBLOCK_INPROGRESS: case TBLOCK_END: case TBLOCK_PREPARE: +#ifdef PGXC + case TBLOCK_PREPARE_NO_2PC_FILE: +#endif case TBLOCK_SUBINPROGRESS: /* Normal subtransaction start */ PushTransaction(); @@ -3776,6 +3850,9 @@ RollbackAndReleaseCurrentSubTransaction(void) case TBLOCK_SUBRESTART: case TBLOCK_SUBABORT_RESTART: case TBLOCK_PREPARE: +#ifdef PGXC + case TBLOCK_PREPARE_NO_2PC_FILE: +#endif elog(FATAL, "RollbackAndReleaseCurrentSubTransaction: unexpected state %s", BlockStateAsString(s->blockState)); break; @@ -3824,6 +3901,9 @@ AbortOutOfAnyTransaction(void) case TBLOCK_END: case TBLOCK_ABORT_PENDING: case TBLOCK_PREPARE: +#ifdef PGXC + case TBLOCK_PREPARE_NO_2PC_FILE: +#endif /* In a transaction, so clean up */ AbortTransaction(); CleanupTransaction(); @@ -3915,6 +3995,9 @@ TransactionBlockStatusCode(void) case TBLOCK_END: case TBLOCK_SUBEND: case TBLOCK_PREPARE: +#ifdef PGXC + case TBLOCK_PREPARE_NO_2PC_FILE: +#endif return 'T'; /* in transaction */ case TBLOCK_ABORT: case TBLOCK_SUBABORT: @@ -4273,7 +4356,7 @@ PushTransaction(void) * failure. 
*/ #ifdef PGXC /* PGXC_COORD */ - if (IS_PGXC_COORDINATOR) + if (IS_PGXC_COORDINATOR && !IsConnFromCoord()) s->globalTransactionId = InvalidGlobalTransactionId; #endif s->transactionId = InvalidTransactionId; /* until assigned */ @@ -4410,6 +4493,9 @@ BlockStateAsString(TBlockState blockState) return "ABORT END"; case TBLOCK_ABORT_PENDING: return "ABORT PEND"; +#ifdef PGXC + case TBLOCK_PREPARE_NO_2PC_FILE: +#endif case TBLOCK_PREPARE: return "PREPARE"; case TBLOCK_SUBBEGIN: diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c index af57e68..dbbca98 100644 --- a/src/backend/catalog/dependency.c +++ b/src/backend/catalog/dependency.c @@ -359,8 +359,11 @@ doRename(const ObjectAddress *object, const char *oldname, const char *newname) * If we are here, a schema is being renamed, a sequence depends on it. * as sequences' global name use the schema name, this sequence * has also to be renamed on GTM. + * An operation with GTM can just be done from a remote Coordinator. */ - if (relKind == RELKIND_SEQUENCE && IS_PGXC_COORDINATOR) + if (relKind == RELKIND_SEQUENCE + && IS_PGXC_COORDINATOR + && !IsConnFromCoord()) { Relation relseq = relation_open(object->objectId, AccessShareLock); char *seqname = GetGlobalSeqName(relseq, NULL, oldname); @@ -1136,8 +1139,11 @@ doDeletion(const ObjectAddress *object) } #ifdef PGXC - /* Drop the sequence on GTM */ - if (relKind == RELKIND_SEQUENCE && IS_PGXC_COORDINATOR) + /* + * Drop the sequence on GTM. + * Sequence is dropped on GTM by a remote Coordinator only. + */ + if (relKind == RELKIND_SEQUENCE && IS_PGXC_COORDINATOR && !IsConnFromCoord()) { /* * The sequence has already been removed from coordinator, diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c index 772a6f7..a1da3a0 100644 --- a/src/backend/commands/copy.c +++ b/src/backend/commands/copy.c @@ -180,7 +180,7 @@ typedef struct CopyStateData RelationLocInfo *rel_loc; /* the locator key */ int hash_idx; /* index of the hash column */ - DataNodeHandle **connections; /* Involved data node connections */ + PGXCNodeHandle **connections; /* Involved data node connections */ #endif } CopyStateData; diff --git a/src/backend/commands/sequence.c b/src/backend/commands/sequence.c index 83ddbab..7f98b4d 100644 --- a/src/backend/commands/sequence.c +++ b/src/backend/commands/sequence.c @@ -350,7 +350,8 @@ DefineSequence(CreateSeqStmt *seq) heap_close(rel, NoLock); #ifdef PGXC /* PGXC_COORD */ - if (IS_PGXC_COORDINATOR) + /* Remote Coordinator is in charge of creating sequence in GTM */ + if (IS_PGXC_COORDINATOR && !IsConnFromCoord()) { char *seqname = GetGlobalSeqName(rel, NULL, NULL); @@ -492,7 +493,8 @@ AlterSequenceInternal(Oid relid, List *options) relation_close(seqrel, NoLock); #ifdef PGXC - if (IS_PGXC_COORDINATOR) + /* Remote Coordinator is in charge of create sequence in GTM */ + if (IS_PGXC_COORDINATOR && !IsConnFromCoord()) { char *seqname = GetGlobalSeqName(seqrel, NULL, NULL); diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c index c8b1456..d3506c8 100644 --- a/src/backend/commands/tablecmds.c +++ b/src/backend/commands/tablecmds.c @@ -2094,8 +2094,10 @@ RenameRelation(Oid myrelid, const char *newrelname, ObjectType reltype) /* Do the work */ RenameRelationInternal(myrelid, newrelname, namespaceId); #ifdef PGXC - if (IS_PGXC_COORDINATOR && - (reltype == OBJECT_SEQUENCE || relkind == RELKIND_SEQUENCE)) /* It is possible to rename a sequence with ALTER TABLE */ + /* Operation with GTM can only be done with a Remote Coordinator 
*/ + if (IS_PGXC_COORDINATOR + && !IsConnFromCoord() + && (reltype == OBJECT_SEQUENCE || relkind == RELKIND_SEQUENCE)) /* It is possible to rename a sequence with ALTER TABLE */ { char *seqname = GetGlobalSeqName(targetrelation, NULL, NULL); char *newseqname = GetGlobalSeqName(targetrelation, newrelname, NULL); diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c index 519ea4f..86db1eb 100644 --- a/src/backend/executor/execMain.c +++ b/src/backend/executor/execMain.c @@ -59,7 +59,9 @@ #include "utils/memutils.h" #include "utils/snapmgr.h" #include "utils/tqual.h" - +#ifdef PGXC +#include "pgxc/pgxc.h" +#endif /* Hooks for plugins to get control in ExecutorStart/Run/End() */ ExecutorStart_hook_type ExecutorStart_hook = NULL; diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c index 8dd924d..2c95815 100644 --- a/src/backend/optimizer/plan/planner.c +++ b/src/backend/optimizer/plan/planner.c @@ -124,7 +124,11 @@ planner(Query *parse, int cursorOptions, ParamListInfo boundParams) result = (*planner_hook) (parse, cursorOptions, boundParams); else #ifdef PGXC - if (IS_PGXC_COORDINATOR) + /* + * A coordinator receiving a query from another Coordinator + * is not allowed to go into PGXC planner. + */ + if (IS_PGXC_COORDINATOR && !IsConnFromCoord()) result = pgxc_planner(parse, cursorOptions, boundParams); else #endif diff --git a/src/backend/parser/parse_utilcmd.c b/src/backend/parser/parse_utilcmd.c index f47cc6a..cfa470f 100644 --- a/src/backend/parser/parse_utilcmd.c +++ b/src/backend/parser/parse_utilcmd.c @@ -277,6 +277,8 @@ transformCreateStmt(CreateStmt *stmt, const char *queryString) RemoteQuery *step = makeNode(RemoteQuery); step->combine_type = COMBINE_TYPE_SAME; step->sql_statement = queryString; + /* This query is a DDL, Launch it on both Datanodes and Coordinators. */ + step->exec_type = EXEC_ON_ALL_NODES; result = lappend(result, step); } #endif @@ -1970,6 +1972,8 @@ transformAlterTableStmt(AlterTableStmt *stmt, const char *queryString) RemoteQuery *step = makeNode(RemoteQuery); step->combine_type = COMBINE_TYPE_SAME; step->sql_statement = queryString; + /* This query is a DDl, it is launched on both Coordinators and Datanodes. */ + step->exec_type = EXEC_ON_ALL_NODES; result = lappend(result, step); } #endif diff --git a/src/backend/pgxc/locator/locator.c b/src/backend/pgxc/locator/locator.c index debbc77..098e254 100644 --- a/src/backend/pgxc/locator/locator.c +++ b/src/backend/pgxc/locator/locator.c @@ -24,6 +24,7 @@ #include "postgres.h" #include "access/skey.h" +#include "access/gtm.h" #include "access/relscan.h" #include "catalog/indexing.h" #include "catalog/pg_type.h" @@ -440,12 +441,12 @@ GetLocatorType(Oid relid) /* - * Return a list of all nodes. + * Return a list of all Datanodes. * We assume all tables use all nodes in the prototype, so just return a list * from first one. */ List * -GetAllNodes(void) +GetAllDataNodes(void) { int i; @@ -463,10 +464,38 @@ GetAllNodes(void) return nodeList; } +/* + * Return a list of all Coordinators + * This is used to send DDL to all nodes + * Do not put in the list the local Coordinator where this function is launched + */ +List * +GetAllCoordNodes(void) +{ + int i; + + /* + * PGXCTODO - add support for having nodes on a subset of nodes + * For now, assume on all nodes + */ + List *nodeList = NIL; + + for (i = 1; i < NumCoords + 1; i++) + { + /* + * Do not put in list the Coordinator we are on, + * it doesn't make sense to connect to the local coordinator. 
+ */ + if (i != PGXCNodeId) + nodeList = lappend_int(nodeList, i); + } + + return nodeList; +} + /* * Build locator information associated with the specified relation. - * */ void RelationBuildLocator(Relation rel) @@ -528,7 +557,7 @@ RelationBuildLocator(Relation rel) /** PGXCTODO - add support for having nodes on a subset of nodes * For now, assume on all nodes */ - relationLocInfo->nodeList = GetAllNodes(); + relationLocInfo->nodeList = GetAllDataNodes(); relationLocInfo->nodeCount = relationLocInfo->nodeList->length; /* diff --git a/src/backend/pgxc/pool/Makefile b/src/backend/pgxc/pool/Makefile index c7e950a..f8679eb 100644 --- a/src/backend/pgxc/pool/Makefile +++ b/src/backend/pgxc/pool/Makefile @@ -14,6 +14,6 @@ subdir = src/backend/pgxc/pool top_builddir = ../../../.. include $(top_builddir)/src/Makefile.global -OBJS = datanode.o execRemote.o poolmgr.o poolcomm.o postgresql_fdw.o +OBJS = pgxcnode.o execRemote.o poolmgr.o poolcomm.o postgresql_fdw.o include $(top_srcdir)/src/backend/common.mk diff --git a/src/backend/pgxc/pool/execRemote.c b/src/backend/pgxc/pool/execRemote.c index 16d2f6b..14dce33 100644 --- a/src/backend/pgxc/pool/execRemote.c +++ b/src/backend/pgxc/pool/execRemote.c @@ -30,6 +30,8 @@ #include "utils/memutils.h" #include "utils/tuplesort.h" #include "utils/snapmgr.h" +#include "pgxc/locator.h" +#include "pgxc/pgxc.h" #define END_QUERY_TIMEOUT 20 #define CLEAR_TIMEOUT 5 @@ -45,26 +47,30 @@ extern char *deparseSql(RemoteQueryState *scanstate); #define PRIMARY_NODE_WRITEAHEAD 1024 * 1024 static bool autocommit = true; -static DataNodeHandle **write_node_list = NULL; +static PGXCNodeHandle **write_node_list = NULL; static int write_node_count = 0; -static int data_node_begin(int conn_count, DataNodeHandle ** connections, +static int pgxc_node_begin(int conn_count, PGXCNodeHandle ** connections, GlobalTransactionId gxid); -static int data_node_commit(int conn_count, DataNodeHandle ** connections); -static int data_node_rollback(int conn_count, DataNodeHandle ** connections); -static int data_node_prepare(int conn_count, DataNodeHandle ** connections, - char *gid); -static int data_node_rollback_prepared(GlobalTransactionId gxid, GlobalTransactionId prepared_gxid, - int conn_count, DataNodeHandle ** connections, - char *gid); -static int data_node_commit_prepared(GlobalTransactionId gxid, GlobalTransactionId prepared_gxid, - int conn_count, DataNodeHandle ** connections, - char *gid); - -static void clear_write_node_list(); - -static int handle_response_clear(DataNodeHandle * conn); - +static int pgxc_node_commit(PGXCNodeAllHandles * pgxc_handles); +static int pgxc_node_rollback(PGXCNodeAllHandles * pgxc_handles); +static int pgxc_node_prepare(PGXCNodeAllHandles * pgxc_handles, char *gid); +static int pgxc_node_rollback_prepared(GlobalTransactionId gxid, GlobalTransactionId prepared_gxid, + PGXCNodeAllHandles * pgxc_handles, char *gid); +static int pgxc_node_commit_prepared(GlobalTransactionId gxid, GlobalTransactionId prepared_gxid, + PGXCNodeAllHandles * pgxc_handles, char *gid); +static PGXCNodeAllHandles * get_exec_connections(ExecNodes *exec_nodes, + RemoteQueryExecType exec_type); +static int pgxc_node_receive_and_validate(const int conn_count, + PGXCNodeHandle ** connections, + bool reset_combiner); +static void clear_write_node_list(void); + +static void pfree_pgxc_all_handles(PGXCNodeAllHandles *pgxc_handles); + +static int handle_response_clear(PGXCNodeHandle * conn); + +static PGXCNodeAllHandles *pgxc_get_all_transaction_nodes(void); #define 
MAX_STATEMENTS_PER_TRAN 10 @@ -922,14 +928,14 @@ FetchTuple(RemoteQueryState *combiner, TupleTableSlot *slot) * Handle responses from the Data node connections */ static int -data_node_receive_responses(const int conn_count, DataNodeHandle ** connections, +pgxc_node_receive_responses(const int conn_count, PGXCNodeHandle ** connections, struct timeval * timeout, RemoteQueryState *combiner) { int count = conn_count; - DataNodeHandle *to_receive[conn_count]; + PGXCNodeHandle *to_receive[conn_count]; /* make a copy of the pointers to the connections */ - memcpy(to_receive, connections, conn_count * sizeof(DataNodeHandle *)); + memcpy(to_receive, connections, conn_count * sizeof(PGXCNodeHandle *)); /* * Read results. @@ -941,7 +947,7 @@ data_node_receive_responses(const int conn_count, DataNodeHandle ** connections, { int i = 0; - if (data_node_receive(count, to_receive, timeout)) + if (pgxc_node_receive(count, to_receive, timeout)) return EOF; while (i < count) { @@ -986,7 +992,7 @@ data_node_receive_responses(const int conn_count, DataNodeHandle ** connections, * 2 - got copy response */ int -handle_response(DataNodeHandle * conn, RemoteQueryState *combiner) +handle_response(PGXCNodeHandle * conn, RemoteQueryState *combiner) { char *msg; int msg_len; @@ -1094,7 +1100,7 @@ handle_response(DataNodeHandle * conn, RemoteQueryState *combiner) * RESPONSE_COMPLETE - done with the connection, or done trying (error) */ static int -handle_response_clear(DataNodeHandle * conn) +handle_response_clear(PGXCNodeHandle * conn) { char *msg; int msg_len; @@ -1156,10 +1162,10 @@ handle_response_clear(DataNodeHandle * conn) /* - * Send BEGIN command to the Data nodes and receive responses + * Send BEGIN command to the Datanodes or Coordinators and receive responses */ static int -data_node_begin(int conn_count, DataNodeHandle ** connections, +pgxc_node_begin(int conn_count, PGXCNodeHandle ** connections, GlobalTransactionId gxid) { int i; @@ -1170,20 +1176,20 @@ data_node_begin(int conn_count, DataNodeHandle ** connections, /* Send BEGIN */ for (i = 0; i < conn_count; i++) { - if (GlobalTransactionIdIsValid(gxid) && data_node_send_gxid(connections[i], gxid)) + if (GlobalTransactionIdIsValid(gxid) && pgxc_node_send_gxid(connections[i], gxid)) return EOF; - if (GlobalTimestampIsValid(timestamp) && data_node_send_timestamp(connections[i], timestamp)) + if (GlobalTimestampIsValid(timestamp) && pgxc_node_send_timestamp(connections[i], timestamp)) return EOF; - if (data_node_send_query(connections[i], "BEGIN")) + if (pgxc_node_send_query(connections[i], "BEGIN")) return EOF; } combiner = CreateResponseCombiner(conn_count, COMBINE_TYPE_NONE); /* Receive responses */ - if (data_node_receive_responses(conn_count, connections, timeout, combiner)) + if (pgxc_node_receive_responses(conn_count, connections, timeout, combiner)) return EOF; /* Verify status */ @@ -1197,17 +1203,17 @@ clear_write_node_list() /* we just malloc once and use counter */ if (write_node_list == NULL) { - write_node_list = (DataNodeHandle **) malloc(NumDataNodes * sizeof(DataNodeHandle *)); + write_node_list = (PGXCNodeHandle **) malloc(NumDataNodes * sizeof(PGXCNodeHandle *)); } write_node_count = 0; } /* - * Switch autocommmit mode off, so all subsequent statements will be in the same transaction + * Switch autocommit mode off, so all subsequent statements will be in the same transaction */ void -DataNodeBegin(void) +PGXCNodeBegin(void) { autocommit = false; clear_write_node_list(); @@ -1215,18 +1221,30 @@ DataNodeBegin(void) /* - * Prepare 
transaction on Datanodes involved in current transaction. + * Prepare transaction on Datanodes and Coordinators involved in current transaction. * GXID associated to current transaction has to be committed on GTM. */ -int -DataNodePrepare(char *gid) +bool +PGXCNodePrepare(char *gid) { int res = 0; int tran_count; - DataNodeHandle *connections[NumDataNodes]; + PGXCNodeAllHandles *pgxc_connections; + bool local_operation = false; + + pgxc_connections = pgxc_get_all_transaction_nodes(); - /* gather connections to prepare */ - tran_count = get_transaction_nodes(connections); + /* DDL involved in transaction, so make a local prepare too */ + if (pgxc_connections->co_conn_count != 0) + local_operation = true; + + /* + * If no connections have been gathered for Coordinators, + * it means that no DDL has been involved in this transaction. + * And so this transaction is not prepared on Coordinators. + * It is only on Datanodes that data is involved. + */ + tran_count = pgxc_connections->dn_conn_count + pgxc_connections->co_conn_count; /* * If we do not have open transactions we have nothing to prepare just @@ -1234,12 +1252,11 @@ DataNodePrepare(char *gid) */ if (tran_count == 0) { - elog(WARNING, "Nothing to PREPARE on Datanodes, gid is not used"); + elog(WARNING, "Nothing to PREPARE on Datanodes and Coordinators, gid is not used"); goto finish; } - /* TODO: data_node_prepare */ - res = data_node_prepare(tran_count, connections, gid); + res = pgxc_node_prepare(pgxc_connections, gid); finish: /* @@ -1249,12 +1266,16 @@ finish: * Release the connections for the moment. */ if (!autocommit) - stat_transaction(tran_count); + stat_transaction(pgxc_connections->dn_conn_count); if (!PersistentConnections) release_handles(false); autocommit = true; clear_write_node_list(); - return res; + + /* Clean up connections */ + pfree_pgxc_all_handles(pgxc_connections); + + return local_operation; } @@ -1262,47 +1283,64 @@ finish: * Prepare transaction on dedicated nodes with gid received from application */ static int -data_node_prepare(int conn_count, DataNodeHandle ** connections, char *gid) +pgxc_node_prepare(PGXCNodeAllHandles *pgxc_handles, char *gid) { - int i; + int real_co_conn_count; int result = 0; - struct timeval *timeout = NULL; + int co_conn_count = pgxc_handles->co_conn_count; + int dn_conn_count = pgxc_handles->dn_conn_count; char *buffer = (char *) palloc0(22 + strlen(gid) + 1); - RemoteQueryState *combiner = NULL; GlobalTransactionId gxid = InvalidGlobalTransactionId; PGXC_NodeId *datanodes = NULL; + PGXC_NodeId *coordinators = NULL; gxid = GetCurrentGlobalTransactionId(); /* * Now that the transaction has been prepared on the nodes, - * Initialize to make the business on GTM + * Initialize to make the business on GTM. + * We also had the Coordinator we are on in the prepared state. + */ + if (dn_conn_count != 0) + datanodes = collect_pgxcnode_numbers(dn_conn_count, + pgxc_handles->datanode_handles, REMOTE_CONN_DATANODE); + + /* + * Local Coordinator is saved in the list sent to GTM + * only when a DDL is involved in the transaction. + * So we don't need to complete the list of Coordinators sent to GTM + * when number of connections to Coordinator is zero (no DDL). */ - datanodes = collect_datanode_numbers(conn_count, connections); + if (co_conn_count != 0) + coordinators = collect_pgxcnode_numbers(co_conn_count, + pgxc_handles->coord_handles, REMOTE_CONN_COORD); /* - * Send a Prepare in Progress message to GTM. - * At the same time node list is saved on GTM. 
+ * Tell to GTM that the transaction is being prepared first. + * Don't forget to add in the list of Coordinators the coordinator we are on + * if a DDL is involved in the transaction. + * This one also is being prepared ! */ - result = BeingPreparedTranGTM(gxid, gid, conn_count, datanodes, 0, NULL); + if (co_conn_count == 0) + real_co_conn_count = co_conn_count; + else + real_co_conn_count = co_conn_count + 1; + + result = StartPreparedTranGTM(gxid, gid, dn_conn_count, + datanodes, real_co_conn_count, coordinators); if (result < 0) return EOF; sprintf(buffer, "PREPARE TRANSACTION '%s'", gid); - /* Send PREPARE */ - for (i = 0; i < conn_count; i++) - if (data_node_send_query(connections[i], buffer)) - return EOF; + /* Continue even after an error here, to consume the messages */ + result = pgxc_all_handles_send_query(pgxc_handles, buffer, true); - combiner = CreateResponseCombiner(conn_count, COMBINE_TYPE_NONE); + /* Receive and Combine results from Datanodes and Coordinators */ + result |= pgxc_node_receive_and_validate(dn_conn_count, pgxc_handles->datanode_handles, false); + result |= pgxc_node_receive_and_validate(co_conn_count, pgxc_handles->coord_handles, false); - /* Receive responses */ - if (data_node_receive_responses(conn_count, connections, timeout, combiner)) - return EOF; - - result = ValidateAndCloseCombiner(combiner) ? result : EOF; if (result) goto finish; @@ -1324,31 +1362,27 @@ finish: if (result) { GlobalTransactionId rollback_xid = InvalidGlobalTransactionId; - buffer = (char *) repalloc(buffer, 20 + strlen(gid) + 1); + result = 0; + buffer = (char *) repalloc(buffer, 20 + strlen(gid) + 1); sprintf(buffer, "ROLLBACK PREPARED '%s'", gid); - rollback_xid = BeginTranGTM(NULL); - for (i = 0; i < conn_count; i++) - { - if (data_node_send_gxid(connections[i], rollback_xid)) - { - add_error_message(connections[i], "Can not send request"); - return EOF; - } - if (data_node_send_query(connections[i], buffer)) - { - add_error_message(connections[i], "Can not send request"); - return EOF; - } - } + /* Consume any messages on the Datanodes and Coordinators first if necessary */ + PGXCNodeConsumeMessages(); - if (!combiner) - combiner = CreateResponseCombiner(conn_count, COMBINE_TYPE_NONE); + rollback_xid = BeginTranGTM(NULL); - if (data_node_receive_responses(conn_count, connections, timeout, combiner)) + /* + * Send xid and rollback prepared down to Datanodes and Coordinators + * Even if we get an error on one, we try and send to the others + */ + if (pgxc_all_handles_send_gxid(pgxc_handles, rollback_xid, false)) result = EOF; - result = ValidateAndCloseCombiner(combiner) ? result : EOF; + if (pgxc_all_handles_send_query(pgxc_handles, buffer, false)) + result = EOF; + + result = pgxc_node_receive_and_validate(dn_conn_count, pgxc_handles->datanode_handles, false); + result |= pgxc_node_receive_and_validate(co_conn_count, pgxc_handles->coord_handles, false); /* * Don't forget to rollback also on GTM @@ -1364,26 +1398,30 @@ finish: /* - * Commit prepared transaction on Datanodes where it has been prepared. + * Commit prepared transaction on Datanodes and Coordinators (as necessary) + * where it has been prepared. * Connection to backends has been cut when transaction has been prepared, * So it is necessary to send the COMMIT PREPARE message to all the nodes. * We are not sure if the transaction prepared has involved all the datanodes * or not but send the message to all of them. * This avoid to have any additional interaction with GTM when making a 2PC transaction. 
*/ -void -DataNodeCommitPrepared(char *gid) +bool +PGXCNodeCommitPrepared(char *gid) { int res = 0; int res_gtm = 0; - DataNodeHandle **connections; - List *nodelist = NIL; + PGXCNodeAllHandles *pgxc_handles; + List *datanodelist = NIL; + List *coordlist = NIL; int i, tran_count; PGXC_NodeId *datanodes = NULL; PGXC_NodeId *coordinators = NULL; int coordcnt = 0; int datanodecnt = 0; GlobalTransactionId gxid, prepared_gxid; + /* This flag tracks if the transaction has to be committed locally */ + bool operation_local = false; res_gtm = GetGIDDataGTM(gid, &gxid, &prepared_gxid, &datanodecnt, &datanodes, &coordcnt, &coordinators); @@ -1394,17 +1432,33 @@ DataNodeCommitPrepared(char *gid) autocommit = false; - /* Build the list of nodes based on data received from GTM */ + /* + * Build the list of nodes based on data received from GTM. + * For Sequence DDL this list is NULL. + */ for (i = 0; i < datanodecnt; i++) + datanodelist = lappend_int(datanodelist,datanodes[i]); + + for (i = 0; i < coordcnt; i++) { - nodelist = lappend_int(nodelist,datanodes[i]); + /* Local Coordinator number found, has to commit locally also */ + if (coordinators[i] == PGXCNodeId) + operation_local = true; + else + coordlist = lappend_int(coordlist,coordinators[i]); } /* Get connections */ - connections = get_handles(nodelist); + if (coordcnt > 0 && datanodecnt == 0) + pgxc_handles = get_handles(datanodelist, coordlist, true); + else + pgxc_handles = get_handles(datanodelist, coordlist, false); - /* Commit here the prepared transaction to all Datanodes */ - res = data_node_commit_prepared(gxid, prepared_gxid, datanodecnt, connections, gid); + /* + * Commit here the prepared transaction to all Datanodes and Coordinators + * If necessary, local Coordinator Commit is performed after this DataNodeCommitPrepared. + */ + res = pgxc_node_commit_prepared(gxid, prepared_gxid, pgxc_handles, gid); finish: /* In autocommit mode statistics is collected in DataNodeExec */ @@ -1416,11 +1470,13 @@ finish: clear_write_node_list(); /* Free node list taken from GTM */ - if (datanodes) + if (datanodes && datanodecnt != 0) free(datanodes); - if (coordinators) + + if (coordinators && coordcnt != 0) free(coordinators); + pfree_pgxc_all_handles(pgxc_handles); if (res_gtm < 0) ereport(ERROR, (errcode(ERRCODE_INTERNAL_ERROR), @@ -1429,6 +1485,8 @@ finish: ereport(ERROR, (errcode(ERRCODE_INTERNAL_ERROR), errmsg("Could not commit prepared transaction on data nodes"))); + + return operation_local; } /* @@ -1440,42 +1498,29 @@ finish: * This permits to avoid interactions with GTM. 
*/ static int -data_node_commit_prepared(GlobalTransactionId gxid, GlobalTransactionId prepared_gxid, int conn_count, DataNodeHandle ** connections, char *gid) +pgxc_node_commit_prepared(GlobalTransactionId gxid, + GlobalTransactionId prepared_gxid, + PGXCNodeAllHandles *pgxc_handles, + char *gid) { int result = 0; - int i; - RemoteQueryState *combiner = NULL; - struct timeval *timeout = NULL; + int co_conn_count = pgxc_handles->co_conn_count; + int dn_conn_count = pgxc_handles->dn_conn_count; char *buffer = (char *) palloc0(18 + strlen(gid) + 1); /* GXID has been piggybacked when gid data has been received from GTM */ sprintf(buffer, "COMMIT PREPARED '%s'", gid); /* Send gxid and COMMIT PREPARED message to all the Datanodes */ - for (i = 0; i < conn_count; i++) - { - if (data_node_send_gxid(connections[i], gxid)) - { - add_error_message(connections[i], "Can not send request"); - result = EOF; - goto finish; - } - if (data_node_send_query(connections[i], buffer)) - { - add_error_message(connections[i], "Can not send request"); - result = EOF; - goto finish; - } - } - - combiner = CreateResponseCombiner(conn_count, COMBINE_TYPE_NONE); + if (pgxc_all_handles_send_gxid(pgxc_handles, gxid, true)) + goto finish; - /* Receive responses */ - if (data_node_receive_responses(conn_count, connections, timeout, combiner)) + /* Continue and receive responses even if there is an error */ + if (pgxc_all_handles_send_query(pgxc_handles, buffer, false)) result = EOF; - /* Validate and close combiner */ - result = ValidateAndCloseCombiner(combiner) ? result : EOF; + result = pgxc_node_receive_and_validate(dn_conn_count, pgxc_handles->datanode_handles, false); + result |= pgxc_node_receive_and_validate(co_conn_count, pgxc_handles->coord_handles, false); finish: /* Both GXIDs used for PREPARE and COMMIT PREPARED are discarded from GTM snapshot here */ @@ -1486,21 +1531,25 @@ finish: /* * Rollback prepared transaction on Datanodes involved in the current transaction + * + * Return whether or not a local operation required. 
*/ -void -DataNodeRollbackPrepared(char *gid) +bool +PGXCNodeRollbackPrepared(char *gid) { int res = 0; int res_gtm = 0; - DataNodeHandle **connections; - List *nodelist = NIL; + PGXCNodeAllHandles *pgxc_handles; + List *datanodelist = NIL; + List *coordlist = NIL; int i, tran_count; - PGXC_NodeId *datanodes = NULL; PGXC_NodeId *coordinators = NULL; int coordcnt = 0; int datanodecnt = 0; GlobalTransactionId gxid, prepared_gxid; + /* This flag tracks if the transaction has to be rolled back locally */ + bool operation_local = false; res_gtm = GetGIDDataGTM(gid, &gxid, &prepared_gxid, &datanodecnt, &datanodes, &coordcnt, &coordinators); @@ -1513,15 +1562,25 @@ DataNodeRollbackPrepared(char *gid) /* Build the node list based on the result got from GTM */ for (i = 0; i < datanodecnt; i++) + datanodelist = lappend_int(datanodelist,datanodes[i]); + + for (i = 0; i < coordcnt; i++) { - nodelist = lappend_int(nodelist,datanodes[i]); + /* Local Coordinator number found, has to rollback locally also */ + if (coordinators[i] == PGXCNodeId) + operation_local = true; + else + coordlist = lappend_int(coordlist,coordinators[i]); } /* Get connections */ - connections = get_handles(nodelist); + if (coordcnt > 0 && datanodecnt == 0) + pgxc_handles = get_handles(datanodelist, coordlist, true); + else + pgxc_handles = get_handles(datanodelist, coordlist, false); - /* Here do the real rollback to Datanodes */ - res = data_node_rollback_prepared(gxid, prepared_gxid, datanodecnt, connections, gid); + /* Here do the real rollback to Datanodes and Coordinators */ + res = pgxc_node_rollback_prepared(gxid, prepared_gxid, pgxc_handles, gid); finish: /* In autocommit mode statistics is collected in DataNodeExec */ @@ -1530,7 +1589,16 @@ finish: if (!PersistentConnections) release_handles(true); autocommit = true; - clear_write_node_list(true); + clear_write_node_list(); + + /* Free node list taken from GTM */ + if (datanodes) + free(datanodes); + + if (coordinators) + free(coordinators); + + pfree_pgxc_all_handles(pgxc_handles); if (res_gtm < 0) ereport(ERROR, (errcode(ERRCODE_INTERNAL_ERROR), @@ -1539,6 +1607,8 @@ finish: ereport(ERROR, (errcode(ERRCODE_INTERNAL_ERROR), errmsg("Could not rollback prepared transaction on Datanodes"))); + + return operation_local; } @@ -1548,13 +1618,12 @@ finish: * At the end both prepared GXID and GXID are committed. 
*/ static int -data_node_rollback_prepared(GlobalTransactionId gxid, GlobalTransactionId prepared_gxid, - int conn_count, DataNodeHandle ** connections, char *gid) +pgxc_node_rollback_prepared(GlobalTransactionId gxid, GlobalTransactionId prepared_gxid, + PGXCNodeAllHandles *pgxc_handles, char *gid) { int result = 0; - int i; - RemoteQueryState *combiner = NULL; - struct timeval *timeout = NULL; + int dn_conn_count = pgxc_handles->dn_conn_count; + int co_conn_count = pgxc_handles->co_conn_count; char *buffer = (char *) palloc0(20 + strlen(gid) + 1); /* Datanodes have reset after prepared state, so get a new gxid */ @@ -1562,34 +1631,15 @@ data_node_rollback_prepared(GlobalTransactionId gxid, GlobalTransactionId prepar sprintf(buffer, "ROLLBACK PREPARED '%s'", gid); - /* Send gxid and COMMIT PREPARED message to all the Datanodes */ - for (i = 0; i < conn_count; i++) - { - if (data_node_send_gxid(connections[i], gxid)) - { - add_error_message(connections[i], "Can not send request"); - result = EOF; - goto finish; - } - - if (data_node_send_query(connections[i], buffer)) - { - add_error_message(connections[i], "Can not send request"); - result = EOF; - goto finish; - } - } - - combiner = CreateResponseCombiner(conn_count, COMBINE_TYPE_NONE); - - /* Receive responses */ - if (data_node_receive_responses(conn_count, connections, timeout, combiner)) + /* Send gxid and ROLLBACK PREPARED message to all the Datanodes */ + if (pgxc_all_handles_send_gxid(pgxc_handles, gxid, false)) + result = EOF; + if (pgxc_all_handles_send_query(pgxc_handles, buffer, false)) result = EOF; - /* Validate and close combiner */ - result = ValidateAndCloseCombiner(combiner) ? result : EOF; + result = pgxc_node_receive_and_validate(dn_conn_count, pgxc_handles->datanode_handles, false); + result |= pgxc_node_receive_and_validate(co_conn_count, pgxc_handles->coord_handles, false); -finish: /* Both GXIDs used for PREPARE and COMMIT PREPARED are discarded from GTM snapshot here */ CommitPreparedTranGTM(gxid, prepared_gxid); @@ -1601,14 +1651,15 @@ finish: * Commit current transaction on data nodes where it has been started */ void -DataNodeCommit(void) +PGXCNodeCommit(void) { int res = 0; int tran_count; - DataNodeHandle *connections[NumDataNodes]; + PGXCNodeAllHandles *pgxc_connections; - /* gather connections to commit */ - tran_count = get_transaction_nodes(connections); + pgxc_connections = pgxc_get_all_transaction_nodes(); + + tran_count = pgxc_connections->dn_conn_count + pgxc_connections->co_conn_count; /* * If we do not have open transactions we have nothing to commit, just @@ -1617,7 +1668,7 @@ DataNodeCommit(void) if (tran_count == 0) goto finish; - res = data_node_commit(tran_count, connections); + res = pgxc_node_commit(pgxc_connections); finish: /* In autocommit mode statistics is collected in DataNodeExec */ @@ -1627,6 +1678,9 @@ finish: release_handles(false); autocommit = true; clear_write_node_list(); + + /* Clear up connection */ + pfree_pgxc_all_handles(pgxc_connections); if (res != 0) ereport(ERROR, (errcode(ERRCODE_INTERNAL_ERROR), @@ -1639,15 +1693,13 @@ finish: * if more then on one node data have been modified during the transactioon. 
*/ static int -data_node_commit(int conn_count, DataNodeHandle ** connections) +pgxc_node_commit(PGXCNodeAllHandles *pgxc_handles) { - int i; - struct timeval *timeout = NULL; char buffer[256]; GlobalTransactionId gxid = InvalidGlobalTransactionId; int result = 0; - RemoteQueryState *combiner = NULL; - + int co_conn_count = pgxc_handles->co_conn_count; + int dn_conn_count = pgxc_handles->dn_conn_count; /* can set this to false to disable temporarily */ /* bool do2PC = conn_count > 1; */ @@ -1674,21 +1726,13 @@ data_node_commit(int conn_count, DataNodeHandle ** connections) gxid = GetCurrentGlobalTransactionId(); sprintf(buffer, "PREPARE TRANSACTION 'T%d'", gxid); - /* Send PREPARE */ - for (i = 0; i < conn_count; i++) - { - if (data_node_send_query(connections[i], buffer)) - return EOF; - } - combiner = CreateResponseCombiner(conn_count, COMBINE_TYPE_NONE); - /* Receive responses */ - if (data_node_receive_responses(conn_count, connections, timeout, combiner)) + if (pgxc_all_handles_send_query(pgxc_handles, buffer, false)) result = EOF; - /* Reset combiner */ - if (!ValidateAndResetCombiner(combiner)) - result = EOF; + /* Receive and Combine results from Datanodes and Coordinators */ + result |= pgxc_node_receive_and_validate(dn_conn_count, pgxc_handles->datanode_handles, true); + result |= pgxc_node_receive_and_validate(co_conn_count, pgxc_handles->coord_handles, true); } if (!do2PC) @@ -1696,7 +1740,11 @@ data_node_commit(int conn_count, DataNodeHandle ** connections) else { if (result) + { sprintf(buffer, "ROLLBACK PREPARED 'T%d'", gxid); + /* Consume any messages on the Datanodes and Coordinators first if necessary */ + PGXCNodeConsumeMessages(); + } else sprintf(buffer, "COMMIT PREPARED 'T%d'", gxid); @@ -1707,33 +1755,20 @@ data_node_commit(int conn_count, DataNodeHandle ** connections) */ two_phase_xid = BeginTranGTM(NULL); - for (i = 0; i < conn_count; i++) - { - if (data_node_send_gxid(connections[i], two_phase_xid)) - { - add_error_message(connections[i], "Can not send request"); - result = EOF; - goto finish; - } - } - } - - /* Send COMMIT */ - for (i = 0; i < conn_count; i++) - { - if (data_node_send_query(connections[i], buffer)) + if (pgxc_all_handles_send_gxid(pgxc_handles, two_phase_xid, true)) { result = EOF; goto finish; } } - if (!combiner) - combiner = CreateResponseCombiner(conn_count, COMBINE_TYPE_NONE); - /* Receive responses */ - if (data_node_receive_responses(conn_count, connections, timeout, combiner)) + /* Send COMMIT to all handles */ + if (pgxc_all_handles_send_query(pgxc_handles, buffer, false)) result = EOF; - result = ValidateAndCloseCombiner(combiner) ? 
result : EOF; + + /* Receive and Combine results from Datanodes and Coordinators */ + result |= pgxc_node_receive_and_validate(dn_conn_count, pgxc_handles->datanode_handles, false); + result |= pgxc_node_receive_and_validate(co_conn_count, pgxc_handles->coord_handles, false); finish: if (do2PC) @@ -1748,18 +1783,18 @@ finish: * This will happen */ int -DataNodeRollback(void) +PGXCNodeRollback(void) { int res = 0; int tran_count; - DataNodeHandle *connections[NumDataNodes]; + PGXCNodeAllHandles *pgxc_connections; + pgxc_connections = pgxc_get_all_transaction_nodes(); - /* Consume any messages on the data nodes first if necessary */ - DataNodeConsumeMessages(); + tran_count = pgxc_connections->dn_conn_count + pgxc_connections->co_conn_count; - /* gather connections to rollback */ - tran_count = get_transaction_nodes(connections); + /* Consume any messages on the Datanodes and Coordinators first if necessary */ + PGXCNodeConsumeMessages(); /* * If we do not have open transactions we have nothing to rollback just @@ -1768,7 +1803,7 @@ DataNodeRollback(void) if (tran_count == 0) goto finish; - res = data_node_rollback(tran_count, connections); + res = pgxc_node_rollback(pgxc_connections); finish: /* In autocommit mode statistics is collected in DataNodeExec */ @@ -1778,20 +1813,23 @@ finish: release_handles(true); autocommit = true; clear_write_node_list(); + + /* Clean up connections */ + pfree_pgxc_all_handles(pgxc_connections); return res; } /* - * Send ROLLBACK command down to the Data nodes and handle responses + * Send ROLLBACK command down to Datanodes and Coordinators and handle responses */ static int -data_node_rollback(int conn_count, DataNodeHandle ** connections) +pgxc_node_rollback(PGXCNodeAllHandles *pgxc_handles) { int i; - struct timeval *timeout = NULL; - RemoteQueryState *combiner; - + int result = 0; + int co_conn_count = pgxc_handles->co_conn_count; + int dn_conn_count = pgxc_handles->dn_conn_count; /* * Rollback is a special case, being issued because of an error. @@ -1799,20 +1837,21 @@ data_node_rollback(int conn_count, DataNodeHandle ** connections) * issuing our rollbacks so that we did not read the results of the * previous command. */ - for (i = 0; i < conn_count; i++) - clear_socket_data(connections[i]); + for (i = 0; i < dn_conn_count; i++) + clear_socket_data(pgxc_handles->datanode_handles[i]); - /* Send ROLLBACK - */ - for (i = 0; i < conn_count; i++) - data_node_send_query(connections[i], "ROLLBACK"); + for (i = 0; i < co_conn_count; i++) + clear_socket_data(pgxc_handles->coord_handles[i]); - combiner = CreateResponseCombiner(conn_count, COMBINE_TYPE_NONE); - /* Receive responses */ - if (data_node_receive_responses(conn_count, connections, timeout, combiner)) - return EOF; + /* Send ROLLBACK to all handles */ + if (pgxc_all_handles_send_query(pgxc_handles, "ROLLBACK", false)) + result = EOF; - /* Verify status */ - return ValidateAndCloseCombiner(combiner) ? 
0 : EOF; + /* Receive and Combine results from Datanodes and Coordinators */ + result |= pgxc_node_receive_and_validate(dn_conn_count, pgxc_handles->datanode_handles, false); + result |= pgxc_node_receive_and_validate(co_conn_count, pgxc_handles->coord_handles, false); + + return result; } @@ -1820,15 +1859,16 @@ data_node_rollback(int conn_count, DataNodeHandle ** connections) * Begin COPY command * The copy_connections array must have room for NumDataNodes items */ -DataNodeHandle** +PGXCNodeHandle** DataNodeCopyBegin(const char *query, List *nodelist, Snapshot snapshot, bool is_from) { int i, j; int conn_count = list_length(nodelist) == 0 ? NumDataNodes : list_length(nodelist); struct timeval *timeout = NULL; - DataNodeHandle **connections; - DataNodeHandle **copy_connections; - DataNodeHandle *newConnections[conn_count]; + PGXCNodeAllHandles *pgxc_handles; + PGXCNodeHandle **connections; + PGXCNodeHandle **copy_connections; + PGXCNodeHandle *newConnections[conn_count]; int new_count = 0; ListCell *nodeitem; bool need_tran; @@ -1840,7 +1880,9 @@ DataNodeCopyBegin(const char *query, List *nodelist, Snapshot snapshot, bool is_ return NULL; /* Get needed datanode connections */ - connections = get_handles(nodelist); + pgxc_handles = get_handles(nodelist, NULL, false); + connections = pgxc_handles->datanode_handles; + if (!connections) return NULL; @@ -1853,7 +1895,7 @@ DataNodeCopyBegin(const char *query, List *nodelist, Snapshot snapshot, bool is_ * So store connections in an array where index is node-1. * Unused items in the array should be NULL */ - copy_connections = (DataNodeHandle **) palloc0(NumDataNodes * sizeof(DataNodeHandle *)); + copy_connections = (PGXCNodeHandle **) palloc0(NumDataNodes * sizeof(PGXCNodeHandle *)); i = 0; foreach(nodeitem, nodelist) copy_connections[lfirst_int(nodeitem) - 1] = connections[i++]; @@ -1910,7 +1952,7 @@ DataNodeCopyBegin(const char *query, List *nodelist, Snapshot snapshot, bool is_ if (new_count > 0 && need_tran) { /* Start transaction on connections where it is not started */ - if (data_node_begin(new_count, newConnections, gxid)) + if (pgxc_node_begin(new_count, newConnections, gxid)) { pfree(connections); pfree(copy_connections); @@ -1922,18 +1964,18 @@ DataNodeCopyBegin(const char *query, List *nodelist, Snapshot snapshot, bool is_ for (i = 0; i < conn_count; i++) { /* If explicit transaction is needed gxid is already sent */ - if (!need_tran && data_node_send_gxid(connections[i], gxid)) + if (!need_tran && pgxc_node_send_gxid(connections[i], gxid)) { add_error_message(connections[i], "Can not send request"); pfree(connections); pfree(copy_connections); return NULL; } - if (conn_count == 1 && data_node_send_timestamp(connections[i], timestamp)) + if (conn_count == 1 && pgxc_node_send_timestamp(connections[i], tim... [truncated message content] |
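The unifying pattern in the patch above is the guard IS_PGXC_COORDINATOR && !IsConnFromCoord(): GTM interaction and GXID bookkeeping now happen only on the Coordinator the application is connected to, never on a Coordinator that is merely executing a statement relayed by another Coordinator, and never on a Datanode. A minimal sketch of the idea follows; it assumes the IS_PGXC_COORDINATOR / IS_PGXC_DATANODE macros and IsConnFromCoord() from pgxc/pgxc.h, and the helper name is hypothetical, not part of the patch.

/*
 * Only the originating Coordinator, the one holding the application
 * connection, may assign GXIDs and report commits or aborts to GTM.
 * A Datanode, or a Coordinator serving a connection opened by another
 * Coordinator, must reuse the XID pushed down to it instead.
 */
static bool
is_originating_coordinator(void)
{
	return IS_PGXC_COORDINATOR && !IsConnFromCoord();
}

	/* Usage, mirroring the CommitTransaction() hunk above: */
	if (is_originating_coordinator())
	{
		CommitTranGTM(s->globalTransactionId);	/* tell GTM about the commit */
		latestXid = s->globalTransactionId;
	}
	else if (IS_PGXC_DATANODE || IsConnFromCoord())
		SetNextTransactionId(InvalidTransactionId);	/* forget the pushed-down XID */

The same distinction drives the new TBLOCK_PREPARE_NO_2PC_FILE state: when no DDL ran on the Coordinator, the transaction has no local state worth a two-phase-commit state file, so PrepareTransaction(false) tells EndPrepare() to skip the file write and only the Datanodes keep prepared state.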
From: mason_s <ma...@us...> - 2010-10-04 20:54:09
Project "Postgres-XC". The branch, master has been updated via ea13b66f4beaeb13db9741fb5a1347f976b9ebab (commit) from d044db4cc1b8cf18f14cfaa6c65d39ec14905dfb (commit) - Log ----------------------------------------------------------------- commit ea13b66f4beaeb13db9741fb5a1347f976b9ebab Author: Mason Sharp <ma...@us...> Date: Mon Oct 4 16:53:07 2010 -0400 Fixed bug where extra materialization nodes were being created. By Pavan Deolasee diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c index 337f17b..818ea1b 100644 --- a/src/backend/optimizer/plan/createplan.c +++ b/src/backend/optimizer/plan/createplan.c @@ -321,15 +321,6 @@ create_scan_plan(PlannerInfo *root, Path *best_path) best_path, tlist, scan_clauses); - - /* - * Insert a materialization plan above this temporarily - * until we better handle multiple steps using the same connection. - */ - matplan = (Plan *) make_material(plan); - copy_plan_costsize(matplan, plan); - matplan->total_cost += cpu_tuple_cost * matplan->plan_rows; - plan = matplan; break; #endif default: diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c index cbf7618..ca2e2a2 100644 --- a/src/backend/optimizer/util/pathnode.c +++ b/src/backend/optimizer/util/pathnode.c @@ -1325,9 +1325,15 @@ create_remotequery_path(PlannerInfo *root, RelOptInfo *rel) pathnode->parent = rel; pathnode->pathkeys = NIL; /* result is always unordered */ - // PGXCTODO - set cost properly + /* PGXCTODO - set cost properly */ cost_seqscan(pathnode, root, rel); + /* + * Insert a materialization plan above this temporarily + * until we better handle multiple steps using the same connection. + */ + pathnode = create_material_path(rel, pathnode); + return pathnode; } #endif ----------------------------------------------------------------------- Summary of changes: src/backend/optimizer/plan/createplan.c | 9 --------- src/backend/optimizer/util/pathnode.c | 8 +++++++- 2 files changed, 7 insertions(+), 10 deletions(-) hooks/post-receive -- Postgres-XC |
From: mason_s <ma...@us...> - 2010-10-04 20:52:39
Project "Postgres-XC". The branch, master has been updated via d044db4cc1b8cf18f14cfaa6c65d39ec14905dfb (commit) from e4978385ac1e81be3b95fe51656a0a166cfc22fb (commit) - Log ----------------------------------------------------------------- commit d044db4cc1b8cf18f14cfaa6c65d39ec14905dfb Author: Mason Sharp <ma...@us...> Date: Sat Oct 2 19:21:57 2010 +0900 Fix a bug with EXPLAIN and EXPLAIN VERBOSE. If it was a single-step statement, the output plan would incorrectly display a coordinator-based standard plan instead of the simple one. Bug and cause of problem discovered by Pavan Deolasee diff --git a/src/backend/pgxc/plan/planner.c b/src/backend/pgxc/plan/planner.c index a88179b..29e4ee0 100644 --- a/src/backend/pgxc/plan/planner.c +++ b/src/backend/pgxc/plan/planner.c @@ -2218,10 +2218,12 @@ pgxc_planner(Query *query, int cursorOptions, ParamListInfo boundParams) } /* - * If there already is an active portal, we may be doing planning within a function. - * Just use the standard plan + * If there already is an active portal, we may be doing planning + * within a function. Just use the standard plan, but check if + * it is part of an EXPLAIN statement so that we do not show that + * we plan multiple steps when it is a single-step operation. */ - if (ActivePortal) + if (ActivePortal && strcmp(ActivePortal->commandTag, "EXPLAIN")) return standard_planner(query, cursorOptions, boundParams); query_step->is_single_step = true; ----------------------------------------------------------------------- Summary of changes: src/backend/pgxc/plan/planner.c | 8 +++++--- 1 files changed, 5 insertions(+), 3 deletions(-) hooks/post-receive -- Postgres-XC |
From: mason_s <ma...@us...> - 2010-09-27 05:18:13
Project "Postgres-XC". The branch, master has been updated via e4978385ac1e81be3b95fe51656a0a166cfc22fb (commit) from c3e87d496dbf75651197f03b36d1cf0ba4ea7f0c (commit) - Log ----------------------------------------------------------------- commit e4978385ac1e81be3b95fe51656a0a166cfc22fb Author: Mason Sharp <ma...@us...> Date: Mon Sep 27 14:10:48 2010 +0900 Handle stored functions in queries. If the stored function is IMMUTABLE and appears in a query, it can be safely executed on the data nodes and is pushed down. Otherwise, the stored function must be executed on the coordinator. Note that stored functions cannot yet contain queries that use passed in parameters until we add support for prepared statements with parameters (planned to be done within the next few months). diff --git a/src/backend/pgxc/plan/planner.c b/src/backend/pgxc/plan/planner.c index a7bc0ab..a88179b 100644 --- a/src/backend/pgxc/plan/planner.c +++ b/src/backend/pgxc/plan/planner.c @@ -33,6 +33,7 @@ #include "parser/parse_coerce.h" #include "pgxc/locator.h" #include "pgxc/planner.h" +#include "tcop/pquery.h" #include "utils/acl.h" #include "utils/builtins.h" #include "utils/fmgroids.h" @@ -139,7 +140,6 @@ bool StrictStatementChecking = true; /* Forbid multi-node SELECT statements with an ORDER BY clause */ bool StrictSelectChecking = false; - static ExecNodes *get_plan_nodes(Query *query, bool isRead); static bool get_plan_nodes_walker(Node *query_node, XCWalkerContext *context); static bool examine_conditions_walker(Node *expr_node, XCWalkerContext *context); @@ -507,8 +507,9 @@ get_plan_nodes_insert(Query *query) * Get list of parent-child joins (partitioned together) * Get list of joins with replicated tables * - * If we encounter a cross-node join, we stop processing and return false, - * otherwise true. + * If we encounter an expression such as a cross-node join that cannot + * be easily handled in a single step, we stop processing and return true, + * otherwise false. * */ static bool @@ -780,6 +781,13 @@ examine_conditions_walker(Node *expr_node, XCWalkerContext *context) } } + /* See if the function is immutable, otherwise give up */ + if (IsA(expr_node, FuncExpr)) + { + if (!is_immutable_func(((FuncExpr*) expr_node)->funcid)) + return true; + } + /* Handle subquery */ if (IsA(expr_node, SubLink)) { @@ -2088,12 +2096,11 @@ pgxc_planner(Query *query, int cursorOptions, ParamListInfo boundParams) result->canSetTag = query->canSetTag; result->utilityStmt = query->utilityStmt; result->intoClause = query->intoClause; - result->rtable = query->rtable; query_step = makeNode(RemoteQuery); - query_step->is_single_step = false; + /* * Declare Cursor case: * We should leave as a step query only SELECT statement @@ -2210,6 +2217,13 @@ pgxc_planner(Query *query, int cursorOptions, ParamListInfo boundParams) return result; } + /* + * If there already is an active portal, we may be doing planning within a function. 
+ * Just use the standard plan + */ + if (ActivePortal) + return standard_planner(query, cursorOptions, boundParams); + query_step->is_single_step = true; /* * PGXCTODO diff --git a/src/backend/pgxc/pool/postgresql_fdw.c b/src/backend/pgxc/pool/postgresql_fdw.c index 9e418be..dabf5da 100644 --- a/src/backend/pgxc/pool/postgresql_fdw.c +++ b/src/backend/pgxc/pool/postgresql_fdw.c @@ -44,7 +44,7 @@ /* deparse SQL from the request */ -static bool is_immutable_func(Oid funcid); +bool is_immutable_func(Oid funcid); static bool is_foreign_qual(ExprState *state); static bool foreign_qual_walker(Node *node, void *context); char *deparseSql(RemoteQueryState *scanstate); @@ -53,7 +53,7 @@ char *deparseSql(RemoteQueryState *scanstate); /* * Check whether the function is IMMUTABLE. */ -static bool +bool is_immutable_func(Oid funcid) { HeapTuple tp; diff --git a/src/include/pgxc/planner.h b/src/include/pgxc/planner.h index 548e4cd..d2bac5a 100644 --- a/src/include/pgxc/planner.h +++ b/src/include/pgxc/planner.h @@ -148,4 +148,5 @@ extern PlannedStmt *pgxc_planner(Query *query, int cursorOptions, ParamListInfo boundParams); extern bool IsHashDistributable(Oid col_type); +extern bool is_immutable_func(Oid funcid); #endif /* PGXCPLANNER_H */ ----------------------------------------------------------------------- Summary of changes: src/backend/pgxc/plan/planner.c | 24 +++++++++++++++++++----- src/backend/pgxc/pool/postgresql_fdw.c | 4 ++-- src/include/pgxc/planner.h | 1 + 3 files changed, 22 insertions(+), 7 deletions(-) hooks/post-receive -- Postgres-XC |
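The diff only changes the linkage of is_immutable_func() from static to extern so the planner can call it; its body is unchanged and only its opening lines appear in the hunk. For context, a volatility check of this kind conventionally reads provolatile from the pg_proc syscache. The sketch below is a plausible reconstruction against the 8.4-era syscache API, consistent with the visible context lines, not the verbatim function:

#include "catalog/pg_proc.h"
#include "utils/syscache.h"

bool
is_immutable_func(Oid funcid)
{
	HeapTuple	tp;
	bool		isImmutable;

	tp = SearchSysCache(PROCOID, ObjectIdGetDatum(funcid), 0, 0, 0);
	if (!HeapTupleIsValid(tp))
		elog(ERROR, "cache lookup failed for function %u", funcid);

	/* provolatile is 'i' (immutable), 's' (stable) or 'v' (volatile) */
	isImmutable = ((Form_pg_proc) GETSTRUCT(tp))->provolatile ==
		PROVOLATILE_IMMUTABLE;
	ReleaseSysCache(tp);

	return isImmutable;
}

This is also why the planner walker gives up on anything non-immutable: a stable or volatile function evaluated independently on several Datanodes could yield a different value on each node, so it must run exactly once, on the Coordinator.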
From: mason_s <ma...@us...> - 2010-09-26 07:51:36
Project "Postgres-XC". The branch, master has been updated via c3e87d496dbf75651197f03b36d1cf0ba4ea7f0c (commit) from ac66a8c598dfc601e64df04dba73dc6d99f78272 (commit) - Log ----------------------------------------------------------------- commit c3e87d496dbf75651197f03b36d1cf0ba4ea7f0c Author: Mason Sharp <ma...@us...> Date: Sun Sep 26 16:47:36 2010 +0900 Initial support for cursors (DECLARE, FETCH). This initial version implements support by creating them on the Coordinator only; they are not created on the data nodes. Not yet supported is UPDATE / DELETE WHERE CURRENT OF, but basic read-only cursor capability works, including SCROLL cursors. Written by Andrei Martsinchyk diff --git a/src/backend/access/common/heaptuple.c b/src/backend/access/common/heaptuple.c index eab1bd0..63031a7 100644 --- a/src/backend/access/common/heaptuple.c +++ b/src/backend/access/common/heaptuple.c @@ -1194,8 +1194,17 @@ slot_deform_datarow(TupleTableSlot *slot) errmsg("Tuple does not match the descriptor"))); if (slot->tts_attinmeta == NULL) + { + /* + * Ensure info about input functions is available as long as slot lives + */ + MemoryContext oldcontext = MemoryContextSwitchTo(slot->tts_mcxt); + slot->tts_attinmeta = TupleDescGetAttInMetadata(slot->tts_tupleDescriptor); + MemoryContextSwitchTo(oldcontext); + } + buffer = makeStringInfo(); for (i = 0; i < attnum; i++) { diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c index 657413a..772a6f7 100644 --- a/src/backend/commands/copy.c +++ b/src/backend/commands/copy.c @@ -847,7 +847,7 @@ DoCopy(const CopyStmt *stmt, const char *queryString) int num_phys_attrs; uint64 processed; #ifdef PGXC - Exec_Nodes *exec_nodes = NULL; + ExecNodes *exec_nodes = NULL; #endif /* Allocate workspace and zero all fields */ @@ -1138,7 +1138,7 @@ DoCopy(const CopyStmt *stmt, const char *queryString) { char *hash_att; - exec_nodes = (Exec_Nodes *) palloc0(sizeof(Exec_Nodes)); + exec_nodes = makeNode(ExecNodes); /* * If target table does not exists on nodes (e.g. 
system table) diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c index 151fe33..c58e2a0 100644 --- a/src/backend/nodes/copyfuncs.c +++ b/src/backend/nodes/copyfuncs.c @@ -25,6 +25,10 @@ #include "nodes/plannodes.h" #include "nodes/relation.h" +#ifdef PGXC +#include "pgxc/locator.h" +#include "pgxc/planner.h" +#endif #include "utils/datum.h" @@ -809,6 +813,124 @@ _copyPlanInvalItem(PlanInvalItem *from) return newnode; } +#ifdef PGXC +/* + * _copyRemoteQuery + */ +static RemoteQuery * +_copyRemoteQuery(RemoteQuery *from) +{ + RemoteQuery *newnode = makeNode(RemoteQuery); + + /* + * copy node superclass fields + */ + CopyScanFields((Scan *) from, (Scan *) newnode); + + /* + * copy remainder of node + */ + COPY_SCALAR_FIELD(is_single_step); + COPY_STRING_FIELD(sql_statement); + COPY_NODE_FIELD(exec_nodes); + COPY_SCALAR_FIELD(combine_type); + COPY_NODE_FIELD(simple_aggregates); + COPY_NODE_FIELD(sort); + COPY_NODE_FIELD(distinct); + COPY_SCALAR_FIELD(read_only); + COPY_SCALAR_FIELD(force_autocommit); + + return newnode; +} + +/* + * _copyExecNodes + */ +static ExecNodes * +_copyExecNodes(ExecNodes *from) +{ + ExecNodes *newnode = makeNode(ExecNodes); + + COPY_NODE_FIELD(primarynodelist); + COPY_NODE_FIELD(nodelist); + COPY_SCALAR_FIELD(baselocatortype); + COPY_SCALAR_FIELD(tableusagetype); + + return newnode; +} + +/* + * _copySimpleAgg + */ +static SimpleAgg * +_copySimpleAgg(SimpleAgg *from) +{ + SimpleAgg *newnode = makeNode(SimpleAgg); + + COPY_SCALAR_FIELD(column_pos); + COPY_NODE_FIELD(aggref); + COPY_SCALAR_FIELD(transfn_oid); + COPY_SCALAR_FIELD(finalfn_oid); + COPY_SCALAR_FIELD(arginputfn); + COPY_SCALAR_FIELD(argioparam); + COPY_SCALAR_FIELD(resoutputfn); + COPY_SCALAR_FIELD(transfn); + COPY_SCALAR_FIELD(finalfn); + if (!from->initValueIsNull) + newnode->initValue = datumCopy(from->initValue, from->transtypeByVal, + from->transtypeLen); + COPY_SCALAR_FIELD(initValueIsNull); + COPY_SCALAR_FIELD(inputtypeLen); + COPY_SCALAR_FIELD(resulttypeLen); + COPY_SCALAR_FIELD(transtypeLen); + COPY_SCALAR_FIELD(inputtypeByVal); + COPY_SCALAR_FIELD(resulttypeByVal); + COPY_SCALAR_FIELD(transtypeByVal); + /* No need to copy runtime info, just init */ + newnode->collectValueNull = true; + initStringInfo(&newnode->valuebuf); + + return newnode; +} + +/* + * _copySimpleSort + */ +static SimpleSort * +_copySimpleSort(SimpleSort *from) +{ + SimpleSort *newnode = makeNode(SimpleSort); + + COPY_SCALAR_FIELD(numCols); + if (from->numCols > 0) + { + COPY_POINTER_FIELD(sortColIdx, from->numCols * sizeof(AttrNumber)); + COPY_POINTER_FIELD(sortOperators, from->numCols * sizeof(Oid)); + COPY_POINTER_FIELD(nullsFirst, from->numCols * sizeof(bool)); + } + + return newnode; +} + +/* + * _copySimpleDistinct + */ +static SimpleDistinct * +_copySimpleDistinct(SimpleDistinct *from) +{ + SimpleDistinct *newnode = makeNode(SimpleDistinct); + + COPY_SCALAR_FIELD(numCols); + if (from->numCols > 0) + { + COPY_POINTER_FIELD(uniqColIdx, from->numCols * sizeof(AttrNumber)); + COPY_POINTER_FIELD(eqOperators, from->numCols * sizeof(Oid)); + } + + return newnode; +} +#endif + /* **************************************************************** * primnodes.h copy functions * **************************************************************** @@ -3554,7 +3676,26 @@ copyObject(void *from) case T_PlanInvalItem: retval = _copyPlanInvalItem(from); break; - +#ifdef PGXC + /* + * PGXC SPECIFIC NODES + */ + case T_RemoteQuery: + retval = _copyRemoteQuery(from); + break; + case T_ExecNodes: + retval = 
_copyExecNodes(from); + break; + case T_SimpleAgg: + retval = _copySimpleAgg(from); + break; + case T_SimpleSort: + retval = _copySimpleSort(from); + break; + case T_SimpleDistinct: + retval = _copySimpleDistinct(from); + break; +#endif /* * PRIMITIVE NODES */ diff --git a/src/backend/pgxc/locator/locator.c b/src/backend/pgxc/locator/locator.c index 63c6359..debbc77 100644 --- a/src/backend/pgxc/locator/locator.c +++ b/src/backend/pgxc/locator/locator.c @@ -279,18 +279,18 @@ GetRoundRobinNode(Oid relid) * * The returned List is a copy, so it should be freed when finished. */ -Exec_Nodes * +ExecNodes * GetRelationNodes(RelationLocInfo *rel_loc_info, long *partValue, int isRead) { ListCell *prefItem; ListCell *stepItem; - Exec_Nodes *exec_nodes; + ExecNodes *exec_nodes; if (rel_loc_info == NULL) return NULL; - exec_nodes = (Exec_Nodes *) palloc0(sizeof(Exec_Nodes)); + exec_nodes = makeNode(ExecNodes); exec_nodes->baselocatortype = rel_loc_info->locatorType; switch (rel_loc_info->locatorType) diff --git a/src/backend/pgxc/plan/planner.c b/src/backend/pgxc/plan/planner.c index 7fedbfb..a7bc0ab 100644 --- a/src/backend/pgxc/plan/planner.c +++ b/src/backend/pgxc/plan/planner.c @@ -20,6 +20,7 @@ #include "catalog/pg_namespace.h" #include "catalog/pg_proc.h" #include "catalog/pg_type.h" +#include "executor/executor.h" #include "lib/stringinfo.h" #include "nodes/nodeFuncs.h" #include "nodes/nodes.h" @@ -120,7 +121,7 @@ typedef struct XCWalkerContext { Query *query; bool isRead; - Exec_Nodes *exec_nodes; /* resulting execution nodes */ + ExecNodes *exec_nodes; /* resulting execution nodes */ Special_Conditions *conditions; bool multilevel_join; List *rtables; /* a pointer to a list of rtables */ @@ -139,7 +140,7 @@ bool StrictStatementChecking = true; bool StrictSelectChecking = false; -static Exec_Nodes *get_plan_nodes(Query *query, bool isRead); +static ExecNodes *get_plan_nodes(Query *query, bool isRead); static bool get_plan_nodes_walker(Node *query_node, XCWalkerContext *context); static bool examine_conditions_walker(Node *expr_node, XCWalkerContext *context); static int handle_limit_offset(RemoteQuery *query_step, Query *query, PlannedStmt *plan_stmt); @@ -402,13 +403,13 @@ get_base_var(Var *var, XCWalkerContext *context) /* * get_plan_nodes_insert - determine nodes on which to execute insert. 
*/ -static Exec_Nodes * +static ExecNodes * get_plan_nodes_insert(Query *query) { RangeTblEntry *rte; RelationLocInfo *rel_loc_info; Const *constant; - Exec_Nodes *exec_nodes; + ExecNodes *exec_nodes; ListCell *lc; long part_value; long *part_value_ptr = NULL; @@ -786,7 +787,7 @@ examine_conditions_walker(Node *expr_node, XCWalkerContext *context) bool is_multilevel; int save_parent_child_count = 0; SubLink *sublink = (SubLink *) expr_node; - Exec_Nodes *save_exec_nodes = context->exec_nodes; /* Save old exec_nodes */ + ExecNodes *save_exec_nodes = context->exec_nodes; /* Save old exec_nodes */ /* save parent-child count */ if (context->exec_nodes) @@ -940,9 +941,9 @@ get_plan_nodes_walker(Node *query_node, XCWalkerContext *context) ListCell *lc, *item; RelationLocInfo *rel_loc_info; - Exec_Nodes *test_exec_nodes = NULL; - Exec_Nodes *current_nodes = NULL; - Exec_Nodes *from_query_nodes = NULL; + ExecNodes *test_exec_nodes = NULL; + ExecNodes *current_nodes = NULL; + ExecNodes *from_query_nodes = NULL; TableUsageType table_usage_type = TABLE_USAGE_TYPE_NO_TABLE; TableUsageType current_usage_type = TABLE_USAGE_TYPE_NO_TABLE; int from_subquery_count = 0; @@ -972,7 +973,7 @@ get_plan_nodes_walker(Node *query_node, XCWalkerContext *context) if (contains_only_pg_catalog (query->rtable)) { /* just pg_catalog tables */ - context->exec_nodes = (Exec_Nodes *) palloc0(sizeof(Exec_Nodes)); + context->exec_nodes = makeNode(ExecNodes); context->exec_nodes->tableusagetype = TABLE_USAGE_TYPE_PGCATALOG; context->exec_on_coord = true; return false; @@ -991,7 +992,7 @@ get_plan_nodes_walker(Node *query_node, XCWalkerContext *context) if (rte->rtekind == RTE_SUBQUERY) { - Exec_Nodes *save_exec_nodes = context->exec_nodes; + ExecNodes *save_exec_nodes = context->exec_nodes; Special_Conditions *save_conditions = context->conditions; /* Save old conditions */ List *current_rtable = rte->subquery->rtable; @@ -1089,7 +1090,7 @@ get_plan_nodes_walker(Node *query_node, XCWalkerContext *context) /* If we are just dealing with pg_catalog, just return */ if (table_usage_type == TABLE_USAGE_TYPE_PGCATALOG) { - context->exec_nodes = (Exec_Nodes *) palloc0(sizeof(Exec_Nodes)); + context->exec_nodes = makeNode(ExecNodes); context->exec_nodes->tableusagetype = TABLE_USAGE_TYPE_PGCATALOG; context->exec_on_coord = true; return false; @@ -1255,10 +1256,10 @@ get_plan_nodes_walker(Node *query_node, XCWalkerContext *context) * Top level entry point before walking query to determine plan nodes * */ -static Exec_Nodes * +static ExecNodes * get_plan_nodes(Query *query, bool isRead) { - Exec_Nodes *result_nodes = NULL; + ExecNodes *result_nodes = NULL; XCWalkerContext context; @@ -1293,10 +1294,10 @@ get_plan_nodes(Query *query, bool isRead) * * return NULL if it is not safe to be done in a single step. 
*/ -static Exec_Nodes * +static ExecNodes * get_plan_nodes_command(Query *query) { - Exec_Nodes *exec_nodes = NULL; + ExecNodes *exec_nodes = NULL; switch (query->commandType) { @@ -1384,7 +1385,7 @@ get_simple_aggregates(Query * query) *finalfnexpr; Datum textInitVal; - simple_agg = (SimpleAgg *) palloc0(sizeof(SimpleAgg)); + simple_agg = makeNode(SimpleAgg); simple_agg->column_pos = column_pos; initStringInfo(&simple_agg->valuebuf); simple_agg->aggref = aggref; @@ -1759,7 +1760,7 @@ make_simple_sort_from_sortclauses(Query *query, RemoteQuery *step) nullsFirst = (bool *) palloc(numsortkeys * sizeof(bool)); numsortkeys = 0; - sort = (SimpleSort *) palloc(sizeof(SimpleSort)); + sort = makeNode(SimpleSort); if (sortcls) { @@ -1908,7 +1909,7 @@ make_simple_sort_from_sortclauses(Query *query, RemoteQuery *step) * extra_distincts list */ - distinct = (SimpleDistinct *) palloc(sizeof(SimpleDistinct)); + distinct = makeNode(SimpleDistinct); /* * We will need at most list_length(distinctcls) sort columns @@ -2093,12 +2094,50 @@ pgxc_planner(Query *query, int cursorOptions, ParamListInfo boundParams) query_step = makeNode(RemoteQuery); query_step->is_single_step = false; - query_step->sql_statement = pstrdup(query->sql_statement); + /* + * Declare Cursor case: + * we should keep only the SELECT statement as the step query. + * Further, if we need to refer to the source statement for planning, + * we should take the truncated string + */ + if (query->utilityStmt && + IsA(query->utilityStmt, DeclareCursorStmt)) + { + + char *src = query->sql_statement; + char str[strlen(src) + 1]; /* mutable copy */ + char *dst = str; + + cursorOptions |= ((DeclareCursorStmt *) query->utilityStmt)->options; + + /* + * Initialize mutable copy, converting letters to uppercase and + * various whitespace characters to spaces + */ + while (*src) + { + if (isspace(*src)) + { + src++; + *dst++ = ' '; + } + else + *dst++ = toupper(*src++); + } + *dst = '\0'; + /* search for SELECT keyword in the normalized string */ + dst = strstr(str, " SELECT "); + /* Take substring of the original string using found offset */ + query_step->sql_statement = pstrdup(query->sql_statement + (dst - str + 1)); + } + else + query_step->sql_statement = pstrdup(query->sql_statement); + query_step->exec_nodes = NULL; query_step->combine_type = COMBINE_TYPE_NONE; query_step->simple_aggregates = NULL; /* Optimize multi-node handling */ - query_step->read_only = query->nodeTag == T_SelectStmt; + query_step->read_only = query->commandType == CMD_SELECT; query_step->force_autocommit = false; result->planTree = (Plan *) query_step; @@ -2108,20 +2147,20 @@ pgxc_planner(Query *query, int cursorOptions, ParamListInfo boundParams) * level, Data Nodes, or both. By default we choose both. We should be * able to quickly expand this for more commands.
*/ - switch (query->nodeTag) + switch (query->commandType) { - case T_SelectStmt: + case CMD_SELECT: /* Perform some checks to make sure we can support the statement */ if (query->intoClause) ereport(ERROR, (errcode(ERRCODE_STATEMENT_TOO_COMPLEX), (errmsg("INTO clause not yet supported")))); /* fallthru */ - case T_InsertStmt: - case T_UpdateStmt: - case T_DeleteStmt: + case CMD_INSERT: + case CMD_UPDATE: + case CMD_DELETE: /* Set result relations */ - if (query->nodeTag != T_SelectStmt) + if (query->commandType != CMD_SELECT) result->resultRelations = list_make1_int(query->resultRelation); query_step->exec_nodes = get_plan_nodes_command(query); @@ -2129,7 +2168,7 @@ pgxc_planner(Query *query, int cursorOptions, ParamListInfo boundParams) if (query_step->exec_nodes == NULL) { /* Do not yet allow multi-node correlated UPDATE or DELETE */ - if ((query->nodeTag == T_UpdateStmt || query->nodeTag == T_DeleteStmt)) + if (query->commandType == CMD_UPDATE || query->commandType == CMD_DELETE) { ereport(ERROR, (errcode(ERRCODE_STATEMENT_TOO_COMPLEX), @@ -2144,15 +2183,16 @@ pgxc_planner(Query *query, int cursorOptions, ParamListInfo boundParams) return result; } - if ((query->nodeTag == T_UpdateStmt || query->nodeTag == T_DeleteStmt) + /* Do not yet allow multi-node correlated UPDATE or DELETE */ + if ((query->commandType == CMD_UPDATE || query->commandType == CMD_DELETE) && !query_step->exec_nodes && list_length(query->rtable) > 1) { - result = standard_planner(query, cursorOptions, boundParams); - return result; + result = standard_planner(query, cursorOptions, boundParams); + return result; } - /* + /* * Use standard plan if we have more than one data node with either * group by, hasWindowFuncs, or hasRecursive */ @@ -2161,13 +2201,13 @@ pgxc_planner(Query *query, int cursorOptions, ParamListInfo boundParams) * group by expression is the partitioning column, in which * case it is ok to treat as a single step. */ - if (query->nodeTag == T_SelectStmt + if (query->commandType == CMD_SELECT && query_step->exec_nodes && list_length(query_step->exec_nodes->nodelist) > 1 && (query->groupClause || query->hasWindowFuncs || query->hasRecursive)) { - result = standard_planner(query, cursorOptions, boundParams); - return result; + result = standard_planner(query, cursorOptions, boundParams); + return result; } query_step->is_single_step = true; @@ -2191,9 +2231,9 @@ pgxc_planner(Query *query, int cursorOptions, ParamListInfo boundParams) query, query_step->exec_nodes->baselocatortype); /* Set up simple aggregates */ - /* PGXCTODO - we should detect what types of aggregates are used. + /* PGXCTODO - we should detect what types of aggregates are used. * in some cases we can avoid the final step and merely proxy results - * (when there is only one data node involved) instead of using + * (when there is only one data node involved) instead of using * coordinator consolidation. At the moment this is needed for AVG() */ query_step->simple_aggregates = get_simple_aggregates(query); @@ -2224,6 +2264,16 @@ pgxc_planner(Query *query, int cursorOptions, ParamListInfo boundParams) result->planTree = standardPlan; } + /* + * If creating a plan for a scrollable cursor, make sure it can run + * backwards on demand. Add a Material node at the top at need. 
+ */ + if (cursorOptions & CURSOR_OPT_SCROLL) + { + if (!ExecSupportsBackwardScan(result->planTree)) + result->planTree = materialize_finished_plan(result->planTree); + } + return result; } diff --git a/src/backend/pgxc/pool/execRemote.c b/src/backend/pgxc/pool/execRemote.c index e7ac601..16d2f6b 100644 --- a/src/backend/pgxc/pool/execRemote.c +++ b/src/backend/pgxc/pool/execRemote.c @@ -1990,7 +1990,7 @@ DataNodeCopyBegin(const char *query, List *nodelist, Snapshot snapshot, bool is_ * Send a data row to the specified nodes */ int -DataNodeCopyIn(char *data_row, int len, Exec_Nodes *exec_nodes, DataNodeHandle** copy_connections) +DataNodeCopyIn(char *data_row, int len, ExecNodes *exec_nodes, DataNodeHandle** copy_connections) { DataNodeHandle *primary_handle = NULL; ListCell *nodeitem; @@ -2143,7 +2143,7 @@ DataNodeCopyIn(char *data_row, int len, Exec_Nodes *exec_nodes, DataNodeHandle** } uint64 -DataNodeCopyOut(Exec_Nodes *exec_nodes, DataNodeHandle** copy_connections, FILE* copy_file) +DataNodeCopyOut(ExecNodes *exec_nodes, DataNodeHandle** copy_connections, FILE* copy_file) { RemoteQueryState *combiner; int conn_count = list_length(exec_nodes->nodelist) == 0 ? NumDataNodes : list_length(exec_nodes->nodelist); @@ -2436,7 +2436,7 @@ copy_slot(RemoteQueryState *node, TupleTableSlot *src, TupleTableSlot *dst) } static void -get_exec_connections(Exec_Nodes *exec_nodes, +get_exec_connections(ExecNodes *exec_nodes, int *regular_conn_count, int *total_conn_count, DataNodeHandle ***connections, diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c index 84c70c6..528e4e1 100644 --- a/src/backend/tcop/postgres.c +++ b/src/backend/tcop/postgres.c @@ -658,7 +658,6 @@ pg_analyze_and_rewrite(Node *parsetree, const char *query_string, { Query *query = (Query *) lfirst(lc); query->sql_statement = pstrdup(query_string); - query->nodeTag = nodeTag(parsetree); } } #endif @@ -1318,7 +1317,6 @@ exec_parse_message(const char *query_string, /* string to execute */ { Query *query = (Query *) lfirst(lc); query->sql_statement = pstrdup(query_string); - query->nodeTag = nodeTag(raw_parse_tree); } } #endif diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c index 91456c4..db11abe 100644 --- a/src/backend/tcop/utility.c +++ b/src/backend/tcop/utility.c @@ -62,7 +62,7 @@ #include "pgxc/pgxc.h" #include "pgxc/planner.h" -static void ExecUtilityStmtOnNodes(const char *queryString, Exec_Nodes *nodes, +static void ExecUtilityStmtOnNodes(const char *queryString, ExecNodes *nodes, bool force_autocommit); #endif @@ -1367,7 +1367,7 @@ ProcessUtility(Node *parsetree, #ifdef PGXC static void -ExecUtilityStmtOnNodes(const char *queryString, Exec_Nodes *nodes, +ExecUtilityStmtOnNodes(const char *queryString, ExecNodes *nodes, bool force_autocommit) { RemoteQuery *step = makeNode(RemoteQuery); diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h index a466239..8bb49c6 100644 --- a/src/include/nodes/nodes.h +++ b/src/include/nodes/nodes.h @@ -73,6 +73,13 @@ typedef enum NodeTag T_SetOp, T_Limit, #ifdef PGXC + /* + * TAGS FOR PGXC NODES (planner.h, locator.h) + */ + T_ExecNodes, + T_SimpleAgg, + T_SimpleSort, + T_SimpleDistinct, T_RemoteQuery, #endif /* this one isn't a subclass of Plan: */ diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h index a367a6b..5fb2a2b 100644 --- a/src/include/nodes/parsenodes.h +++ b/src/include/nodes/parsenodes.h @@ -149,7 +149,6 @@ typedef struct Query #ifdef PGXC /* need this info for PGXC Planner, may be temporary */ char 
*sql_statement; /* original query */ - NodeTag nodeTag; /* node tag of top node of parse tree */ #endif } Query; diff --git a/src/include/pgxc/execRemote.h b/src/include/pgxc/execRemote.h index e2aebef..5ba8fff 100644 --- a/src/include/pgxc/execRemote.h +++ b/src/include/pgxc/execRemote.h @@ -85,8 +85,8 @@ extern void DataNodeRollbackPrepared(char *gid); extern void DataNodeCommitPrepared(char *gid); extern DataNodeHandle** DataNodeCopyBegin(const char *query, List *nodelist, Snapshot snapshot, bool is_from); -extern int DataNodeCopyIn(char *data_row, int len, Exec_Nodes *exec_nodes, DataNodeHandle** copy_connections); -extern uint64 DataNodeCopyOut(Exec_Nodes *exec_nodes, DataNodeHandle** copy_connections, FILE* copy_file); +extern int DataNodeCopyIn(char *data_row, int len, ExecNodes *exec_nodes, DataNodeHandle** copy_connections); +extern uint64 DataNodeCopyOut(ExecNodes *exec_nodes, DataNodeHandle** copy_connections, FILE* copy_file); extern void DataNodeCopyFinish(DataNodeHandle** copy_connections, int primary_data_node, CombineType combine_type); extern int ExecCountSlotsRemoteQuery(RemoteQuery *node); diff --git a/src/include/pgxc/locator.h b/src/include/pgxc/locator.h index 7ae0474..233bf26 100644 --- a/src/include/pgxc/locator.h +++ b/src/include/pgxc/locator.h @@ -61,11 +61,12 @@ typedef enum */ typedef struct { + NodeTag type; List *primarynodelist; List *nodelist; char baselocatortype; TableUsageType tableusagetype; /* track pg_catalog usage */ -} Exec_Nodes; +} ExecNodes; extern char *PreferredDataNodes; @@ -77,7 +78,7 @@ extern char ConvertToLocatorType(int disttype); extern char *GetRelationHashColumn(RelationLocInfo *rel_loc_info); extern RelationLocInfo *GetRelationLocInfo(Oid relid); extern RelationLocInfo *CopyRelationLocInfo(RelationLocInfo *src_info); -extern Exec_Nodes *GetRelationNodes(RelationLocInfo *rel_loc_info, long *partValue, +extern ExecNodes *GetRelationNodes(RelationLocInfo *rel_loc_info, long *partValue, int isRead); extern bool IsHashColumn(RelationLocInfo *rel_loc_info, char *part_col_name); extern bool IsHashColumnForRelId(Oid relid, char *part_col_name); diff --git a/src/include/pgxc/planner.h b/src/include/pgxc/planner.h index 346dd65..548e4cd 100644 --- a/src/include/pgxc/planner.h +++ b/src/include/pgxc/planner.h @@ -38,6 +38,7 @@ typedef enum */ typedef struct { + NodeTag type; int numCols; /* number of sort-key columns */ AttrNumber *sortColIdx; /* their indexes in the target list */ Oid *sortOperators; /* OIDs of operators to sort them by */ @@ -47,6 +48,7 @@ typedef struct /* For returning distinct results from the RemoteQuery*/ typedef struct { + NodeTag type; int numCols; /* number of sort-key columns */ AttrNumber *uniqColIdx; /* their indexes in the target list */ Oid *eqOperators; /* OIDs of operators to equate them by */ @@ -61,7 +63,7 @@ typedef struct Scan scan; bool is_single_step; /* special case, skip extra work */ char *sql_statement; - Exec_Nodes *exec_nodes; + ExecNodes *exec_nodes; CombineType combine_type; List *simple_aggregates; /* simple aggregate to combine on this step */ SimpleSort *sort; @@ -87,6 +89,7 @@ typedef enum /* For handling simple aggregates */ typedef struct { + NodeTag type; int column_pos; /* Only use 1 for now */ Aggref *aggref; Oid transfn_oid; ----------------------------------------------------------------------- Summary of changes: src/backend/access/common/heaptuple.c | 9 ++ src/backend/commands/copy.c | 4 +- src/backend/nodes/copyfuncs.c | 143 ++++++++++++++++++++++++++++++++- 
src/backend/pgxc/locator/locator.c | 6 +- src/backend/pgxc/plan/planner.c | 122 ++++++++++++++++++++-------- src/backend/pgxc/pool/execRemote.c | 6 +- src/backend/tcop/postgres.c | 2 - src/backend/tcop/utility.c | 4 +- src/include/nodes/nodes.h | 7 ++ src/include/nodes/parsenodes.h | 1 - src/include/pgxc/execRemote.h | 4 +- src/include/pgxc/locator.h | 5 +- src/include/pgxc/planner.h | 5 +- 13 files changed, 263 insertions(+), 55 deletions(-) hooks/post-receive -- Postgres-XC |
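The DECLARE CURSOR handling added to pgxc_planner in the commit above boils down to a small string transformation: normalize a copy of the statement (uppercase the letters, turn every whitespace character into a plain space), locate the embedded " SELECT " keyword, and keep the matching suffix of the original text as the step query. Below is a minimal standalone C sketch of that idea, not the committed code itself: extract_select is an illustrative name, libc malloc/strdup stand in for the backend's palloc/pstrdup, and a NULL check is added for the case where no SELECT is found (the committed code can assume the keyword is present because the statement was already parsed).

#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Sketch of the extraction step: uppercase a copy of the statement,
 * map each whitespace character to a single space, find " SELECT ",
 * and return the corresponding suffix of the original string.
 * Returns NULL when no SELECT keyword is found.
 */
static char *
extract_select(const char *stmt)
{
	size_t	len = strlen(stmt);
	char   *norm = (char *) malloc(len + 1);
	char   *pos;
	size_t	i;

	if (norm == NULL)
		return NULL;
	for (i = 0; i < len; i++)
		norm[i] = isspace((unsigned char) stmt[i]) ? ' ' : toupper((unsigned char) stmt[i]);
	norm[len] = '\0';

	pos = strstr(norm, " SELECT ");
	if (pos == NULL)
	{
		free(norm);
		return NULL;			/* not a SELECT-based cursor */
	}
	i = (size_t) (pos - norm) + 1;	/* skip the space before SELECT */
	free(norm);
	return strdup(stmt + i);
}

int
main(void)
{
	char *sql = extract_select("DECLARE c SCROLL CURSOR FOR select * from t");

	printf("%s\n", sql ? sql : "(no SELECT found)");	/* prints: select * from t */
	free(sql);
	return 0;
}

Searching the normalized copy but taking the substring from the original string is the key design choice: the statement forwarded to the data nodes keeps the user's original case and spacing.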
From: mason_s <ma...@us...> - 2010-09-22 00:45:00
|
Project "Postgres-XC". The branch, master has been updated via ac66a8c598dfc601e64df04dba73dc6d99f78272 (commit) from ba79eded1dfbfabc51b3de4931b638853d13a30d (commit) - Log ----------------------------------------------------------------- commit ac66a8c598dfc601e64df04dba73dc6d99f78272 Author: Mason Sharp <ma...@us...> Date: Tue Sep 21 20:41:41 2010 -0400 Address performance issues that were introduced in the last couple of months. We avoid sending down BEGIN to the data nodes for SELECT if it is not needed. We avoid going through the standard PostgreSQL planner on the coordinator if unnecessary, for simple single-step statements. Remove extra limit node that appeared in the plan, though it did no limiting. diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c index 847b556..519ea4f 100644 --- a/src/backend/executor/execMain.c +++ b/src/backend/executor/execMain.c @@ -1009,6 +1009,9 @@ InitPlan(QueryDesc *queryDesc, int eflags) } else { +#ifdef PGXC + if (!IS_PGXC_COORDINATOR) +#endif if (operation == CMD_INSERT) ExecCheckPlanOutput(estate->es_result_relation_info->ri_RelationDesc, planstate->plan->targetlist); diff --git a/src/backend/pgxc/plan/planner.c b/src/backend/pgxc/plan/planner.c index e18e813..7fedbfb 100644 --- a/src/backend/pgxc/plan/planner.c +++ b/src/backend/pgxc/plan/planner.c @@ -1979,6 +1979,9 @@ handle_limit_offset(RemoteQuery *query_step, Query *query, PlannedStmt *plan_stm { /* check if no special handling needed */ + if (!query->limitCount && !query->limitOffset) + return 0; + if (query_step && query_step->exec_nodes && list_length(query_step->exec_nodes->nodelist) <= 1) return 0; @@ -2071,16 +2074,23 @@ handle_limit_offset(RemoteQuery *query_step, Query *query, PlannedStmt *plan_stm PlannedStmt * pgxc_planner(Query *query, int cursorOptions, ParamListInfo boundParams) { - /* - * We waste some time invoking standard planner, but getting good enough - * PlannedStmt, we just need to replace standard plan. - * In future we may want to skip the standard_planner invocation and - * initialize the PlannedStmt here. At the moment not all queries works: - * ex. there was a problem with INSERT into a subset of table columns - */ - PlannedStmt *result = standard_planner(query, cursorOptions, boundParams); - Plan *standardPlan = result->planTree; - RemoteQuery *query_step = makeNode(RemoteQuery); + PlannedStmt *result; + Plan *standardPlan; + RemoteQuery *query_step; + + + /* build the PlannedStmt result */ + result = makeNode(PlannedStmt); + + /* Try and set what we can */ + result->commandType = query->commandType; + result->canSetTag = query->canSetTag; + result->utilityStmt = query->utilityStmt; + result->intoClause = query->intoClause; + + result->rtable = query->rtable; + + query_step = makeNode(RemoteQuery); query_step->is_single_step = false; query_step->sql_statement = pstrdup(query->sql_statement); @@ -2110,6 +2120,10 @@ pgxc_planner(Query *query, int cursorOptions, ParamListInfo boundParams) case T_InsertStmt: case T_UpdateStmt: case T_DeleteStmt: + /* Set result relations */ + if (query->nodeTag != T_SelectStmt) + result->resultRelations = list_make1_int(query->resultRelation); + query_step->exec_nodes = get_plan_nodes_command(query); if (query_step->exec_nodes == NULL) @@ -2124,18 +2138,35 @@ pgxc_planner(Query *query, int cursorOptions, ParamListInfo boundParams) /* * Processing guery against catalog tables, or multi-step command. 
- * Restore standard plan + * Run through standard planner */ - result->planTree = standardPlan; + result = standard_planner(query, cursorOptions, boundParams); return result; } - /* Do not yet allow multi-node correlated UPDATE or DELETE */ if ((query->nodeTag == T_UpdateStmt || query->nodeTag == T_DeleteStmt) && !query_step->exec_nodes && list_length(query->rtable) > 1) { - result->planTree = standardPlan; + result = standard_planner(query, cursorOptions, boundParams); + return result; + } + + /* + * Use standard plan if we have more than one data node with either + * group by, hasWindowFuncs, or hasRecursive + */ + /* + * PGXCTODO - this could be improved to check if the first + * group by expression is the partitioning column, in which + * case it is ok to treat as a single step. + */ + if (query->nodeTag == T_SelectStmt + && query_step->exec_nodes + && list_length(query_step->exec_nodes->nodelist) > 1 + && (query->groupClause || query->hasWindowFuncs || query->hasRecursive)) + { + result = standard_planner(query, cursorOptions, boundParams); return result; } @@ -2153,7 +2184,7 @@ pgxc_planner(Query *query, int cursorOptions, ParamListInfo boundParams) * then call standard planner and take targetList from the plan * generated by Postgres. */ - query_step->scan.plan.targetlist = standardPlan->targetlist; + query_step->scan.plan.targetlist = query->targetList; if (query_step->exec_nodes) query_step->combine_type = get_plan_combine_type( @@ -2174,32 +2205,15 @@ pgxc_planner(Query *query, int cursorOptions, ParamListInfo boundParams) (query->sortClause || query->distinctClause)) make_simple_sort_from_sortclauses(query, query_step); - /* Handle LIMIT and OFFSET for single-step queries on multiple nodes*/ + /* Handle LIMIT and OFFSET for single-step queries on multiple nodes */ if (handle_limit_offset(query_step, query, result)) { /* complicated expressions, just fallback to standard plan */ - result->planTree = standardPlan; + result = standard_planner(query, cursorOptions, boundParams); return result; } - - /* - * Use standard plan if we have more than one data node with either - * group by, hasWindowFuncs, or hasRecursive - */ - /* - * PGXCTODO - this could be improved to check if the first - * group by expression is the partitioning column, in which - * case it is ok to treat as a single step. - */ - if (query->nodeTag == T_SelectStmt - && query_step->exec_nodes - && list_length(query_step->exec_nodes->nodelist) > 1 - && (query->groupClause || query->hasWindowFuncs || query->hasRecursive)) - { - result->planTree = standardPlan; - return result; - } break; + default: /* Allow for override */ if (StrictStatementChecking) diff --git a/src/backend/pgxc/pool/execRemote.c b/src/backend/pgxc/pool/execRemote.c index e7ef66e..e7ac601 100644 --- a/src/backend/pgxc/pool/execRemote.c +++ b/src/backend/pgxc/pool/execRemote.c @@ -998,8 +998,8 @@ handle_response(DataNodeHandle * conn, RemoteQueryState *combiner) if (conn->state == DN_CONNECTION_STATE_QUERY) return RESPONSE_EOF; - /* - * If we are in the process of shutting down, we + /* + * If we are in the process of shutting down, we * may be rolling back, and the buffer may contain other messages. * We want to avoid a procarray exception * as well as an error stack overflow. 
@@ -1745,7 +1745,7 @@ finish: /* * Rollback current transaction - * This will happen + * This will happen */ int DataNodeRollback(void) @@ -2577,7 +2577,7 @@ ExecRemoteQuery(RemoteQueryState *node) if (force_autocommit) need_tran = false; else - need_tran = !autocommit || total_conn_count > 1; + need_tran = !autocommit || !is_read_only && total_conn_count > 1; elog(DEBUG1, "autocommit = %s, has primary = %s, regular_conn_count = %d, need_tran = %s", autocommit ? "true" : "false", primaryconnection ? "true" : "false", regular_conn_count, need_tran ? "true" : "false"); @@ -3143,7 +3143,7 @@ DataNodeConsumeMessages(void) pfree(connections); } - + /* ---------------------------------------------------------------- * ExecRemoteQueryReScan * ----------------------------------------------------------------------- Summary of changes: src/backend/executor/execMain.c | 3 + src/backend/pgxc/plan/planner.c | 84 +++++++++++++++++++++--------------- src/backend/pgxc/pool/execRemote.c | 10 ++-- 3 files changed, 57 insertions(+), 40 deletions(-) hooks/post-receive -- Postgres-XC |
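The need_tran change in execRemote.c above is easy to misread because of operator precedence: && binds tighter than ||, so an implicit BEGIN is required only when the session is inside an explicit transaction block, or when a non-read-only statement touches more than one connection. A small restatement of that decision follows; needs_begin and its parameter names are illustrative, not identifiers from the source tree.

#include <stdbool.h>
#include <stdio.h>

/*
 * Restatement of: need_tran = !autocommit || !is_read_only && total_conn_count > 1;
 * && binds tighter than ||, hence the explicit parentheses below.
 */
static bool
needs_begin(bool autocommit, bool is_read_only, int total_conn_count)
{
	return !autocommit || (!is_read_only && total_conn_count > 1);
}

int
main(void)
{
	/* autocommit SELECT over several nodes: no BEGIN is sent */
	printf("%d\n", needs_begin(true, true, 3));		/* 0 */
	/* autocommit write over several nodes: BEGIN is still needed */
	printf("%d\n", needs_begin(true, false, 3));	/* 1 */
	/* inside an explicit transaction block: always BEGIN */
	printf("%d\n", needs_begin(false, true, 1));	/* 1 */
	return 0;
}

This is what lets a simple single-step SELECT skip the BEGIN round-trip to the data nodes, which is the performance win the commit message describes.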
From: Michael P. <mic...@us...> - 2010-09-14 23:36:21
|
Project "Postgres-XC". The branch, master has been updated via ba79eded1dfbfabc51b3de4931b638853d13a30d (commit) from 19a8fa536779653524a1feb862c18277efa317f4 (commit) - Log ----------------------------------------------------------------- commit ba79eded1dfbfabc51b3de4931b638853d13a30d Author: Michael P <mic...@us...> Date: Wed Sep 15 08:26:30 2010 +0900 Implementation of 2PC from applications Support for PREPARE TRANSACTION 'tid', ROLLBACK PREPARED 'tid' and COMMIT PREPARED 'tid'. When a Transaction is prepared on a Coordinator, the list of involved Datanodes is saved in GTM and transaction is put in PREPARE state. The transaction ID 'tid' is also saved on GTM. COMMIT PREPARED or ROLLBACK PREPARED can be issued from a different Coordinator by using the same tid. The Coordinator receiving the Commit SQL gets a list of Datanodes from GTM, and commits the transaction on the right nodes. This patch adds a new interface on GTM to save also the list of Coordinators involved in a PREPARE transaction. Coordinator<->Coordinator connection protocol is not implemented yet, so for the moment Coordinator do not create a 2PC file at PREPARE. This feature will be added with the implementation of DDL synchronization among Coordinators. diff --git a/src/backend/access/transam/gtm.c b/src/backend/access/transam/gtm.c index c7f3547..08ed2c9 100644 --- a/src/backend/access/transam/gtm.c +++ b/src/backend/access/transam/gtm.c @@ -122,7 +122,8 @@ CommitTranGTM(GlobalTransactionId gxid) CheckConnection(); ret = commit_transaction(conn, gxid); - /* If something went wrong (timeout), try and reset GTM connection. + /* + * If something went wrong (timeout), try and reset GTM connection. * We will close the transaction locally anyway, and closing GTM will force * it to be closed on GTM. */ @@ -134,6 +135,34 @@ CommitTranGTM(GlobalTransactionId gxid) return ret; } +/* + * For a prepared transaction, commit the gxid used for PREPARE TRANSACTION + * and for COMMIT PREPARED. + */ +int +CommitPreparedTranGTM(GlobalTransactionId gxid, GlobalTransactionId prepared_gxid) +{ + int ret = 0; + + if (!GlobalTransactionIdIsValid(gxid) || !GlobalTransactionIdIsValid(prepared_gxid)) + return ret; + CheckConnection(); + ret = commit_prepared_transaction(conn, gxid, prepared_gxid); + + /* + * If something went wrong (timeout), try and reset GTM connection. + * We will close the transaction locally anyway, and closing GTM will force + * it to be closed on GTM. + */ + + if (ret < 0) + { + CloseGTM(); + InitGTM(); + } + return ret; +} + int RollbackTranGTM(GlobalTransactionId gxid) { @@ -144,7 +173,37 @@ RollbackTranGTM(GlobalTransactionId gxid) CheckConnection(); ret = abort_transaction(conn, gxid); - /* If something went wrong (timeout), try and reset GTM connection. + /* + * If something went wrong (timeout), try and reset GTM connection. + * We will abort the transaction locally anyway, and closing GTM will force + * it to end on GTM. + */ + if (ret < 0) + { + CloseGTM(); + InitGTM(); + } + return ret; +} + +int +BeingPreparedTranGTM(GlobalTransactionId gxid, + char *gid, + int datanodecnt, + PGXC_NodeId datanodes[], + int coordcnt, + PGXC_NodeId coordinators[]) +{ + int ret = 0; + + if (!GlobalTransactionIdIsValid(gxid)) + return 0; + CheckConnection(); + + ret = being_prepared_transaction(conn, gxid, gid, datanodecnt, datanodes, coordcnt, coordinators); + + /* + * If something went wrong (timeout), try and reset GTM connection. * We will abort the transaction locally anyway, and closing GTM will force * it to end on GTM. 
*/ @@ -153,6 +212,61 @@ RollbackTranGTM(GlobalTransactionId gxid) CloseGTM(); InitGTM(); } + + return ret; +} + +int +PrepareTranGTM(GlobalTransactionId gxid) +{ + int ret; + + if (!GlobalTransactionIdIsValid(gxid)) + return 0; + CheckConnection(); + ret = prepare_transaction(conn, gxid); + + /* + * If something went wrong (timeout), try and reset GTM connection. + * We will close the transaction locally anyway, and closing GTM will force + * it to be closed on GTM. + */ + if (ret < 0) + { + CloseGTM(); + InitGTM(); + } + return ret; +} + + +int +GetGIDDataGTM(char *gid, + GlobalTransactionId *gxid, + GlobalTransactionId *prepared_gxid, + int *datanodecnt, + PGXC_NodeId **datanodes, + int *coordcnt, + PGXC_NodeId **coordinators) +{ + int ret = 0; + + CheckConnection(); + ret = get_gid_data(conn, GTM_ISOLATION_RC, gid, gxid, + prepared_gxid, datanodecnt, datanodes, + coordcnt, coordinators); + + /* + * If something went wrong (timeout), try and reset GTM connection. + * We will abort the transaction locally anyway, and closing GTM will force + * it to end on GTM. + */ + if (ret < 0) + { + CloseGTM(); + InitGTM(); + } + return ret; } diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c index e8cf1bf..d881078 100644 --- a/src/backend/access/transam/twophase.c +++ b/src/backend/access/transam/twophase.c @@ -929,6 +929,11 @@ EndPrepare(GlobalTransaction gxact) * critical section, though, it doesn't matter since any failure causes * PANIC anyway. */ +#ifdef PGXC + /* Do not write 2PC state file on Coordinator side */ + if (IS_PGXC_DATANODE) + { +#endif TwoPhaseFilePath(path, xid); fd = BasicOpenFile(path, @@ -1001,6 +1006,9 @@ EndPrepare(GlobalTransaction gxact) * We save the PREPARE record's location in the gxact for later use by * CheckPointTwoPhase. */ +#ifdef PGXC + } +#endif START_CRIT_SECTION(); MyProc->inCommit = true; @@ -1011,6 +1019,12 @@ EndPrepare(GlobalTransaction gxact) /* If we crash now, we have prepared: WAL replay will fix things */ +#ifdef PGXC + /* Just write 2PC state file on Datanodes */ + if (IS_PGXC_DATANODE) + { +#endif + /* write correct CRC and close file */ if ((write(fd, &statefile_crc, sizeof(pg_crc32))) != sizeof(pg_crc32)) { @@ -1024,6 +1038,9 @@ EndPrepare(GlobalTransaction gxact) ereport(ERROR, (errcode_for_file_access(), errmsg("could not close two-phase state file: %m"))); +#ifdef PGXC + } +#endif /* * Mark the prepared transaction as valid. As soon as xact.c marks MyProc @@ -1875,3 +1892,16 @@ RecordTransactionAbortPrepared(TransactionId xid, END_CRIT_SECTION(); } + +#ifdef PGXC +/* + * Remove a gxact on a Coordinator, + * this is used to be able to prepare a commit transaction on another coordinator than the one + * who prepared the transaction + */ +void +RemoveGXactCoord(GlobalTransaction gxact) +{ + RemoveGXact(gxact); +} +#endif diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c index 8a946cc..458068c 100644 --- a/src/backend/access/transam/xact.c +++ b/src/backend/access/transam/xact.c @@ -2133,6 +2133,17 @@ PrepareTransaction(void) PostPrepare_Locks(xid); +#ifdef PGXC + /* + * We want to be able to commit a prepared transaction from another coordinator, + * so clean up the gxact in shared memory also. 
+ */ + if (IS_PGXC_COORDINATOR) + { + RemoveGXactCoord(gxact); + } +#endif + ResourceOwnerRelease(TopTransactionResourceOwner, RESOURCE_RELEASE_LOCKS, true, true); diff --git a/src/backend/pgxc/pool/datanode.c b/src/backend/pgxc/pool/datanode.c index 31b5bc0..2e8ec40 100644 --- a/src/backend/pgxc/pool/datanode.c +++ b/src/backend/pgxc/pool/datanode.c @@ -1105,6 +1105,25 @@ get_transaction_nodes(DataNodeHandle **connections) } /* + * Collect node numbers for the given Datanode connections + * and return them for prepared transactions + */ +PGXC_NodeId* +collect_datanode_numbers(int conn_count, DataNodeHandle **connections) +{ + PGXC_NodeId *datanodes = NULL; + int i; + datanodes = (PGXC_NodeId *) palloc(conn_count * sizeof(PGXC_NodeId)); + + for (i = 0; i < conn_count; i++) + { + datanodes[i] = connections[i]->nodenum; + } + + return datanodes; +} + +/* * Return those node connections that appear to be active and * have data to consume on them. */ diff --git a/src/backend/pgxc/pool/execRemote.c b/src/backend/pgxc/pool/execRemote.c index 05dbe2e..e7ef66e 100644 --- a/src/backend/pgxc/pool/execRemote.c +++ b/src/backend/pgxc/pool/execRemote.c @@ -52,6 +52,14 @@ static int data_node_begin(int conn_count, DataNodeHandle ** connections, GlobalTransactionId gxid); static int data_node_commit(int conn_count, DataNodeHandle ** connections); static int data_node_rollback(int conn_count, DataNodeHandle ** connections); +static int data_node_prepare(int conn_count, DataNodeHandle ** connections, + char *gid); +static int data_node_rollback_prepared(GlobalTransactionId gxid, GlobalTransactionId prepared_gxid, + int conn_count, DataNodeHandle ** connections, + char *gid); +static int data_node_commit_prepared(GlobalTransactionId gxid, GlobalTransactionId prepared_gxid, + int conn_count, DataNodeHandle ** connections, + char *gid); static void clear_write_node_list(); @@ -531,6 +539,7 @@ HandleCommandComplete(RemoteQueryState *combiner, char *msg_body, size_t len) else combiner->combine_type = COMBINE_TYPE_NONE; } + combiner->command_complete_count++; } @@ -793,6 +802,7 @@ validate_combiner(RemoteQueryState *combiner) /* Check if state is defined */ if (combiner->request_type == REQUEST_TYPE_NOT_DEFINED) return false; + /* Check all nodes completed */ if ((combiner->request_type == REQUEST_TYPE_COMMAND || combiner->request_type == REQUEST_TYPE_QUERY) @@ -1205,6 +1215,389 @@ DataNodeBegin(void) /* + * Prepare transaction on Datanodes involved in the current transaction. + * The GXID associated with the current transaction has to be committed on GTM. + */ +int +DataNodePrepare(char *gid) +{ + int res = 0; + int tran_count; + DataNodeHandle *connections[NumDataNodes]; + + /* gather connections to prepare */ + tran_count = get_transaction_nodes(connections); + + /* + * If we do not have open transactions, we have nothing to prepare; just + * report success + */ + if (tran_count == 0) + { + elog(WARNING, "Nothing to PREPARE on Datanodes, gid is not used"); + goto finish; + } + + /* TODO: data_node_prepare */ + res = data_node_prepare(tran_count, connections, gid); + +finish: + /* + * The transaction is just prepared, but Datanodes have reset, + * so we'll need a new gxid for commit prepared or rollback prepared. + * The application is responsible for delivering the correct gid. + * Release the connections for the moment.
+ */ + if (!autocommit) + stat_transaction(tran_count); + if (!PersistentConnections) + release_handles(false); + autocommit = true; + clear_write_node_list(); + return res; +} + + +/* + * Prepare transaction on dedicated nodes with gid received from application + */ +static int +data_node_prepare(int conn_count, DataNodeHandle ** connections, char *gid) +{ + int i; + int result = 0; + struct timeval *timeout = NULL; + char *buffer = (char *) palloc0(22 + strlen(gid) + 1); + RemoteQueryState *combiner = NULL; + GlobalTransactionId gxid = InvalidGlobalTransactionId; + PGXC_NodeId *datanodes = NULL; + + gxid = GetCurrentGlobalTransactionId(); + + /* + * Before preparing the transaction on the nodes, + * initialize the information to be saved on GTM + */ + datanodes = collect_datanode_numbers(conn_count, connections); + + /* + * Send a Prepare in Progress message to GTM. + * At the same time the node list is saved on GTM. + */ + result = BeingPreparedTranGTM(gxid, gid, conn_count, datanodes, 0, NULL); + + if (result < 0) + return EOF; + + sprintf(buffer, "PREPARE TRANSACTION '%s'", gid); + + /* Send PREPARE */ + for (i = 0; i < conn_count; i++) + if (data_node_send_query(connections[i], buffer)) + return EOF; + + combiner = CreateResponseCombiner(conn_count, COMBINE_TYPE_NONE); + + /* Receive responses */ + if (data_node_receive_responses(conn_count, connections, timeout, combiner)) + return EOF; + + result = ValidateAndCloseCombiner(combiner) ? result : EOF; + if (result) + goto finish; + + /* + * Prepare the transaction on GTM after everything is done. + * The GXID associated with the PREPARE state is considered as used on the nodes, + * but is still present in the snapshot. + * This GXID will be discarded from the snapshot when commit prepared is + * issued from another node. + */ + result = PrepareTranGTM(gxid); + +finish: + /* + * If an error has happened on a Datanode or on GTM, + * it is necessary to roll back the transaction on the nodes already prepared, + * but not on the nodes where the error occurred. + */ + if (result) + { + GlobalTransactionId rollback_xid = InvalidGlobalTransactionId; + buffer = (char *) repalloc(buffer, 20 + strlen(gid) + 1); + + sprintf(buffer, "ROLLBACK PREPARED '%s'", gid); + + rollback_xid = BeginTranGTM(NULL); + for (i = 0; i < conn_count; i++) + { + if (data_node_send_gxid(connections[i], rollback_xid)) + { + add_error_message(connections[i], "Can not send request"); + return EOF; + } + if (data_node_send_query(connections[i], buffer)) + { + add_error_message(connections[i], "Can not send request"); + return EOF; + } + } + + if (!combiner) + combiner = CreateResponseCombiner(conn_count, COMBINE_TYPE_NONE); + + if (data_node_receive_responses(conn_count, connections, timeout, combiner)) + result = EOF; + result = ValidateAndCloseCombiner(combiner) ? result : EOF; + + /* + * Don't forget to roll back on GTM as well. + * Both GXIDs used for PREPARE and COMMIT PREPARED are discarded from the GTM snapshot here. + */ + CommitPreparedTranGTM(gxid, rollback_xid); + + return EOF; + } + + return result; +} + + +/* + * Commit prepared transaction on the Datanodes where it has been prepared. + * The connection to the backends was cut when the transaction was prepared, + * so it is necessary to send the COMMIT PREPARED message to all the nodes. + * We are not sure whether the prepared transaction involved all the Datanodes, + * so the message is sent to all of them. + * This avoids any additional interaction with GTM when making a 2PC transaction.
+ */ +void +DataNodeCommitPrepared(char *gid) +{ + int res = 0; + int res_gtm = 0; + DataNodeHandle **connections; + List *nodelist = NIL; + int i, tran_count; + PGXC_NodeId *datanodes = NULL; + PGXC_NodeId *coordinators = NULL; + int coordcnt = 0; + int datanodecnt = 0; + GlobalTransactionId gxid, prepared_gxid; + + res_gtm = GetGIDDataGTM(gid, &gxid, &prepared_gxid, + &datanodecnt, &datanodes, &coordcnt, &coordinators); + + tran_count = datanodecnt + coordcnt; + if (tran_count == 0 || res_gtm < 0) + goto finish; + + autocommit = false; + + /* Build the list of nodes based on data received from GTM */ + for (i = 0; i < datanodecnt; i++) + { + nodelist = lappend_int(nodelist,datanodes[i]); + } + + /* Get connections */ + connections = get_handles(nodelist); + + /* Commit here the prepared transaction to all Datanodes */ + res = data_node_commit_prepared(gxid, prepared_gxid, datanodecnt, connections, gid); + +finish: + /* In autocommit mode statistics is collected in DataNodeExec */ + if (!autocommit) + stat_transaction(tran_count); + if (!PersistentConnections) + release_handles(false); + autocommit = true; + clear_write_node_list(); + + /* Free node list taken from GTM */ + if (datanodes) + free(datanodes); + if (coordinators) + free(coordinators); + + if (res_gtm < 0) + ereport(ERROR, + (errcode(ERRCODE_INTERNAL_ERROR), + errmsg("Could not get GID data from GTM"))); + if (res != 0) + ereport(ERROR, + (errcode(ERRCODE_INTERNAL_ERROR), + errmsg("Could not commit prepared transaction on data nodes"))); +} + +/* + * Commit a prepared transaction on all nodes + * Prepared transaction with this gid has reset the datanodes, + * so we need a new gxid. + * An error is returned to the application only if all the Datanodes + * and Coordinator do not know about the gxid proposed. + * This permits to avoid interactions with GTM. + */ +static int +data_node_commit_prepared(GlobalTransactionId gxid, GlobalTransactionId prepared_gxid, int conn_count, DataNodeHandle ** connections, char *gid) +{ + int result = 0; + int i; + RemoteQueryState *combiner = NULL; + struct timeval *timeout = NULL; + char *buffer = (char *) palloc0(18 + strlen(gid) + 1); + + /* GXID has been piggybacked when gid data has been received from GTM */ + sprintf(buffer, "COMMIT PREPARED '%s'", gid); + + /* Send gxid and COMMIT PREPARED message to all the Datanodes */ + for (i = 0; i < conn_count; i++) + { + if (data_node_send_gxid(connections[i], gxid)) + { + add_error_message(connections[i], "Can not send request"); + result = EOF; + goto finish; + } + if (data_node_send_query(connections[i], buffer)) + { + add_error_message(connections[i], "Can not send request"); + result = EOF; + goto finish; + } + } + + combiner = CreateResponseCombiner(conn_count, COMBINE_TYPE_NONE); + + /* Receive responses */ + if (data_node_receive_responses(conn_count, connections, timeout, combiner)) + result = EOF; + + /* Validate and close combiner */ + result = ValidateAndCloseCombiner(combiner) ? 
result : EOF; + +finish: + /* Both GXIDs used for PREPARE and COMMIT PREPARED are discarded from GTM snapshot here */ + CommitPreparedTranGTM(gxid, prepared_gxid); + + return result; +} + +/* + * Rollback prepared transaction on Datanodes involved in the current transaction + */ +void +DataNodeRollbackPrepared(char *gid) +{ + int res = 0; + int res_gtm = 0; + DataNodeHandle **connections; + List *nodelist = NIL; + int i, tran_count; + + PGXC_NodeId *datanodes = NULL; + PGXC_NodeId *coordinators = NULL; + int coordcnt = 0; + int datanodecnt = 0; + GlobalTransactionId gxid, prepared_gxid; + + res_gtm = GetGIDDataGTM(gid, &gxid, &prepared_gxid, + &datanodecnt, &datanodes, &coordcnt, &coordinators); + + tran_count = datanodecnt + coordcnt; + if (tran_count == 0 || res_gtm < 0 ) + goto finish; + + autocommit = false; + + /* Build the node list based on the result obtained from GTM */ + for (i = 0; i < datanodecnt; i++) + { + nodelist = lappend_int(nodelist,datanodes[i]); + } + + /* Get connections */ + connections = get_handles(nodelist); + + /* Here do the real rollback to Datanodes */ + res = data_node_rollback_prepared(gxid, prepared_gxid, datanodecnt, connections, gid); + +finish: + /* In autocommit mode statistics are collected in DataNodeExec */ + if (!autocommit) + stat_transaction(tran_count); + if (!PersistentConnections) + release_handles(true); + autocommit = true; + clear_write_node_list(true); + if (res_gtm < 0) + ereport(ERROR, + (errcode(ERRCODE_INTERNAL_ERROR), + errmsg("Could not get GID data from GTM"))); + if (res != 0) + ereport(ERROR, + (errcode(ERRCODE_INTERNAL_ERROR), + errmsg("Could not rollback prepared transaction on Datanodes"))); +} + + +/* + * Rollback prepared transaction. + * We first get the prepared transaction information from GTM and then do the processing. + * At the end both the prepared GXID and the GXID are committed. + */ +static int +data_node_rollback_prepared(GlobalTransactionId gxid, GlobalTransactionId prepared_gxid, + int conn_count, DataNodeHandle ** connections, char *gid) +{ + int result = 0; + int i; + RemoteQueryState *combiner = NULL; + struct timeval *timeout = NULL; + char *buffer = (char *) palloc0(20 + strlen(gid) + 1); + + /* Datanodes have reset after prepared state, so get a new gxid */ + gxid = BeginTranGTM(NULL); + + sprintf(buffer, "ROLLBACK PREPARED '%s'", gid); + + /* Send gxid and ROLLBACK PREPARED message to all the Datanodes */ + for (i = 0; i < conn_count; i++) + { + if (data_node_send_gxid(connections[i], gxid)) + { + add_error_message(connections[i], "Can not send request"); + result = EOF; + goto finish; + } + + if (data_node_send_query(connections[i], buffer)) + { + add_error_message(connections[i], "Can not send request"); + result = EOF; + goto finish; + } + } + + combiner = CreateResponseCombiner(conn_count, COMBINE_TYPE_NONE); + + /* Receive responses */ + if (data_node_receive_responses(conn_count, connections, timeout, combiner)) + result = EOF; + + /* Validate and close combiner */ + result = ValidateAndCloseCombiner(combiner) ?
result : EOF; + +finish: + /* Both GXIDs used for PREPARE and COMMIT PREPARED are discarded from GTM snapshot here */ + CommitPreparedTranGTM(gxid, prepared_gxid); + + return result; +} + + +/* * Commit current transaction on data nodes where it has been started */ void diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c index 0c6208c..91456c4 100644 --- a/src/backend/tcop/utility.c +++ b/src/backend/tcop/utility.c @@ -337,6 +337,15 @@ ProcessUtility(Node *parsetree, break; case TRANS_STMT_PREPARE: +#ifdef PGXC + /* + * If 2PC is invoked from an application, the transaction is first prepared on the Datanodes. + * The 2PC file is not written for Coordinators to keep the possibility + * of a COMMIT PREPARED on a separate Coordinator + */ + if (IS_PGXC_COORDINATOR) + DataNodePrepare(stmt->gid); +#endif if (!PrepareTransactionBlock(stmt->gid)) { /* report unsuccessful commit in completionTag */ @@ -346,13 +355,46 @@ ProcessUtility(Node *parsetree, break; case TRANS_STMT_COMMIT_PREPARED: +#ifdef PGXC + if (IS_PGXC_COORDINATOR) + DataNodeCommitPrepared(stmt->gid); +#endif PreventTransactionChain(isTopLevel, "COMMIT PREPARED"); + +#ifdef PGXC + if (IS_PGXC_DATANODE) + { + /* + * The 2PC file of a Coordinator is not flushed to disk when the transaction is prepared, + * so just skip this part. + */ +#endif FinishPreparedTransaction(stmt->gid, true); +#ifdef PGXC + } +#endif break; case TRANS_STMT_ROLLBACK_PREPARED: +#ifdef PGXC + if (IS_PGXC_COORDINATOR) + DataNodeRollbackPrepared(stmt->gid); +#endif + PreventTransactionChain(isTopLevel, "ROLLBACK PREPARED"); + +#ifdef PGXC + if (IS_PGXC_DATANODE) + { + /* + * The 2PC file of a Coordinator is not flushed to disk when the transaction is prepared, + * so just skip this part. + */ +#endif FinishPreparedTransaction(stmt->gid, false); +#ifdef PGXC + } +#endif break; case TRANS_STMT_ROLLBACK: diff --git a/src/gtm/client/fe-protocol.c b/src/gtm/client/fe-protocol.c index 0847b0d..ff73b8d 100644 --- a/src/gtm/client/fe-protocol.c +++ b/src/gtm/client/fe-protocol.c @@ -362,19 +362,18 @@ gtmpqParseSuccess(GTM_Conn *conn, GTM_Result *result) break; case TXN_BEGIN_GETGXID_AUTOVACUUM_RESULT: case TXN_PREPARE_RESULT: + case TXN_BEING_PREPARED_RESULT: if (gtmpqGetnchar((char *)&result->gr_resdata.grd_gxid, sizeof (GlobalTransactionId), conn)) result->gr_status = -1; break; case TXN_COMMIT_RESULT: + case TXN_COMMIT_PREPARED_RESULT: case TXN_ROLLBACK_RESULT: if (gtmpqGetnchar((char *)&result->gr_resdata.grd_gxid, sizeof (GlobalTransactionId), conn)) - { result->gr_status = -1; - break; - } break; case TXN_GET_GXID_RESULT: @@ -531,6 +530,60 @@ gtmpqParseSuccess(GTM_Conn *conn, GTM_Result *result) case TXN_GET_ALL_PREPARED_RESULT: break; + case TXN_GET_GID_DATA_RESULT: + if (gtmpqGetnchar((char *)&result->gr_resdata.grd_txn_get_gid_data.gxid, + sizeof (GlobalTransactionId), conn)) + { + result->gr_status = -1; + break; + } + if (gtmpqGetnchar((char *)&result->gr_resdata.grd_txn_get_gid_data.prepared_gxid, + sizeof (GlobalTransactionId), conn)) + { + result->gr_status = -1; + break; + } + if (gtmpqGetInt(&result->gr_resdata.grd_txn_get_gid_data.datanodecnt, + sizeof (int32), conn)) + { + result->gr_status = -1; + break; + } + if ((result->gr_resdata.grd_txn_get_gid_data.datanodes = (PGXC_NodeId *) + malloc(sizeof(PGXC_NodeId) * result->gr_resdata.grd_txn_get_gid_data.datanodecnt)) == NULL) + { + result->gr_status = -1; + break; + } + if (gtmpqGetnchar((char *)result->gr_resdata.grd_txn_get_gid_data.datanodes, + sizeof(PGXC_NodeId) * result->gr_resdata.grd_txn_get_gid_data.datanodecnt,
conn)) + { + result->gr_status = -1; + break; + } + if (gtmpqGetInt(&result->gr_resdata.grd_txn_get_gid_data.coordcnt, + sizeof (int32), conn)) + { + result->gr_status = -1; + break; + } + if (result->gr_resdata.grd_txn_get_gid_data.coordcnt != 0) + { + if ((result->gr_resdata.grd_txn_get_gid_data.coordinators = (PGXC_NodeId *) + malloc(sizeof(PGXC_NodeId) * result->gr_resdata.grd_txn_get_gid_data.coordcnt)) == NULL) + { + result->gr_status = -1; + break; + } + if (gtmpqGetnchar((char *)result->gr_resdata.grd_txn_get_gid_data.coordinators, + sizeof(PGXC_NodeId) * result->gr_resdata.grd_txn_get_gid_data.coordcnt, conn)) + { + result->gr_status = -1; + break; + } + } + break; + default: printfGTMPQExpBuffer(&conn->errorMessage, "unexpected result type from server; result typr was \"%d\"\n", diff --git a/src/gtm/client/gtm_client.c b/src/gtm/client/gtm_client.c index 35f81ae..54b75fd 100644 --- a/src/gtm/client/gtm_client.c +++ b/src/gtm/client/gtm_client.c @@ -135,6 +135,7 @@ receive_failed: send_failed: return InvalidGlobalTransactionId; } + int commit_transaction(GTM_Conn *conn, GlobalTransactionId gxid) { @@ -175,7 +176,48 @@ commit_transaction(GTM_Conn *conn, GlobalTransactionId gxid) receive_failed: send_failed: return -1; +} + +int +commit_prepared_transaction(GTM_Conn *conn, GlobalTransactionId gxid, GlobalTransactionId prepared_gxid) +{ + GTM_Result *res = NULL; + time_t finish_time; + + /* Start the message */ + if (gtmpqPutMsgStart('C', true, conn) || + gtmpqPutInt(MSG_TXN_COMMIT_PREPARED, sizeof (GTM_MessageType), conn) || + gtmpqPutc(true, conn) || + gtmpqPutnchar((char *)&gxid, sizeof (GlobalTransactionId), conn) || + gtmpqPutc(true, conn) || + gtmpqPutnchar((char *)&prepared_gxid, sizeof (GlobalTransactionId), conn)) + goto send_failed; + + /* Finish the message */ + if (gtmpqPutMsgEnd(conn)) + goto send_failed; + + /* Flush to ensure backends gets it */ + if (gtmpqFlush(conn)) + goto send_failed; + + finish_time = time(NULL) + CLIENT_GTM_TIMEOUT; + if (gtmpqWaitTimed(true, false, conn, finish_time) || + gtmpqReadData(conn) < 0) + goto receive_failed; + + if ((res = GTMPQgetResult(conn)) == NULL) + goto receive_failed; + + if (res->gr_status == 0) + { + Assert(res->gr_type == TXN_COMMIT_PREPARED_RESULT); + Assert(res->gr_resdata.grd_gxid == gxid); + } +send_failed: +receive_failed: + return -1; } int @@ -222,19 +264,71 @@ send_failed: } int -prepare_transaction(GTM_Conn *conn, GlobalTransactionId gxid, - int nodecnt, PGXC_NodeId nodes[]) +being_prepared_transaction(GTM_Conn *conn, GlobalTransactionId gxid, char *gid, + int datanodecnt, PGXC_NodeId datanodes[], int coordcnt, + PGXC_NodeId coordinators[]) { GTM_Result *res = NULL; time_t finish_time; /* Start the message. 
*/ if (gtmpqPutMsgStart('C', true, conn) || - gtmpqPutInt(MSG_TXN_PREPARE, sizeof (GTM_MessageType), conn) || + gtmpqPutInt(MSG_TXN_BEING_PREPARED, sizeof (GTM_MessageType), conn) || gtmpqPutc(true, conn) || gtmpqPutnchar((char *)&gxid, sizeof (GlobalTransactionId), conn) || - gtmpqPutInt(nodecnt, sizeof (int), conn) || - gtmpqPutnchar((char *)nodes, sizeof (PGXC_NodeId) * nodecnt, conn)) + /* Send also GID for an explicit prepared transaction */ + gtmpqPutInt(strlen(gid), sizeof (GTM_GIDLen), conn) || + gtmpqPutnchar((char *) gid, strlen(gid), conn) || + gtmpqPutInt(datanodecnt, sizeof (int), conn) || + gtmpqPutnchar((char *)datanodes, sizeof (PGXC_NodeId) * datanodecnt, conn) || + gtmpqPutInt(coordcnt, sizeof (int), conn)) + goto send_failed; + + /* Coordinator connections are not always involved in a transaction */ + if (coordcnt != 0 && gtmpqPutnchar((char *)coordinators, sizeof (PGXC_NodeId) * coordcnt, conn)) + goto send_failed; + + /* Finish the message. */ + if (gtmpqPutMsgEnd(conn)) + goto send_failed; + + /* Flush to ensure backend gets it. */ + if (gtmpqFlush(conn)) + goto send_failed; + + finish_time = time(NULL) + CLIENT_GTM_TIMEOUT; + if (gtmpqWaitTimed(true, false, conn, finish_time) || + gtmpqReadData(conn) < 0) + goto receive_failed; + + if ((res = GTMPQgetResult(conn)) == NULL) + goto receive_failed; + + if (res->gr_status == 0) + { + Assert(res->gr_type == TXN_BEING_PREPARED_RESULT); + Assert(res->gr_resdata.grd_gxid == gxid); + } + + return res->gr_status; + +receive_failed: +send_failed: + return -1; +} + + +int +prepare_transaction(GTM_Conn *conn, GlobalTransactionId gxid) +{ + GTM_Result *res = NULL; + time_t finish_time; + + /* Start the message. */ + if (gtmpqPutMsgStart('C', true, conn) || + gtmpqPutInt(MSG_TXN_PREPARE, sizeof (GTM_MessageType), conn) || + gtmpqPutc(true, conn) || + gtmpqPutnchar((char *)&gxid, sizeof (GlobalTransactionId), conn)) goto send_failed; /* Finish the message. */ @@ -266,6 +360,64 @@ send_failed: return -1; } +int +get_gid_data(GTM_Conn *conn, + GTM_IsolationLevel isolevel, + char *gid, + GlobalTransactionId *gxid, + GlobalTransactionId *prepared_gxid, + int *datanodecnt, + PGXC_NodeId **datanodes, + int *coordcnt, + PGXC_NodeId **coordinators) +{ + bool txn_read_only = false; + GTM_Result *res = NULL; + time_t finish_time; + + /* Start the message */ + if (gtmpqPutMsgStart('C', true, conn) || + gtmpqPutInt(MSG_TXN_GET_GID_DATA, sizeof (GTM_MessageType), conn) || + gtmpqPutInt(isolevel, sizeof (GTM_IsolationLevel), conn) || + gtmpqPutc(txn_read_only, conn) || + /* Send also GID for an explicit prepared transaction */ + gtmpqPutInt(strlen(gid), sizeof (GTM_GIDLen), conn) || + gtmpqPutnchar((char *) gid, strlen(gid), conn)) + goto send_failed; + + /* Finish the message */ + if (gtmpqPutMsgEnd(conn)) + goto send_failed; + + /* Flush to ensure backend gets it. 
*/ + if (gtmpqFlush(conn)) + goto send_failed; + + finish_time = time(NULL) + CLIENT_GTM_TIMEOUT; + if (gtmpqWaitTimed(true, false, conn, finish_time) || + gtmpqReadData(conn) < 0) + goto receive_failed; + + if ((res = GTMPQgetResult(conn)) == NULL) + goto receive_failed; + + if (res->gr_status == 0) + { + *gxid = res->gr_resdata.grd_txn_get_gid_data.gxid; + *prepared_gxid = res->gr_resdata.grd_txn_get_gid_data.prepared_gxid; + *datanodes = res->gr_resdata.grd_txn_get_gid_data.datanodes; + *coordinators = res->gr_resdata.grd_txn_get_gid_data.coordinators; + *datanodecnt = res->gr_resdata.grd_txn_get_gid_data.datanodecnt; + *coordcnt = res->gr_resdata.grd_txn_get_gid_data.coordcnt; + } + + return res->gr_status; + +receive_failed: +send_failed: + return -1; +} + /* * Snapshot Management API */ diff --git a/src/gtm/main/gtm_txn.c b/src/gtm/main/gtm_txn.c index 2205167..949c123 100644 --- a/src/gtm/main/gtm_txn.c +++ b/src/gtm/main/gtm_txn.c @@ -149,6 +149,35 @@ GTM_GXIDToHandle(GlobalTransactionId gxid) } /* + * Given the GID (for a prepared transaction), find the corresponding + * transaction handle. + */ +GTM_TransactionHandle +GTM_GIDToHandle(char *gid) +{ + ListCell *elem = NULL; + GTM_TransactionInfo *gtm_txninfo = NULL; + + GTM_RWLockAcquire(>MTransactions.gt_TransArrayLock, GTM_LOCKMODE_READ); + + foreach(elem, GTMTransactions.gt_open_transactions) + { + gtm_txninfo = (GTM_TransactionInfo *)lfirst(elem); + if (gtm_txninfo->gti_gid && strcmp(gid,gtm_txninfo->gti_gid) == 0) + break; + gtm_txninfo = NULL; + } + + GTM_RWLockRelease(>MTransactions.gt_TransArrayLock); + + if (gtm_txninfo != NULL) + return gtm_txninfo->gti_handle; + else + return InvalidTransactionHandle; +} + + +/* * Given the transaction handle, find the corresponding transaction info * structure * @@ -159,7 +188,7 @@ GTM_GXIDToHandle(GlobalTransactionId gxid) GTM_TransactionInfo * GTM_HandleToTransactionInfo(GTM_TransactionHandle handle) { - GTM_TransactionInfo *gtm_txninfo = NULL; + GTM_TransactionInfo *gtm_txninfo = NULL; if ((handle < 0) || (handle > GTM_MAX_GLOBAL_TRANSACTIONS)) { @@ -180,6 +209,7 @@ GTM_HandleToTransactionInfo(GTM_TransactionHandle handle) return gtm_txninfo; } + /* * Remove the given transaction info structures from the global array. 
If the * calling thread does not have enough cached structures, we in fact keep the @@ -220,9 +250,27 @@ GTM_RemoveTransInfoMulti(GTM_TransactionInfo *gtm_txninfo[], int txn_count) * Now mark the transaction as aborted and mark the structure as not-in-use */ gtm_txninfo[ii]->gti_state = GTM_TXN_ABORTED; - gtm_txninfo[ii]->gti_nodecount = 0; + gtm_txninfo[ii]->gti_datanodecount = 0; + gtm_txninfo[ii]->gti_coordcount = 0; gtm_txninfo[ii]->gti_in_use = false; gtm_txninfo[ii]->gti_snapshot_set = false; + + /* Clean-up also structures that were used for prepared transactions */ + if (gtm_txninfo[ii]->gti_gid) + { + pfree(gtm_txninfo[ii]->gti_gid); + gtm_txninfo[ii]->gti_gid = NULL; + } + if (gtm_txninfo[ii]->gti_coordinators) + { + pfree(gtm_txninfo[ii]->gti_coordinators); + gtm_txninfo[ii]->gti_coordinators = NULL; + } + if (gtm_txninfo[ii]->gti_datanodes) + { + pfree(gtm_txninfo[ii]->gti_datanodes); + gtm_txninfo[ii]->gti_datanodes = NULL; + } } GTM_RWLockRelease(>MTransactions.gt_TransArrayLock); @@ -252,15 +300,21 @@ GTM_RemoveAllTransInfos(int backend_id) while (cell != NULL) { GTM_TransactionInfo *gtm_txninfo = lfirst(cell); - /* check if current entry is associated with the thread */ + /* + * Check if current entry is associated with the thread + * A transaction in prepared state has to be kept alive in the structure. + * It will be committed by another thread than this one. + */ if ((gtm_txninfo->gti_in_use) && + (gtm_txninfo->gti_state != GTM_TXN_PREPARED) && + (gtm_txninfo->gti_state != GTM_TXN_PREPARE_IN_PROGRESS) && (gtm_txninfo->gti_thread_id == thread_id) && ((gtm_txninfo->gti_backend_id == backend_id) || (backend_id == -1))) { /* remove the entry */ GTMTransactions.gt_open_transactions = list_delete_cell(GTMTransactions.gt_open_transactions, cell, prev); - /* update the latestComletedXid */ + /* update the latestCompletedXid */ if (GlobalTransactionIdIsNormal(gtm_txninfo->gti_gxid) && GlobalTransactionIdFollowsOrEquals(gtm_txninfo->gti_gxid, GTMTransactions.gt_latestCompletedXid)) @@ -272,10 +326,27 @@ GTM_RemoveAllTransInfos(int backend_id) * Now mark the transaction as aborted and mark the structure as not-in-use */ gtm_txninfo->gti_state = GTM_TXN_ABORTED; - gtm_txninfo->gti_nodecount = 0; + gtm_txninfo->gti_datanodecount = 0; + gtm_txninfo->gti_coordcount = 0; gtm_txninfo->gti_in_use = false; gtm_txninfo->gti_snapshot_set = false; - + + if (gtm_txninfo->gti_gid) + { + pfree(gtm_txninfo->gti_gid); + gtm_txninfo->gti_gid = NULL; + } + if (gtm_txninfo->gti_coordinators) + { + pfree(gtm_txninfo->gti_coordinators); + gtm_txninfo->gti_coordinators = NULL; + } + if (gtm_txninfo->gti_datanodes) + { + pfree(gtm_txninfo->gti_datanodes); + gtm_txninfo->gti_datanodes = NULL; + } + /* move to next cell in the list */ if (prev) cell = lnext(prev); @@ -583,7 +654,7 @@ GTM_BeginTransactionMulti(GTM_CoordinatorId coord_id, * without removing the corresponding references from the global array */ oldContext = MemoryContextSwitchTo(TopMostMemoryContext); - + for (kk = 0; kk < txn_count; kk++) { int ii, jj, startslot; @@ -627,10 +698,16 @@ GTM_BeginTransactionMulti(GTM_CoordinatorId coord_id, gtm_txninfo[kk]->gti_backend_id = connid[kk]; gtm_txninfo[kk]->gti_in_use = true; + gtm_txninfo[kk]->gti_coordcount = 0; + gtm_txninfo[kk]->gti_datanodes = 0; + gtm_txninfo[kk]->gti_gid = NULL; + gtm_txninfo[kk]->gti_coordinators = NULL; + gtm_txninfo[kk]->gti_datanodes = NULL; + gtm_txninfo[kk]->gti_handle = ii; gtm_txninfo[kk]->gti_vacuum = false; gtm_txninfo[kk]->gti_thread_id = pthread_self(); - 
GTMTransactions.gt_lastslot = ii; + GTMTransactions.gt_lastslot = ii; txns[kk] = ii; @@ -761,6 +838,29 @@ GTM_CommitTransactionMulti(GTM_TransactionHandle txn[], int txn_count, int statu } /* + * Prepare a transaction + */ +int +GTM_PrepareTransaction(GTM_TransactionHandle txn) +{ + GTM_TransactionInfo *gtm_txninfo = NULL; + + gtm_txninfo = GTM_HandleToTransactionInfo(txn); + + if (gtm_txninfo == NULL) + return STATUS_ERROR; + + /* + * Mark the transaction as prepared + */ + GTM_RWLockAcquire(>m_txninfo->gti_lock, GTM_LOCKMODE_WRITE); + gtm_txninfo->gti_state = GTM_TXN_PREPARED; + GTM_RWLockRelease(>m_txninfo->gti_lock); + + return STATUS_OK; +} + +/* * Commit a transaction */ int @@ -775,9 +875,12 @@ GTM_CommitTransaction(GTM_TransactionHandle txn) * Prepare a transaction */ int -GTM_PrepareTransaction(GTM_TransactionHandle txn, - uint32 nodecnt, - PGXC_NodeId nodes[]) +GTM_BeingPreparedTransaction(GTM_TransactionHandle txn, + char *gid, + uint32 datanodecnt, + PGXC_NodeId datanodes[], + uint32 coordcnt, + PGXC_NodeId coordinators[]) { GTM_TransactionInfo *gtm_txninfo = GTM_HandleToTransactionInfo(txn); @@ -785,15 +888,27 @@ GTM_PrepareTransaction(GTM_TransactionHandle txn, return STATUS_ERROR; /* - * Mark the transaction as being aborted + * Mark the transaction as being prepared */ GTM_RWLockAcquire(>m_txninfo->gti_lock, GTM_LOCKMODE_WRITE); - + gtm_txninfo->gti_state = GTM_TXN_PREPARE_IN_PROGRESS; - gtm_txninfo->gti_nodecount = nodecnt; - if (gtm_txninfo->gti_nodes == NULL) - gtm_txninfo->gti_nodes = (PGXC_NodeId *)MemoryContextAlloc(TopMostMemoryContext, sizeof (PGXC_NodeId) * GTM_MAX_2PC_NODES); - memcpy(gtm_txninfo->gti_nodes, nodes, sizeof (PGXC_NodeId) * nodecnt); + gtm_txninfo->gti_datanodecount = datanodecnt; + gtm_txninfo->gti_coordcount = coordcnt; + + if (gtm_txninfo->gti_datanodes == NULL) + gtm_txninfo->gti_datanodes = (PGXC_NodeId *)MemoryContextAlloc(TopMostMemoryContext, sizeof (PGXC_NodeId) * GTM_MAX_2PC_NODES); + memcpy(gtm_txninfo->gti_datanodes, datanodes, sizeof (PGXC_NodeId) * datanodecnt); + + /* It is possible that no coordinator is involved in a transaction */ + if (coordcnt != 0 && gtm_txninfo->gti_coordinators == NULL) + gtm_txninfo->gti_coordinators = (PGXC_NodeId *)MemoryContextAlloc(TopMostMemoryContext, sizeof (PGXC_NodeId) * GTM_MAX_2PC_NODES); + if (coordcnt != 0) + memcpy(gtm_txninfo->gti_coordinators, coordinators, sizeof (PGXC_NodeId) * coordcnt); + + if (gtm_txninfo->gti_gid == NULL) + gtm_txninfo->gti_gid = (char *)MemoryContextAlloc(TopMostMemoryContext, GTM_MAX_GID_LEN); + memcpy(gtm_txninfo->gti_gid, gid, strlen(gid)); GTM_RWLockRelease(>m_txninfo->gti_lock); @@ -804,12 +919,53 @@ GTM_PrepareTransaction(GTM_TransactionHandle txn, * Same as GTM_PrepareTransaction but takes GXID as input */ int -GTM_PrepareTransactionGXID(GlobalTransactionId gxid, - uint32 nodecnt, - PGXC_NodeId nodes[]) +GTM_BeingPreparedTransactionGXID(GlobalTransactionId gxid, + char *gid, + uint32 datanodecnt, + PGXC_NodeId datanodes[], + uint32 coordcnt, + PGXC_NodeId coordinators[]) { GTM_TransactionHandle txn = GTM_GXIDToHandle(gxid); - return GTM_PrepareTransaction(txn, nodecnt, nodes); + return GTM_BeingPreparedTransaction(txn, gid, datanodecnt, datanodes, coordcnt, coordinators); +} + +int +GTM_GetGIDData(GTM_TransactionHandle prepared_txn, + GlobalTransactionId *prepared_gxid, + int *datanodecnt, + PGXC_NodeId **datanodes, + int *coordcnt, + PGXC_NodeId **coordinators) +{ + GTM_TransactionInfo *gtm_txninfo = NULL; + MemoryContext oldContext; + + oldContext = 
MemoryContextSwitchTo(TopMostMemoryContext); + + gtm_txninfo = GTM_HandleToTransactionInfo(prepared_txn); + if (gtm_txninfo == NULL) + return STATUS_ERROR; + + /* then get the necessary Data */ + *prepared_gxid = gtm_txninfo->gti_gxid; + *datanodecnt = gtm_txninfo->gti_datanodecount; + *coordcnt = gtm_txninfo->gti_coordcount; + + *datanodes = (PGXC_NodeId *) palloc(sizeof (PGXC_NodeId) * gtm_txninfo->gti_datanodecount); + memcpy(*datanodes, gtm_txninfo->gti_datanodes, + sizeof (PGXC_NodeId) * gtm_txninfo->gti_datanodecount); + + if (coordcnt != 0) + { + *coordinators = (PGXC_NodeId *) palloc(sizeof (PGXC_NodeId) * gtm_txninfo->gti_coordcount); + memcpy(*coordinators, gtm_txninfo->gti_coordinators, + sizeof (PGXC_NodeId) * gtm_txninfo->gti_coordcount); + } + + MemoryContextSwitchTo(oldContext); + + return STATUS_OK; } /* @@ -1146,6 +1302,174 @@ ProcessCommitTransactionCommand(Port *myport, StringInfo message) } /* + * Process MSG_TXN_COMMIT_PREPARED_MSG + * Commit a prepared transaction + * Here the GXID used for PREPARE and COMMIT PREPARED are both committed + */ +void +ProcessCommitPreparedTransactionCommand(Port *myport, StringInfo message) +{ + StringInfoData buf; + int txn_count = 2; /* PREPARE and COMMIT PREPARED gxid's */ + GTM_TransactionHandle txn[txn_count]; + GlobalTransactionId gxid[txn_count]; + MemoryContext oldContext; + int status[txn_count]; + int isgxid[txn_count]; + int ii, count; + + for (ii = 0; ii < txn_count; ii++) + { + isgxid[ii] = pq_getmsgbyte(message); + if (isgxid[ii]) + { + const char *data = pq_getmsgbytes(message, sizeof (gxid[ii])); + if (data == NULL) + ereport(ERROR, + (EPROTO, + errmsg("Message does not contain valid GXID"))); + memcpy(&gxid[ii], data, sizeof (gxid[ii])); + txn[ii] = GTM_GXIDToHandle(gxid[ii]); + elog(DEBUG1, "ProcessCommitTransactionCommandMulti: gxid(%u), handle(%u)", gxid[ii], txn[ii]); + } + else + { + const char *data = pq_getmsgbytes(message, sizeof (txn[ii])); + if (data == NULL) + ereport(ERROR, + (EPROTO, + errmsg("Message does not contain valid Transaction Handle"))); + memcpy(&txn[ii], data, sizeof (txn[ii])); + elog(DEBUG1, "ProcessCommitTransactionCommandMulti: handle(%u)", txn[ii]); + } + } + + pq_getmsgend(message); + + oldContext = MemoryContextSwitchTo(TopMemoryContext); + + /* + * Commit the prepared transaction. + */ + count = GTM_CommitTransactionMulti(txn, txn_count, status); + + MemoryContextSwitchTo(oldContext); + + pq_beginmessage(&buf, 'S'); + pq_sendint(&buf, TXN_COMMIT_PREPARED_RESULT, 4); + if (myport->is_proxy) + { + GTM_ProxyMsgHeader proxyhdr; + proxyhdr.ph_conid = myport->conn_id; + pq_sendbytes(&buf, (char *)&proxyhdr, sizeof (GTM_ProxyMsgHeader)); + } + pq_sendbytes(&buf, (char *)&gxid[0], sizeof(GlobalTransactionId)); + pq_sendint(&buf, status[0], 4); + pq_endmessage(myport, &buf); + + if (!myport->is_proxy) + pq_flush(myport); + return; +} + + +/* + * Process MSG_TXN_GET_GID_DATA + * This message is used after at the beginning of a COMMIT PREPARED + * or a ROLLBACK PREPARED. 
+ * For a given GID the following info is returned: + * - a fresh GXID, + * - GXID of the transaction that made the prepare + * - datanode and coordinator node list involved in the prepare + */ +void +ProcessGetGIDDataTransactionCommand(Port *myport, StringInfo message) +{ + StringInfoData buf; + char *gid; + int gidlen; + GTM_IsolationLevel txn_isolation_level; + bool txn_read_only; + MemoryContext oldContext; + GTM_TransactionHandle txn, prepared_txn; + /* Data to be sent back to client */ + GlobalTransactionId gxid, prepared_gxid; + PGXC_NodeId *coordinators = NULL; + PGXC_NodeId *datanodes = NULL; + int datanodecnt,coordcnt; + + /* take the isolation level and read_only instructions */ + txn_isolation_level = pq_getmsgint(message, sizeof (GTM_IsolationLevel)); + txn_read_only = pq_getmsgbyte(message); + + /* receive GID */ + gidlen = pq_getmsgint(message, sizeof (GTM_GIDLen)); + gid = (char *)pq_getmsgbytes(message, gidlen); + + pq_getmsgend(message); + + prepared_txn = GTM_GIDToHandle(gid); + if (prepared_txn == InvalidTransactionHandle) + ereport(ERROR, + (EINVAL, + errmsg("Failed to get GID Data for prepared transaction"))); + + oldContext = MemoryContextSwitchTo(TopMemoryContext); + + /* First get the GXID for the new transaction */ + txn = GTM_BeginTransaction(0, txn_isolation_level, txn_read_only); + if (txn == InvalidTransactionHandle) + ereport(ERROR, + (EINVAL, + errmsg("Failed to start a new transaction"))); + + gxid = GTM_GetGlobalTransactionId(txn); + if (gxid == InvalidGlobalTransactionId) + ereport(ERROR, + (EINVAL, + errmsg("Failed to get a new transaction id"))); + + /* + * Make the internal process, get the prepared information from GID. + */ + if (GTM_GetGIDData(prepared_txn, &prepared_gxid, &datanodecnt, &datanodes, &coordcnt, &coordinators) != STATUS_OK) + { + ereport(ERROR, + (EINVAL, + errmsg("Failed to get the information of prepared transaction"))); + } + + MemoryContextSwitchTo(oldContext); + + /* + * Send a SUCCESS message back to the client + */ + pq_beginmessage(&buf, 'S'); + pq_sendint(&buf, TXN_GET_GID_DATA_RESULT, 4); + if (myport->is_proxy) + { + GTM_ProxyMsgHeader proxyhdr; + proxyhdr.ph_conid = myport->conn_id; + pq_sendbytes(&buf, (char *)&proxyhdr, sizeof (GTM_ProxyMsgHeader)); + } + /* Send the two GXIDs */ + pq_sendbytes(&buf, (char *)&gxid, sizeof(GlobalTransactionId)); + pq_sendbytes(&buf, (char *)&prepared_gxid, sizeof(GlobalTransactionId)); + /* Then send the data linked to nodes involved in prepare */ + pq_sendint(&buf, datanodecnt, 4); + pq_sendbytes(&buf, (char *)datanodes, sizeof(PGXC_NodeId) * datanodecnt); + pq_sendint(&buf, coordcnt, 4); + if (coordcnt != 0) + pq_sendbytes(&buf, (char *)coordinators, sizeof(PGXC_NodeId) * coordcnt); + + pq_endmessage(myport, &buf); + + if (!myport->is_proxy) + pq_flush(myport); + return; +} + +/* * Process MSG_TXN_ROLLBACK message */ void @@ -1352,18 +1676,21 @@ ProcessRollbackTransactionCommandMulti(Port *myport, StringInfo message) } /* - * Process MSG_TXN_PREPARE message + * Process MSG_TXN_BEING_PREPARED message */ void -ProcessPrepareTransactionCommand(Port *myport, StringInfo message) +ProcessBeingPreparedTransactionCommand(Port *myport, StringInfo message) { StringInfoData buf; GTM_TransactionHandle txn; GlobalTransactionId gxid; int isgxid = 0; - int nodecnt; - PGXC_NodeId *nodes; + int datanodecnt,coordcnt; + GTM_GIDLen gidlen; + PGXC_NodeId *coordinators = NULL; + PGXC_NodeId *datanodes = NULL; MemoryContext oldContext; + char *gid; isgxid = pq_getmsgbyte(message); @@ -1387,26 +1714,104 @@ 
ProcessPrepareTransactionCommand(Port *myport, StringInfo message) memcpy(&txn, data, sizeof (txn)); } - nodecnt = pq_getmsgint(message, sizeof (nodecnt)); - nodes = (PGXC_NodeId *) palloc(sizeof (PGXC_NodeId) * nodecnt); - memcpy(nodes, pq_getmsgbytes(message, sizeof (PGXC_NodeId) * nodecnt), - sizeof (PGXC_NodeId) * nodecnt); + /* get GID */ + gidlen = pq_getmsgint(message, sizeof (GTM_GIDLen)); + gid = (char *)pq_getmsgbytes(message, gidlen); + /* Get Datanode Data */ + datanodecnt = pq_getmsgint(message, 4); + datanodes = (PGXC_NodeId *) palloc(sizeof (PGXC_NodeId) * datanodecnt); + memcpy(datanodes, pq_getmsgbytes(message, sizeof (PGXC_NodeId) * datanodecnt), + sizeof (PGXC_NodeId) * datanodecnt); + + /* Get Coordinator Data, can be possibly NULL */ + coordcnt = pq_getmsgint(message, 4); + if (coordcnt != 0) + { + coordinators = (PGXC_NodeId *) palloc(sizeof (PGXC_NodeId) * coordcnt); + memcpy(coordinators, pq_getmsgbytes(message, sizeof (PGXC_NodeId) * coordcnt), + sizeof (PGXC_NodeId) * coordcnt); + } pq_getmsgend(message); - oldContext = MemoryContextSwitchTo(TopMemoryContext); + oldContext = MemoryContextSwitchTo(TopMostMemoryContext); /* * Prepare the transaction */ - if (GTM_PrepareTransaction(txn, nodecnt, nodes) != STATUS_OK) + if (GTM_BeingPreparedTransaction(txn, gid, datanodecnt, datanodes, coordcnt, coordinators) != STATUS_OK) ereport(ERROR, (EINVAL, - errmsg("Failed to commit the transaction"))); + errmsg("Failed to prepare the transaction"))); MemoryContextSwitchTo(oldContext); - pfree(nodes); + if (datanodes) + pfree(datanodes); + if (coordinators) + pfree(coordinators); + + pq_beginmessage(&buf, 'S'); + pq_sendint(&buf, TXN_BEING_PREPARED_RESULT, 4); + if (myport->is_proxy) + { + GTM_ProxyMsgHeader proxyhdr; + proxyhdr.ph_conid = myport->conn_id; + pq_sendbytes(&buf, (char *)&proxyhdr, sizeof (GTM_ProxyMsgHeader)); + } + pq_sendbytes(&buf, (char *)&gxid, sizeof(GlobalTransactionId)); + pq_endmessage(myport, &buf); + + if (!myport->is_proxy) + pq_flush(myport); + return; +} + +/* + * Process MSG_TXN_PREPARE message + */ +void +ProcessPrepareTransactionCommand(Port *myport, StringInfo message) +{ + StringInfoData buf; + GTM_TransactionHandle txn; + GlobalTransactionId gxid; + int isgxid = 0; + MemoryContext oldContext; + int status = STATUS_OK; + + isgxid = pq_getmsgbyte(message); + + if (isgxid) + { + const char *data = pq_getmsgbytes(message, sizeof (gxid)); + if (data == NULL) + ereport(ERROR, + (EPROTO, + errmsg("Message does not contain valid GXID"))); + memcpy(&gxid, data, sizeof (gxid)); + txn = GTM_GXIDToHandle(gxid); + } + else + { + const char *data = pq_getmsgbytes(message, sizeof (txn)); + if (data == NULL) + ereport(ERROR, + (EPROTO, + errmsg("Message does not contain valid Transaction Handle"))); + memcpy(&txn, data, sizeof (txn)); + } + + pq_getmsgend(message); + + oldContext = MemoryContextSwitchTo(TopMostMemoryContext); + + /* + * Commit the transaction + */ + status = GTM_PrepareTransaction(txn); + + MemoryContextSwitchTo(oldContext); pq_beginmessage(&buf, 'S'); pq_sendint(&buf, TXN_PREPARE_RESULT, 4); @@ -1424,6 +1829,7 @@ ProcessPrepareTransactionCommand(Port *myport, StringInfo message) return; } + /* * Process MSG_TXN_GET_GXID message */ diff --git a/src/gtm/main/main.c b/src/gtm/main/main.c index 667967a..1a6e546 100644 --- a/src/gtm/main/main.c +++ b/src/gtm/main/main.c @@ -769,12 +769,15 @@ ProcessCommand(Port *myport, StringInfo input_message) case MSG_TXN_BEGIN_GETGXID: case MSG_TXN_BEGIN_GETGXID_AUTOVACUUM: case MSG_TXN_PREPARE: + case 
MSG_TXN_BEING_PREPARED: case MSG_TXN_COMMIT: + case MSG_TXN_COMMIT_PREPARED: case MSG_TXN_ROLLBACK: case MSG_TXN_GET_GXID: case MSG_TXN_BEGIN_GETGXID_MULTI: case MSG_TXN_COMMIT_MULTI: case MSG_TXN_ROLLBACK_MULTI: + case MSG_TXN_GET_GID_DATA: ProcessTransactionCommand(myport, mtype, input_message); break; @@ -795,7 +798,7 @@ ProcessCommand(Port *myport, StringInfo input_message) case MSG_SEQUENCE_ALTER: ProcessSequenceCommand(myport, mtype, input_message); break; - + case MSG_TXN_GET_STATUS: case MSG_TXN_GET_ALL_PREPARED: ProcessQueryCommand(myport, mtype, input_message); @@ -938,39 +941,47 @@ ProcessTransactionCommand(Port *myport, GTM_MessageType mtype, StringInfo messag switch (mtype) { - case MSG_TXN_BEGIN: + case MSG_TXN_BEGIN: ProcessBeginTransactionCommand(myport, message); break; - case MSG_TXN_BEGIN_GETGXID: + case MSG_TXN_BEGIN_GETGXID: ProcessBeginTransactionGetGXIDCommand(myport, message); break; - case MSG_TXN_BEGIN_GETGXID_AUTOVACUUM: + case MSG_TXN_BEGIN_GETGXID_AUTOVACUUM: ProcessBeginTransactionGetGXIDAutovacuumCommand(myport, message); break; - case MSG_TXN_BEGIN_GETGXID_MULTI: + case MSG_TXN_BEGIN_GETGXID_MULTI: ProcessBeginTransactionGetGXIDCommandMulti(myport, message); break; - case MSG_TXN_PREPARE: + case MSG_TXN_BEING_PREPARED: + ProcessBeingPreparedTransactionCommand(myport, message); + break; + + case MSG_TXN_PREPARE: ProcessPrepareTransactionCommand(myport, message); break; - case MSG_TXN_COMMIT: + case MSG_TXN_COMMIT: ProcessCommitTransactionCommand(myport, message); break; - case MSG_TXN_ROLLBACK: + case MSG_TXN_COMMIT_PREPARED: + ProcessCommitPreparedTransactionCommand(myport, message); + break; + + case MSG_TXN_ROLLBACK: ProcessRollbackTransactionCommand(myport, message); break; - case MSG_TXN_COMMIT_MULTI: + case MSG_TXN_COMMIT_MULTI: ProcessCommitTransactionCommandMulti(myport, message); break; - case MSG_TXN_ROLLBACK_MULTI: + case MSG_TXN_ROLLBACK_MULTI: ProcessRollbackTransactionCommandMulti(myport, message); break; @@ -978,6 +989,9 @@ ProcessTransactionCommand(Port *myport, GTM_MessageType mtype, StringInfo messag ProcessGetGXIDTransactionCommand(myport, message); break; + case MSG_TXN_GET_GID_DATA: + ProcessGetGIDDataTransactionCommand(myport, message); + default: Assert(0); /* Shouldn't come here.. keep compiler quite */ } diff --git a/src/gtm/proxy/proxy_main.c b/src/gtm/proxy/proxy_main.c index 66b1594..d9ca329 100644 --- a/src/gtm/proxy/proxy_main.c +++ b/src/gtm/proxy/proxy_main.c @@ -949,9 +949,12 @@ ProcessCommand(GTMProxy_ConnectionInfo *conninfo, GTM_Conn *gtm_conn, case MSG_TXN_BEGIN_GETGXID: case MSG_TXN_BEGIN_GETGXID_AUTOVACUUM: case MSG_TXN_PREPARE: + case MSG_TXN_BEING_PREPARED: case MSG_TXN_COMMIT: + case MSG_TXN_COMMIT_PREPARED: case MSG_TXN_ROLLBACK: case MSG_TXN_GET_GXID: + case MSG_TXN_GET_GID_DATA: ProcessTransactionCommand(conninfo, gtm_conn, mtype, input_message); break; @@ -1115,7 +1118,11 @@ ProcessResponse(GTMProxy_ThreadInfo *thrinfo, GTMProxy_CommandInfo *cmdinfo, case MSG_TXN_BEGIN: case MSG_TXN_BEGIN_GETGXID_AUTOVACUUM: case MSG_TXN_PREPARE: + case MSG_TXN_BEING_PREPARED: + /* There are not so many 2PC from application messages, so just proxy it. 
*/ + case MSG_TXN_COMMIT_PREPARED: case MSG_TXN_GET_GXID: + case MSG_TXN_GET_GID_DATA: case MSG_SNAPSHOT_GXID_GET: case MSG_SEQUENCE_INIT: case MSG_SEQUENCE_GET_CURRENT: @@ -1165,8 +1172,6 @@ ProcessResponse(GTMProxy_ThreadInfo *thrinfo, GTMProxy_CommandInfo *cmdinfo, errmsg("invalid frontend message type %d", cmdinfo->ci_mtype))); } - - } /* ---------------- @@ -1302,7 +1307,10 @@ ProcessTransactionCommand(GTMProxy_ConnectionInfo *conninfo, GTM_Conn *gtm_conn, break; case MSG_TXN_BEGIN_GETGXID_AUTOVACUUM: - case MSG_TXN_PREPARE: + case MSG_TXN_PREPARE: + case MSG_TXN_BEING_PREPARED: + case MSG_TXN_GET_GID_DATA: + case MSG_TXN_COMMIT_PREPARED: GTMProxy_ProxyCommand(conninfo, gtm_conn, mtype, message); break; diff --git a/src/include/access/gtm.h b/src/include/access/gtm.h index 4878d92..6740c86 100644 --- a/src/include/access/gtm.h +++ b/src/include/access/gtm.h @@ -24,6 +24,23 @@ extern GlobalTransactionId BeginTranGTM(GTM_Timestamp *timestamp); extern GlobalTransactionId BeginTranAutovacuumGTM(void); extern int CommitTranGTM(GlobalTransactionId gxid); extern int RollbackTranGTM(GlobalTransactionId gxid); +extern int BeingPreparedTranGTM(GlobalTransactionId gxid, + char *gid, + int datanodecnt, + PGXC_NodeId datanodes[], + int coordcount, + PGXC_NodeId coordinators[]); +extern int PrepareTranGTM(GlobalTransactionId gxid); +extern int GetGIDDataGTM(char *gid, + GlobalTransactionId *gxid, + GlobalTransactionId *prepared_gxid, + int *datanodecnt, + PGXC_NodeId **datanodes, + int *coordcnt, + PGXC_NodeId **coordinators); +extern int CommitPreparedTranGTM(GlobalTransactionId gxid, + GlobalTransactionId prepared_gxid); + extern GTM_Snapshot GetSnapshotGTM(GlobalTransactionId gxid, bool canbe_grouped); /* Sequence interface APIs with GTM */ diff --git a/src/include/access/twophase.h b/src/include/access/twophase.h index a3a1492..485f7fa 100644 --- a/src/include/access/twophase.h +++ b/src/include/access/twophase.h @@ -19,6 +19,10 @@ #include "storage/proc.h" #include "utils/timestamp.h" +#ifdef PGXC +#include "pgxc/pgxc.h" +#endif + /* * GlobalTransactionData is defined in twophase.c; other places have no * business knowing the internal definition. 
@@ -38,6 +42,10 @@ extern GlobalTransaction MarkAsPreparing(TransactionId xid, const char *gid, TimestampTz prepared_at, Oid owner, Oid databaseid); +#ifdef PGXC +extern void RemoveGXactCoord(GlobalTransaction gxact); +#endif + extern void StartPrepare(GlobalTransaction gxact); extern void EndPrepare(GlobalTransaction gxact); diff --git a/src/include/gtm/gtm_c.h b/src/include/gtm/gtm_c.h index 0a4c941..da15df3 100644 --- a/src/include/gtm/gtm_c.h +++ b/src/include/gtm/gtm_c.h @@ -38,6 +38,7 @@ typedef uint32 GlobalTransactionId; /* 32-bit global transaction ids */ typedef uint32 PGXC_NodeId; typedef uint32 GTM_CoordinatorId; typedef int16 GTMProxy_ConnID; +typedef uint32 GTM_GIDLen; #define InvalidGTMProxyConnID -1 diff --git a/src/include/gtm/gtm_client.h b/src/include/gtm/gtm_client.h index 9db6884..4fe4bcf 100644 --- a/src/include/gtm/gtm_client.h +++ b/src/include/gtm/gtm_client.h @@ -29,7 +29,9 @@ typedef union GTM_ResultData } grd_gxid_tp; /* TXN_BEGIN_GETGXID */ GlobalTransactionId grd_gxid; /* TXN_PREPARE + * TXN_BEING_PREPARED * TXN_COMMIT + * TXN_COMMIT_PREPARED * TXN_ROLLBACK */ @@ -70,6 +72,16 @@ typedef union GTM_ResultData int status[GTM_MAX_GLOBAL_TRANSACTIONS]; } grd_txn_snap_multi; + struct + { + GlobalTransactionId gxid; + GlobalTransactionId prepared_gxid; + int datanodecnt; + int coordcnt; + PGXC_NodeId *datanodes; + PGXC_NodeId *coordinators; + } grd_txn_get_gid_data; /* TXN_GET_GID_DATA_RESULT */ + /* * TODO * TXN_GET_STATUS @@ -111,9 +123,16 @@ void disconnect_gtm(GTM_Conn *conn); GlobalTransactionId begin_transaction(GTM_Conn *conn, GTM_IsolationLevel isolevel, GTM_Timestamp *timestamp); GlobalTransactionId begin_transaction_autovacuum(GTM_Conn *conn, GTM_IsolationLevel isolevel); int commit_transaction(GTM_Conn *conn, GlobalTransactionId gxid); +int commit_prepared_transaction(GTM_Conn *conn, GlobalTransactionId gxid, GlobalTransactionId prepared_gxid); int abort_transaction(GTM_Conn *conn, GlobalTransactionId gxid); -int prepare_transaction(GTM_Conn *conn, GlobalTransactionId gxid, - int nodecnt, PGXC_NodeId nodes[]); +int being_prepared_transaction(GTM_Conn *conn, GlobalTransactionId gxid, char *gid, + int datanodecnt, PGXC_NodeId datanodes[], + int coordcnt, PGXC_NodeId coordinators[]); +int prepare_transaction(GTM_Conn *conn, GlobalTransactionId gxid); +int get_gid_data(GTM_Conn *conn, GTM_IsolationLevel isolevel, char *gid, + GlobalTransactionId *gxid, GlobalTransactionId *prepared_gxid, + int *datanodecnt, PGXC_NodeId **datanodes, int *coordcnt, + PGXC_NodeId **coordinators); /* * Snapshot Management API diff --git a/src/include/gtm/gtm_msg.h b/src/include/gtm/gtm_msg.h index e76e762..e1730eb 100644 --- a/src/include/gtm/gtm_msg.h +++ b/src/include/gtm/gtm_msg.h @@ -22,11 +22,14 @@ typedef enum GTM_MessageType MSG_TXN_BEGIN, /* Start a new transaction */ MSG_TXN_BEGIN_GETGXID, /* Start a new transaction and get GXID */ MSG_TXN_BEGIN_GETGXID_MULTI, /* Start multiple new transactions and get GXIDs */ - MSG_TXN_PREPARE, /* Prepare a transation for commit */ + MSG_TXN_BEING_PREPARED, /* Begins to prepare a transation for commit */ MSG_TXN_COMMIT, /* Commit a running or prepared transaction */ MSG_TXN_COMMIT_MULTI, /* Commit multiple running or prepared transactions */ + MSG_TXN_COMMIT_PREPARED, /* Commit a prepared transaction */ + MSG_TXN_PREPARE, /* Finish preparing a transaction */ MSG_TXN_ROLLBACK, /* Rollback a transaction */ MSG_TXN_ROLLBACK_MULTI, /* Rollback multiple transactions */ + MSG_TXN_GET_GID_DATA, /* Get info associated with a GID, and get a GXID 
*/ MSG_TXN_GET_GXID, /* Get a GXID for a transaction */ MSG_SNAPSHOT_GET, /* Get a global snapshot */ MSG_SNAPSHOT_GET_MULTI, /* Get multiple global snapshots */ @@ -59,10 +62,13 @@ typedef enum GTM_ResultType TXN_BEGIN_GETGXID_RESULT, TXN_BEGIN_GETGXID_MULTI_RESULT, TXN_PREPARE_RESULT, + TXN_BEING_PREPARED_RESULT, + TXN_COMMIT_PREPARED_RESULT, TXN_COMMIT_RESULT, TXN_COMMIT_MULTI_RESULT, TXN_ROLLBACK_RESULT, TXN_ROLLBACK_MULTI_RESULT, + TXN_GET_GID_DATA_RESULT, TXN_GET_GXID_RESULT, SNAPSHOT_GET_RESULT, SNAPSHOT_GET_MULTI_RESULT, diff --git a/src/include/gtm/gtm_txn.h b/src/include/gtm/gtm_txn.h index 2d78946..5e3a02c 100644 --- a/src/include/gtm/gtm_txn.h +++ b/src/include/gtm/gtm_txn.h @@ -116,8 +116,11 @@ typedef struct GTM_TransactionInfo GTM_IsolationLevel gti_isolevel; bool... [truncated message content] |
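Taken together, the client functions added above give a coordinator everything it needs to drive an explicit two-phase commit through GTM. Below is a minimal sketch of that flow, using only the signatures this commit adds to gtm_client.h; the node-id arrays and the helper names are made up for illustration, and error handling is collapsed to early returns.

#include "gtm/gtm_client.h"

/* Phase 1 (sketch): register a PREPARE TRANSACTION 'gid' with GTM, then
 * mark it fully prepared once every node has prepared locally. */
static int
sketch_prepare_phase(GTM_Conn *conn, GlobalTransactionId gxid, char *gid)
{
    PGXC_NodeId datanodes[] = {1, 2};  /* hypothetical node ids */
    PGXC_NodeId coords[] = {1};

    /* MSG_TXN_BEING_PREPARED: store the GID and node lists on GTM */
    if (being_prepared_transaction(conn, gxid, gid, 2, datanodes, 1, coords) < 0)
        return -1;

    /* ... each involved node runs its local PREPARE here ... */

    /* MSG_TXN_PREPARE: move the GTM transaction to GTM_TXN_PREPARED */
    return prepare_transaction(conn, gxid);
}

/* Phase 2 (sketch): possibly from a different session, resolve the GID
 * back to its GXID and node lists, then commit both GXIDs at once. */
static int
sketch_commit_prepared(GTM_Conn *conn, GTM_IsolationLevel isolevel, char *gid)
{
    GlobalTransactionId gxid, prepared_gxid;
    PGXC_NodeId *datanodes, *coords;
    int dn_count, co_count;

    /* MSG_TXN_GET_GID_DATA also hands back a fresh GXID for this commit */
    if (get_gid_data(conn, isolevel, gid, &gxid, &prepared_gxid,
                     &dn_count, &datanodes, &co_count, &coords) < 0)
        return -1;

    /* MSG_TXN_COMMIT_PREPARED commits the PREPARE and COMMIT PREPARED gxids */
    return commit_prepared_transaction(conn, gxid, prepared_gxid);
}

Note that phase 2 can run from a different backend because the server side deliberately keeps such transactions alive: GTM_RemoveAllTransInfos() in the patch skips entries in GTM_TXN_PREPARED and GTM_TXN_PREPARE_IN_PROGRESS state when a thread exits.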
From: mason_s <ma...@us...> - 2010-09-06 23:58:58
Project "Postgres-XC". The branch, master has been updated via 19a8fa536779653524a1feb862c18277efa317f4 (commit) from 06c882f78694a31749746aad0cb76347a3f7bcef (commit) - Log ----------------------------------------------------------------- commit 19a8fa536779653524a1feb862c18277efa317f4 Author: Mason Sharp <ma...@us...> Date: Mon Sep 6 19:54:53 2010 -0400 Improved error handling. The primary focus is to better handle the case of a stopped or crashed data node on the coordinator. Also, before a rollback make sure connections are clean. If there was an error, tell the pooler to destroy the connections instead of returning them to the pools, even the data node connections that did not have an error but are involved in the statement. This is becaue there may be some remaining messages buffered or in transit, and could affect subsequent requests. diff --git a/src/backend/pgxc/pool/datanode.c b/src/backend/pgxc/pool/datanode.c index ba56ca1..31b5bc0 100644 --- a/src/backend/pgxc/pool/datanode.c +++ b/src/backend/pgxc/pool/datanode.c @@ -37,7 +37,6 @@ #include "utils/snapmgr.h" #include "../interfaces/libpq/libpq-fe.h" -#define NO_SOCKET -1 static int node_count = 0; static DataNodeHandle *handles = NULL; @@ -280,7 +279,8 @@ retry: { add_error_message(conn, "unexpected EOF on datanode connection"); elog(WARNING, "unexpected EOF on datanode connection"); - return EOF; + /* Should we read from the other connections before returning? */ + return EOF; } else { @@ -429,6 +429,18 @@ retry: } +/* + * Clear out socket data and buffer. + * Throw away any data. + */ +void +clear_socket_data (DataNodeHandle *conn) +{ + do { + conn->inStart = conn->inCursor = conn->inEnd = 0; + } while (data_node_read_data(conn) > 0); +} + /* * Get one character from the connection buffer and advance cursor */ @@ -529,14 +541,20 @@ get_message(DataNodeHandle *conn, int *len, char **msg) } -/* Release all data node connections back to pool and release occupied memory */ +/* + * Release all data node connections back to pool and release occupied memory + * + * If force_drop is true, we force dropping all of the connections, such as after + * a rollback, which was likely issued due to an error. + */ void -release_handles(void) +release_handles(bool force_drop) { int i; int discard[NumDataNodes]; int ndisc = 0; + if (node_count == 0) return; @@ -546,7 +564,9 @@ release_handles(void) if (handle->sock != NO_SOCKET) { - if (handle->state != DN_CONNECTION_STATE_IDLE) + if (force_drop) + discard[ndisc++] = handle->nodenum; + else if (handle->state != DN_CONNECTION_STATE_IDLE) { elog(WARNING, "Connection to data node %d has unexpected state %d and will be dropped", handle->nodenum, handle->state); discard[ndisc++] = handle->nodenum; @@ -1070,6 +1090,12 @@ get_transaction_nodes(DataNodeHandle **connections) { for (i = 0; i < NumDataNodes; i++) { + /* + * We may want to consider also not returning connections with a + * state of DN_CONNECTION_STATE_ERROR_NOT_READY or + * DN_CONNECTION_STATE_ERROR_FATAL. + * ERROR_NOT_READY can happen if the data node abruptly disconnects. + */ if (handles[i].sock != NO_SOCKET && handles[i].transaction_status != 'I') connections[tran_count++] = &handles[i]; } @@ -1077,3 +1103,29 @@ get_transaction_nodes(DataNodeHandle **connections) return tran_count; } + +/* + * Return those node connections that appear to be active and + * have data to consume on them. 
+ */ +int +get_active_nodes (DataNodeHandle **connections) +{ + int active_count = 0; + int i; + + if (node_count) + { + for (i = 0; i < NumDataNodes; i++) + { + if (handles[i].sock != NO_SOCKET && + handles[i].state != DN_CONNECTION_STATE_IDLE && + handles[i].state != DN_CONNECTION_STATE_ERROR_NOT_READY && + handles[i].state != DN_CONNECTION_STATE_ERROR_FATAL) + connections[active_count++] = &handles[i]; + } + } + + return active_count; +} + diff --git a/src/backend/pgxc/pool/execRemote.c b/src/backend/pgxc/pool/execRemote.c index f065289..05dbe2e 100644 --- a/src/backend/pgxc/pool/execRemote.c +++ b/src/backend/pgxc/pool/execRemote.c @@ -15,6 +15,7 @@ *------------------------------------------------------------------------- */ +#include <time.h> #include "postgres.h" #include "access/gtm.h" #include "access/xact.h" @@ -30,6 +31,10 @@ #include "utils/tuplesort.h" #include "utils/snapmgr.h" +#define END_QUERY_TIMEOUT 20 +#define CLEAR_TIMEOUT 5 + + extern char *deparseSql(RemoteQueryState *scanstate); /* @@ -50,6 +55,9 @@ static int data_node_rollback(int conn_count, DataNodeHandle ** connections); static void clear_write_node_list(); +static int handle_response_clear(DataNodeHandle * conn); + + #define MAX_STATEMENTS_PER_TRAN 10 /* Variables to collect statistics */ @@ -761,7 +769,8 @@ HandleError(RemoteQueryState *combiner, char *msg_body, size_t len) { combiner->errorMessage = pstrdup(message); /* Error Code is exactly 5 significant bytes */ - memcpy(combiner->errorCode, code, 5); + if (code) + memcpy(combiner->errorCode, code, 5); } /* @@ -916,7 +925,7 @@ data_node_receive_responses(const int conn_count, DataNodeHandle ** connections, * Read results. * Note we try and read from data node connections even if there is an error on one, * so as to avoid reading incorrect results on the next statement. - * It might be better to just destroy these connections and tell the pool manager. + * Other safegaurds exist to avoid this, however. */ while (count > 0) { @@ -971,6 +980,7 @@ handle_response(DataNodeHandle * conn, RemoteQueryState *combiner) { char *msg; int msg_len; + char msg_type; for (;;) { @@ -991,7 +1001,8 @@ handle_response(DataNodeHandle * conn, RemoteQueryState *combiner) } /* TODO handle other possible responses */ - switch (get_message(conn, &msg_len, &msg)) + msg_type = get_message(conn, &msg_len, &msg); + switch (msg_type) { case '\0': /* Not enough data in the buffer */ conn->state = DN_CONNECTION_STATE_QUERY; @@ -1056,15 +1067,85 @@ handle_response(DataNodeHandle * conn, RemoteQueryState *combiner) case 'I': /* EmptyQuery */ default: /* sync lost? */ + elog(WARNING, "Received unsupported message type: %c", msg_type); conn->state = DN_CONNECTION_STATE_ERROR_FATAL; return RESPONSE_EOF; } } - /* Keep compiler quiet */ + return RESPONSE_EOF; } /* + * Like handle_response, but for consuming the messages, + * in case we of an error to clean the data node connection. + * Return values: + * RESPONSE_EOF - need to receive more data for the connection + * RESPONSE_COMPLETE - done with the connection, or done trying (error) + */ +static int +handle_response_clear(DataNodeHandle * conn) +{ + char *msg; + int msg_len; + char msg_type; + + for (;;) + { + /* No data available, exit */ + if (conn->state == DN_CONNECTION_STATE_QUERY) + return RESPONSE_EOF; + + /* + * If we are in the process of shutting down, we + * may be rolling back, and the buffer may contain other messages. + * We want to avoid a procarray exception + * as well as an error stack overflow. 
+ */ + if (proc_exit_inprogress) + { + conn->state = DN_CONNECTION_STATE_ERROR_FATAL; + return RESPONSE_COMPLETE; + } + + msg_type = get_message(conn, &msg_len, &msg); + switch (msg_type) + { + case '\0': /* Not enough data in the buffer */ + case 'c': /* CopyToCommandComplete */ + case 'C': /* CommandComplete */ + case 'T': /* RowDescription */ + case 'D': /* DataRow */ + case 'H': /* CopyOutResponse */ + case 'd': /* CopyOutDataRow */ + case 'A': /* NotificationResponse */ + case 'N': /* NoticeResponse */ + break; + case 'E': /* ErrorResponse */ + conn->state = DN_CONNECTION_STATE_ERROR_NOT_READY; + /* + * Do not return with an error, we still need to consume Z, + * ready-for-query + */ + break; + case 'Z': /* ReadyForQuery */ + conn->transaction_status = msg[0]; + conn->state = DN_CONNECTION_STATE_IDLE; + return RESPONSE_COMPLETE; + case 'I': /* EmptyQuery */ + default: + /* sync lost? */ + elog(WARNING, "Received unsupported message type: %c", msg_type); + conn->state = DN_CONNECTION_STATE_ERROR_FATAL; + return RESPONSE_COMPLETE; + } + } + + return RESPONSE_EOF; +} + + +/* * Send BEGIN command to the Data nodes and receive responses */ static int @@ -1150,13 +1231,13 @@ finish: if (!autocommit) stat_transaction(tran_count); if (!PersistentConnections) - release_handles(); + release_handles(false); autocommit = true; clear_write_node_list(); if (res != 0) ereport(ERROR, (errcode(ERRCODE_INTERNAL_ERROR), - errmsg("Could not commit connection on data nodes"))); + errmsg("Could not commit (or autocommit) data node connection"))); } @@ -1271,6 +1352,7 @@ finish: /* * Rollback current transaction + * This will happen */ int DataNodeRollback(void) @@ -1279,6 +1361,10 @@ DataNodeRollback(void) int tran_count; DataNodeHandle *connections[NumDataNodes]; + + /* Consume any messages on the data nodes first if necessary */ + DataNodeConsumeMessages(); + /* gather connections to rollback */ tran_count = get_transaction_nodes(connections); @@ -1296,7 +1382,7 @@ finish: if (!autocommit) stat_transaction(tran_count); if (!PersistentConnections) - release_handles(); + release_handles(true); autocommit = true; clear_write_node_list(); return res; @@ -1313,11 +1399,19 @@ data_node_rollback(int conn_count, DataNodeHandle ** connections) struct timeval *timeout = NULL; RemoteQueryState *combiner; + + /* + * Rollback is a special case, being issued because of an error. + * We try to read and throw away any extra data on the connection before + * issuing our rollbacks so that we did not read the results of the + * previous command. 
+ */ + for (i = 0; i < conn_count; i++) + clear_socket_data(connections[i]); + /* Send ROLLBACK - */ for (i = 0; i < conn_count; i++) - { data_node_send_query(connections[i], "ROLLBACK"); - } combiner = CreateResponseCombiner(conn_count, COMBINE_TYPE_NONE); /* Receive responses */ @@ -1487,7 +1581,7 @@ DataNodeCopyBegin(const char *query, List *nodelist, Snapshot snapshot, bool is_ if (need_tran) DataNodeCopyFinish(connections, 0, COMBINE_TYPE_NONE); else if (!PersistentConnections) - release_handles(); + release_handles(false); } pfree(connections); @@ -1711,7 +1805,7 @@ DataNodeCopyOut(Exec_Nodes *exec_nodes, DataNodeHandle** copy_connections, FILE* if (!ValidateAndCloseCombiner(combiner)) { if (autocommit && !PersistentConnections) - release_handles(); + release_handles(false); pfree(copy_connections); ereport(ERROR, (errcode(ERRCODE_DATA_CORRUPTED), @@ -2136,8 +2230,10 @@ ExecRemoteQuery(RemoteQueryState *node) if (connections[i]->transaction_status != 'T') new_connections[new_count++] = connections[i]; - if (new_count) - data_node_begin(new_count, new_connections, gxid); + if (new_count && data_node_begin(new_count, new_connections, gxid)) + ereport(ERROR, + (errcode(ERRCODE_INTERNAL_ERROR), + errmsg("Could not begin transaction on data nodes."))); } /* Get the SQL string */ @@ -2292,7 +2388,7 @@ ExecRemoteQuery(RemoteQueryState *node) { ExecSetSlotDescriptor(scanslot, node->tuple_desc); /* - * Now tuple table slot is responcible for freeing the + * Now tuple table slot is responsible for freeing the * descriptor */ node->tuple_desc = NULL; @@ -2492,9 +2588,88 @@ ExecRemoteQuery(RemoteQueryState *node) return resultslot; } +/* + * End the remote query + */ void ExecEndRemoteQuery(RemoteQueryState *node) { + + /* + * If processing was interrupted, (ex: client did not consume all the data, + * or a subquery with LIMIT) we may still have data on the nodes. Try and consume. + * We do not simply call DataNodeConsumeMessages, because the same + * connection could be used for multiple RemoteQuery steps. + * + * It seems most stable checking command_complete_count + * and only then working with conn_count + * + * PGXCTODO: Change in the future when we remove materialization nodes. 
+ */ + if (node->command_complete_count < node->node_count) + { + elog(WARNING, "Extra data node messages when ending remote query step"); + + while (node->conn_count > 0) + { + int i = 0; + int res; + + /* + * Just consume the rest of the messages + */ + if ((i = node->current_conn + 1) == node->conn_count) + i = 0; + + for (;;) + { + /* throw away message */ + if (node->msg) + { + pfree(node->msg); + node->msg = NULL; + } + + res = handle_response(node->connections[i], node); + + if (res == RESPONSE_COMPLETE || + node->connections[i]->state == DN_CONNECTION_STATE_ERROR_FATAL || + node->connections[i]->state == DN_CONNECTION_STATE_ERROR_NOT_READY) + { + if (--node->conn_count == 0) + break; + if (i == node->conn_count) + i = 0; + else + node->connections[i] = node->connections[node->conn_count]; + if (node->current_conn == node->conn_count) + node->current_conn = i; + } + else if (res == RESPONSE_EOF) + { + /* go to next connection */ + if (++i == node->conn_count) + i = 0; + + /* if we cycled over all connections we need to receive more */ + if (i == node->current_conn) + { + struct timeval timeout; + timeout.tv_sec = END_QUERY_TIMEOUT; + timeout.tv_usec = 0; + + if (data_node_receive(node->conn_count, node->connections, &timeout)) + ereport(ERROR, + (errcode(ERRCODE_INTERNAL_ERROR), + errmsg("Failed to read response from data nodes when ending query"))); + } + } + } + } + elog(WARNING, "Data node connection buffers cleaned"); + } + + /* * Release tuplesort resources */ @@ -2517,6 +2692,64 @@ ExecEndRemoteQuery(RemoteQueryState *node) CloseCombiner(node); } +/* + * Consume any remaining messages on the connections. + * This is useful for calling after ereport() + */ +void +DataNodeConsumeMessages(void) +{ + int i; + int active_count = 0; + int res; + struct timeval timeout; + DataNodeHandle *connection = NULL; + DataNodeHandle **connections = NULL; + DataNodeHandle *active_connections[NumDataNodes]; + + + active_count = get_active_nodes(active_connections); + + /* Iterate through handles in use and try and clean */ + for (i = 0; i < active_count; i++) + { + elog(WARNING, "Consuming data node messages after error."); + + connection = active_connections[i]; + + res = RESPONSE_EOF; + + while (res != RESPONSE_COMPLETE) + { + int res = handle_response_clear(connection); + + if (res == RESPONSE_EOF) + { + if (!connections) + connections = (DataNodeHandle **) palloc(sizeof(DataNodeHandle*)); + + connections[0] = connection; + + /* Use a timeout so we do not wait forever */ + timeout.tv_sec = CLEAR_TIMEOUT; + timeout.tv_usec = 0; + if (data_node_receive(1, connections, &timeout)) + { + /* Mark this as bad, move on to next one */ + connection->state = DN_CONNECTION_STATE_ERROR_FATAL; + break; + } + } + if (connection->state == DN_CONNECTION_STATE_ERROR_FATAL + || connection->state == DN_CONNECTION_STATE_IDLE) + break; + } + } + + if (connections) + pfree(connections); +} + /* ---------------------------------------------------------------- * ExecRemoteQueryReScan @@ -2609,8 +2842,11 @@ ExecRemoteUtility(RemoteQuery *node) if (connections[i]->transaction_status != 'T') new_connections[new_count++] = connections[i]; - if (new_count) - data_node_begin(new_count, new_connections, gxid); + if (new_count && data_node_begin(new_count, new_connections, gxid)) + ereport(ERROR, + (errcode(ERRCODE_INTERNAL_ERROR), + errmsg("Could not begin transaction on data nodes"))); + } /* See if we have a primary nodes, execute on it first before the others */ @@ -2760,10 +2996,11 @@ DataNodeCleanAndRelease(int code, 
Datum arg) /* Rollback on GTM if transaction id opened. */ RollbackTranGTM((GlobalTransactionId) GetCurrentTransactionIdIfAny()); - } - /* Release data node connections */ - release_handles(); + release_handles(true); + } else + /* Release data node connections */ + release_handles(false); /* Close connection with GTM */ CloseGTM(); diff --git a/src/include/pgxc/datanode.h b/src/include/pgxc/datanode.h index 4202e2e..4039c45 100644 --- a/src/include/pgxc/datanode.h +++ b/src/include/pgxc/datanode.h @@ -23,6 +23,9 @@ #include "utils/snapshot.h" #include <unistd.h> +#define NO_SOCKET -1 + + /* Connection to data node maintained by Pool Manager */ typedef struct PGconn NODE_CONNECTION; @@ -80,8 +83,9 @@ extern int DataNodeConnClean(NODE_CONNECTION * conn); extern void DataNodeCleanAndRelease(int code, Datum arg); extern DataNodeHandle **get_handles(List *nodelist); -extern void release_handles(void); +extern void release_handles(bool force_drop); extern int get_transaction_nodes(DataNodeHandle ** connections); +extern int get_active_nodes(DataNodeHandle ** connections); extern int ensure_in_buffer_capacity(size_t bytes_needed, DataNodeHandle * handle); extern int ensure_out_buffer_capacity(size_t bytes_needed, DataNodeHandle * handle); @@ -100,5 +104,6 @@ extern int data_node_flush(DataNodeHandle *handle); extern char get_message(DataNodeHandle *conn, int *len, char **msg); extern void add_error_message(DataNodeHandle * handle, const char *message); +extern void clear_socket_data (DataNodeHandle *conn); #endif diff --git a/src/include/pgxc/execRemote.h b/src/include/pgxc/execRemote.h index 143c8fa..fbc4db0 100644 --- a/src/include/pgxc/execRemote.h +++ b/src/include/pgxc/execRemote.h @@ -96,6 +96,7 @@ extern int handle_response(DataNodeHandle * conn, RemoteQueryState *combiner); extern bool FetchTuple(RemoteQueryState *combiner, TupleTableSlot *slot); extern void ExecRemoteQueryReScan(RemoteQueryState *node, ExprContext *exprCtxt); +extern void DataNodeConsumeMessages(void); extern int primary_data_node; #endif ----------------------------------------------------------------------- Summary of changes: src/backend/pgxc/pool/datanode.c | 62 ++++++++- src/backend/pgxc/pool/execRemote.c | 275 +++++++++++++++++++++++++++++++++--- src/include/pgxc/datanode.h | 7 +- src/include/pgxc/execRemote.h | 1 + 4 files changed, 320 insertions(+), 25 deletions(-) hooks/post-receive -- Postgres-XC |
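The core of the fix is the ordering around rollback: any unread results from the failed statement are drained before ROLLBACK goes out, and the pooled connections are then dropped rather than reused. A condensed sketch of that sequence follows; the wrapper name is hypothetical, but the calls are the ones the patch adds or uses.

#include "pgxc/datanode.h"

/* Hypothetical wrapper mirroring the order of operations that
 * data_node_rollback() and DataNodeRollback() now follow. */
static void
sketch_rollback_clean(DataNodeHandle **connections, int conn_count)
{
    int i;

    /* Throw away anything still buffered or in transit from the
     * failed statement, so its output is not read as ROLLBACK's. */
    for (i = 0; i < conn_count; i++)
        clear_socket_data(connections[i]);

    /* Only now issue ROLLBACK on each connection */
    for (i = 0; i < conn_count; i++)
        data_node_send_query(connections[i], "ROLLBACK");

    /* Once responses are combined, the connections are force-dropped
     * instead of being returned to the pool for reuse: */
    release_handles(true);
}

The same caution shows up on the read side: handle_response_clear() keeps consuming after an ErrorResponse until ReadyForQuery arrives, and DataNodeConsumeMessages() bounds each drain with CLEAR_TIMEOUT so a crashed data node cannot hang the backend indefinitely.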
From: mason_s <ma...@us...> - 2010-08-31 17:22:41
Project "Postgres-XC". The branch, master has been updated via 06c882f78694a31749746aad0cb76347a3f7bcef (commit) from 58d1f0d4fe5de5db3655f52df9031abc1ce5b84e (commit) - Log ----------------------------------------------------------------- commit 06c882f78694a31749746aad0cb76347a3f7bcef Author: Mason Sharp <ma...@us...> Date: Tue Aug 31 13:21:36 2010 -0400 Fix a bug with AVG() We tried to avoid coordinator aggregate handling when only a single node is involved, but that causes a problem for some aggregates. diff --git a/src/backend/pgxc/plan/planner.c b/src/backend/pgxc/plan/planner.c index c8911b7..e18e813 100644 --- a/src/backend/pgxc/plan/planner.c +++ b/src/backend/pgxc/plan/planner.c @@ -2158,10 +2158,14 @@ pgxc_planner(Query *query, int cursorOptions, ParamListInfo boundParams) if (query_step->exec_nodes) query_step->combine_type = get_plan_combine_type( query, query_step->exec_nodes->baselocatortype); - /* Only set up if running on more than one node */ - if (query_step->exec_nodes && query_step->exec_nodes->nodelist && - list_length(query_step->exec_nodes->nodelist) > 1) - query_step->simple_aggregates = get_simple_aggregates(query); + + /* Set up simple aggregates */ + /* PGXCTODO - we should detect what types of aggregates are used. + * in some cases we can avoid the final step and merely proxy results + * (when there is only one data node involved) instead of using + * coordinator consolidation. At the moment this is needed for AVG() + */ + query_step->simple_aggregates = get_simple_aggregates(query); /* * Add sorting to the step ----------------------------------------------------------------------- Summary of changes: src/backend/pgxc/plan/planner.c | 12 ++++++++---- 1 files changed, 8 insertions(+), 4 deletions(-) hooks/post-receive -- Postgres-XC |
From: mason_s <ma...@us...> - 2010-08-31 17:11:13
Project "Postgres-XC". The branch, master has been updated via 58d1f0d4fe5de5db3655f52df9031abc1ce5b84e (commit) from 9894afcd6d20b47c303c49b8ed5141d2b7902237 (commit) - Log ----------------------------------------------------------------- commit 58d1f0d4fe5de5db3655f52df9031abc1ce5b84e Author: Mason Sharp <ma...@us...> Date: Tue Aug 31 13:09:34 2010 -0400 Fixed a bug in GTM introduced with timestamp piggybacking with GXID. Without this, one could not use GTM directly, only through the proxy Discovered and written by Andrei Martsinchyk diff --git a/src/gtm/main/gtm_txn.c b/src/gtm/main/gtm_txn.c index dec0a63..2205167 100644 --- a/src/gtm/main/gtm_txn.c +++ b/src/gtm/main/gtm_txn.c @@ -894,6 +894,7 @@ ProcessBeginTransactionGetGXIDCommand(Port *myport, StringInfo message) StringInfoData buf; GTM_TransactionHandle txn; GlobalTransactionId gxid; + GTM_Timestamp timestamp; MemoryContext oldContext; txn_isolation_level = pq_getmsgint(message, sizeof (GTM_IsolationLevel)); @@ -901,6 +902,9 @@ ProcessBeginTransactionGetGXIDCommand(Port *myport, StringInfo message) oldContext = MemoryContextSwitchTo(TopMemoryContext); + /* GXID has been received, now it's time to get a GTM timestamp */ + timestamp = GTM_TimestampGetCurrent(); + /* * Start a new transaction * @@ -931,6 +935,7 @@ ProcessBeginTransactionGetGXIDCommand(Port *myport, StringInfo message) pq_sendbytes(&buf, (char *)&proxyhdr, sizeof (GTM_ProxyMsgHeader)); } pq_sendbytes(&buf, (char *)&gxid, sizeof(gxid)); + pq_sendbytes(&buf, (char *)×tamp, sizeof (GTM_Timestamp)); pq_endmessage(myport, &buf); if (!myport->is_proxy) ----------------------------------------------------------------------- Summary of changes: src/gtm/main/gtm_txn.c | 5 +++++ 1 files changed, 5 insertions(+), 0 deletions(-) hooks/post-receive -- Postgres-XC |
From: Michael P. <mic...@us...> - 2010-08-23 08:15:54
Project "Postgres-XC". The annotated tag, v0.9.2 has been created at 7402b46760f3fd0d140fd177edfecaae31ec058b (tag) tagging d7ca431066efe320107581186ab853b28fa5f7a7 (commit) replaces v0.9.1 tagged by Michael P on Mon Aug 23 17:16:55 2010 +0900 - Log ----------------------------------------------------------------- Postgres-XC version 0.9.2 tag Andrei Martsinchyk (4): Reverted PANIC ereports back to ERROR Use ereport instead of Assert if sort operation is not defined If expressions should be added to ORDER BY clause of the step query Fixed a bug when searching terminating semicolon. Mason S (15): Fixed a bug when using a table after it had been created in the same Minor change that updates COPY so that it knows ahead Add support for immutable stored functions and enable support Support for pg_dump and pg_restore. Add support for views. When using hash distributed tables and a value that corresponds to Do not allow WITH RECURSIVE or windowing functions until Do not yet allow creation of temp tables until we properly handle them. Handle more types of queries to determine whether or not they Allow rules to be created, provided that they do not use NOTIFY, Fixed assertion Add support for ORDER BY adn DISTINCT. Changed some error messages so that they will not be duplicates In Postgres-XC, the error stack may overflow because Fix a crash that may occur within the pooler when a Michael P (3): Remove an unnecessary file for the repository. Support for RENAME/DROP SCHEMA with sequences Support for cold synchronization of catalog table of coordinator. Pavan Deolasee (3): Add support for ALTER Sequence. Michael Paquier with some editorilization from Pavan Deolasee Add a missing include file from the previous commit Handling ALTER SEQUENCE at the GTM proxy as well. Michael Paquier. ----------------------------------------------------------------------- hooks/post-receive -- Postgres-XC |