Age | Commit message | Author
2017-08-04 | Message style improvements | Peter Eisentraut
2017-08-04 | hash: Increase the number of possible overflow bitmaps by 8x. | Robert Haas
Per a report from AP, it's not that hard to exhaust the supply of bitmap pages if you create a table with a hash index and then insert a few billion rows - and then you start getting errors when you try to insert additional rows. In the particular case reported by AP, there's another fix that we can make to improve recycling of overflow pages, which is another way to avoid the error, but there may be other cases where this problem happens and that fix won't help. So let's buy ourselves as much headroom as we can without rearchitecting anything. The comments claim that the old limit was 64GB, but it was really only 32GB, because we didn't use all the bits in the page for bitmap bits - only the largest power of 2 that could fit after deducting space for the page header and so forth. Thus, we have 4kB per page for bitmap bits, not 8kB. The new limit is thus actually 8 times the old *real* limit but only 4 times the old *purported* limit. Since this breaks on-disk compatibility, bump HASH_VERSION. We've already done this earlier in this release cycle, so this doesn't cause any incremental inconvenience for people using pg_upgrade from releases prior to v10. However, users who use pg_upgrade to reach 10beta3 or later from 10beta2 or earlier will need to REINDEX any hash indexes again. Amit Kapila and Robert Haas Discussion: http://postgr.es/m/[email protected]
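The 4kB-versus-8kB point comes from rounding the usable space down to a power of two. A minimal C sketch of that arithmetic, assuming an 8192-byte page and a made-up 64-byte overhead for the page header and special space (the real figure is not given above); the function names are hypothetical, not the hash AM's code:

```c
#include <assert.h>
#include <stdint.h>

/* Largest power of two that is <= n (n must be > 0). */
static uint32_t
largest_pow2_le(uint32_t n)
{
    uint32_t p = 1;

    assert(n > 0);
    while (p <= n / 2)
        p <<= 1;
    return p;
}

/*
 * Bytes of a page usable for bitmap bits, given the page size and the
 * (hypothetical) space reserved for the page header, special space, etc.
 */
static uint32_t
bitmap_bytes_per_page(uint32_t page_size, uint32_t overhead)
{
    assert(overhead < page_size);
    return largest_pow2_le(page_size - overhead);
}
```

Under those assumptions `bitmap_bytes_per_page(8192, 64)` yields 4096 bytes, i.e. 32768 bitmap bits per page: any nonzero overhead smaller than half the page pushes the usable space below 8192, and rounding down halves it.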
2017-08-04 | Apply ALTER ... SET NOT NULL recursively in ALTER ... ADD PRIMARY KEY. | Tom Lane
If you do ALTER COLUMN SET NOT NULL against an inheritance parent table, it will recurse to mark all the child columns as NOT NULL as well. This is necessary for consistency: if the column is labeled NOT NULL then reading it should never produce nulls. However, that didn't happen in the case where ALTER ... ADD PRIMARY KEY marks a target column NOT NULL that wasn't before. That was questionable from the beginning, and now Tushar Ahuja points out that it can lead to dump/restore failures in some cases. So let's make that case recurse too. Although this is meant to fix a bug, it's enough of a behavioral change that I'm pretty hesitant to back-patch, especially in view of the lack of similar field complaints. It doesn't seem to be too late to put it into v10 though. Michael Paquier, editorialized on slightly by me Discussion: https://postgr.es/m/[email protected]
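The recursion can be pictured with a toy model of an inheritance tree. This is a hypothetical sketch (the `Table` struct and `set_not_null_recursive` are invented for illustration), not the catalog-driven code PostgreSQL actually uses:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical inheritance tree: one column's NOT NULL flag per table. */
typedef struct Table
{
    bool           attnotnull;
    struct Table **children;
    size_t         nchildren;
} Table;

/* Mark the column NOT NULL on rel and, recursively, on every child. */
static void
set_not_null_recursive(Table *rel)
{
    assert(rel != NULL);
    rel->attnotnull = true;
    for (size_t i = 0; i < rel->nchildren; i++)
        set_not_null_recursive(rel->children[i]);
}
```

With multiple inheritance a child can be reached through several parents; in this sketch that only means setting an already-set flag again, which is harmless.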
2017-08-04 | Disallow SSL session tickets. | Tom Lane
We don't actually support session tickets, since we do not create an SSL session identifier. But it seems that OpenSSL will issue a session ticket on-demand anyway, which will then fail when used. This results in reconnection failures when using ticket-aware client-side SSL libraries (such as the Npgsql .NET driver), as reported by Shay Rojansky. To fix, just tell OpenSSL not to issue tickets. At some point in the far future, we might consider enabling tickets instead. But the security implications of that aren't entirely clear; and besides it would have little benefit except for very short-lived database connections, which is Something We're Bad At anyhow. It would take a lot of other work to get to a point where that would really be an exciting thing to do. While at it, also tell OpenSSL not to use a session cache. This doesn't really do anything, since a backend would never populate the cache anyway, but it might gain some micro-efficiencies and/or reduce security exposures. Patch by me, per discussion with Heikki Linnakangas and Shay Rojansky. Back-patch to all supported versions. Discussion: https://postgr.es/m/CADT4RqBU8N-csyZuzaook-c795dt22Zcwg1aHWB6tfVdAkodZA@mail.gmail.com
2017-08-04 | Check for partitioned table correctly. | Pavan Deolasee
While checking where to forward the DROP TABLE command, we were not checking for partitioned tables correctly. That resulted in incorrectly sending DROP TABLE to the remote coordinator for temporary partitioned tables.
2017-08-04 | Correct a mistake that occurred while merging sequence.c code | Pavan Deolasee
We were incorrectly overwriting the 'cached' value in the SeqTable element, thus causing another request to the GTM when nextval is fetched. This resulted in unintentional gaps in the sequence values. This patch fixes that, though we might still get gaps unless sequence_range is set to 1. But this is by design, to reduce repeated round trips to the GTM.
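The caching trade-off can be sketched as follows. This is a hypothetical model: `MockGTM` stands in for the GTM, the `last`/`cached` pair only loosely mirrors the SeqTable fields named above, and `range` plays the role of sequence_range. Values are handed out locally until `last` reaches `cached`; if `cached` were overwritten with a stale value, that test would fire early, costing an extra GTM round trip and leaving the unused reserved values as a gap.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the GTM: hands out consecutive blocks. */
typedef struct MockGTM
{
    int64_t next;      /* next value the GTM will hand out */
    int     requests;  /* number of round trips served */
} MockGTM;

/* Hypothetical per-backend cache, loosely modeled on SeqTable. */
typedef struct SeqCache
{
    int64_t last;      /* last value returned by nextval */
    int64_t cached;    /* last value already reserved from the GTM */
} SeqCache;

static int64_t
cached_nextval(SeqCache *c, MockGTM *gtm, int64_t range)
{
    assert(range >= 1);
    if (c->last >= c->cached)
    {
        /* Cache exhausted: reserve 'range' values in one round trip. */
        gtm->requests++;
        c->last = gtm->next - 1;
        c->cached = gtm->next + range - 1;
        gtm->next += range;
    }
    return ++c->last;
}
```

With `range` set to 1 every call makes a round trip: no gaps from unused reservations, but the most GTM traffic.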
2017-08-04 | Further unify ROLE and USER command grammar rules | Peter Eisentraut
ALTER USER ... SET did not support all the syntax variants of ALTER ROLE ... SET. Fix that, and to avoid further deviations of this kind, unify many of the grammar rules for ROLE/USER/GROUP commands. Reported-by: Pavel Golub <[email protected]>
2017-08-03 | Fix pg_dump/pg_restore to emit REFRESH MATERIALIZED VIEW commands last. | Tom Lane
Because we push all ACL (i.e. GRANT/REVOKE) restore steps to the end, materialized view refreshes were occurring while the permissions on referenced objects were still at defaults. This led to failures if, say, an MV owned by user A reads from a table owned by user B, even if B had granted the necessary privileges to A. We've had multiple complaints about that type of restore failure, most recently from Jordan Gigov. The ideal fix for this would be to start treating ACLs as dependency-sortable objects, rather than hard-wiring anything about their dump order (the existing approach is a messy kluge dating to commit dc0e76ca3). But that's going to be a rather major change, and it certainly wouldn't lead to a back-patchable fix. As a short-term solution, convert the existing two-pass hack (ie, normal objects then ACLs) to a three-pass hack, ie, normal objects then ACLs then matview refreshes. Because this happens in RestoreArchive(), it will also fix the problem when restoring from an existing archive-format dump. (Note this means that if a matview refresh would have failed under the permissions prevailing at dump time, it'll fail during restore as well. We'll define that as user error rather than something we should try to work around.) To avoid performance loss in parallel restore, we need the matview refreshes to still be parallelizable. Hence, clean things up enough so that both ACLs and matviews are handled by the parallel restore infrastructure, instead of reverting back to serial restore for ACLs. There is still a final serial step, but it shouldn't normally have to do anything; it's only there to try to recover if we get stuck due to some problem like unresolved circular dependencies. Patch by me, but it owes something to an earlier attempt by Kevin Grittner. Back-patch to 9.3 where materialized views were introduced. Discussion: https://postgr.es/m/[email protected]
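The three-pass ordering itself is simple once each archive entry has been classified. A sketch under that assumption: the enum and `TocEntrySketch` type are illustrative, and the sketch ignores dependencies and parallel restore, both of which the real RestoreArchive() has to handle.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative classification of archive (TOC) entries. */
typedef enum RestorePass
{
    RESTORE_PASS_MAIN = 0,   /* ordinary objects and data */
    RESTORE_PASS_ACL,        /* GRANT/REVOKE */
    RESTORE_PASS_REFRESH     /* REFRESH MATERIALIZED VIEW */
} RestorePass;

typedef struct TocEntrySketch
{
    const char *desc;        /* example description string */
    RestorePass pass;
} TocEntrySketch;

/*
 * Emit entry indexes pass by pass, keeping the original relative order
 * within each pass. Returns the number of indexes written to 'order'.
 */
static size_t
order_by_pass(const TocEntrySketch *entries, size_t n, size_t *order)
{
    size_t k = 0;

    for (int pass = RESTORE_PASS_MAIN; pass <= RESTORE_PASS_REFRESH; pass++)
        for (size_t i = 0; i < n; i++)
            if ((int) entries[i].pass == pass)
                order[k++] = i;
    assert(k == n);
    return k;
}
```

Keeping relative order within a pass means the existing dependency-sorted order of ordinary objects is preserved; only ACLs and refreshes are pushed later.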
2017-08-03 | Fix build on zlib-less environments | Alvaro Herrera
Commit 4d57e8381677 added support for getting I/O errors out of zlib, but it introduced a portability problem for systems without zlib. Repair by wrapping the zlib call inside #ifdef and restore the original code in the other branch. This serves to illustrate the inadequacy of the zlib abstraction in pg_backup_archiver: there is no way to call gzerror() in that abstraction. This means that the several places that call GZREAD and GZWRITE are currently doing error reporting wrongly, but ENOTIME to get it fixed before next week's release set. Backpatch to 9.4, like the commit that introduced the problem.
2017-08-03 | Fix lock upgrade hazard in ATExecAttachPartition. | Robert Haas
Amit Langote Discussion: http://postgr.es/m/CAFjFpReT_kq_uwU_B8aWDxR7jNGE=P0iELycdq5oupi=xSQTOw@mail.gmail.com
2017-08-03 | Code beautification for ATExecAttachPartition. | Robert Haas
Amit Langote Discussion: http://postgr.es/m/CAFjFpReT_kq_uwU_B8aWDxR7jNGE=P0iELycdq5oupi=xSQTOw@mail.gmail.com
2017-08-03 | Allow a foreign table CHECK constraint to be initially NOT VALID. | Robert Haas
For a table, the constraint can be considered validated immediately, because the table must be empty. But for a foreign table this is not necessarily the case. Fixes a bug in commit f27a6b15e6566fba7748d0d9a3fc5bcfd52c4a1b. Amit Langote, with some changes by me. Discussion: http://postgr.es/m/[email protected]
2017-08-03 | Improve ExecModifyTable comments. | Robert Haas
Some of these comments wrongly implied that only an AFTER ROW trigger will cause a 'wholerow' attribute to be present for a foreign table, but a BEFORE ROW trigger can have the same effect. Others implied that it would always be present for a foreign table, but that's not true either. Etsuro Fujita and Robert Haas Discussion: http://postgr.es/m/[email protected]
2017-08-03 | Teach map_partition_varattnos to handle whole-row expressions. | Robert Haas
Otherwise, partitioned tables with RETURNING expressions or subject to a WITH CHECK OPTION do not work properly. Amit Langote, reviewed by Amit Khandekar and Etsuro Fujita. A few comment changes by me. Discussion: http://postgr.es/m/[email protected]
2017-08-03 | Add new files to nls.mk and add translation markers | Peter Eisentraut
2017-08-02 | Fix pg_dump's errno checking for zlib I/O | Alvaro Herrera
Some error reports were reporting strerror(errno), which for some error conditions coming from zlib is wrong, resulting in confusing reports such as
    pg_restore: [compress_io] could not read from input file: Success
which makes no sense. To correctly extract the error message we need to use gzerror(), so let's do that. This isn't as comprehensive or as neat as I would like, but at least it should improve things in many common cases. The zlib abstraction in compress_io does not seem to be applied consistently enough; we could perhaps improve that, but it seems master-only material, not a bug fix for back-patching. This problem goes back all the way, but I decided to apply back to 9.4 only, because older branches don't contain commit 14ea89366 which this change depends on. Authors: Vladimir Kunschikov, Álvaro Herrera Discussion: https://postgr.es/m/[email protected]
2017-08-02 | Remove broken and useless entry-count printing in HASH_DEBUG code. | Tom Lane
init_htab(), with #define HASH_DEBUG, prints a bunch of hashtable parameters. It used to also print nentries, but commit 44ca4022f changed that to "hash_get_num_entries(hctl)", which is wrong (the parameter should be "hashp"). Rather than correct the coding, though, let's just remove that field from the printout. The table must be empty, since we just finished building it, so expensively calculating the number of entries is rather pointless. Moreover hash_get_num_entries makes assumptions (about not needing locks) which we could do without in debugging code. Noted by Choi Doo-Won in bug #14764. Back-patch to 9.6 where the faulty code was introduced. Discussion: https://postgr.es/m/[email protected]
2017-08-02 | Get a snapshot before COPY in table sync | Peter Eisentraut
This fixes a crash if the local table has a function index and the function makes non-immutable calls. Reported-by: Scott Milliken <[email protected]> Author: Masahiko Sawada <[email protected]>
2017-08-02 | Remove duplicate setting of SSL_OP_SINGLE_DH_USE option. | Tom Lane
Commit c0a15e07c moved the setting of OpenSSL's SSL_OP_SINGLE_DH_USE option into a new subroutine initialize_dh(), but forgot to remove it from where it was. SSL_CTX_set_options() is a trivial function, amounting indeed to just "ctx->options |= op", hence there's no reason to contort the code or break separation of concerns to avoid calling it twice. So separating the DH setup from disabling of old protocol versions is a good change, but we need to finish the job. Noted while poking into the question of SSL session tickets.
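Why calling it twice is harmless: OR-ing the same bit into a mask is idempotent. A sketch with a hypothetical stand-in struct and made-up bit values, not OpenSSL's real SSL_CTX or option constants:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an SSL_CTX: only the options word. */
typedef struct CtxSketch
{
    unsigned long options;
} CtxSketch;

/* Same effect as the "ctx->options |= op" described above. */
static unsigned long
ctx_set_options(CtxSketch *ctx, unsigned long op)
{
    assert(ctx != NULL);
    ctx->options |= op;
    return ctx->options;
}
```

Setting bit 0x10 a second time leaves the mask unchanged, so the duplicate call was only a code-organization issue, not a behavioral one.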
2017-08-02 | Fix OBJECT_TYPE/OBJECT_DOMAIN confusion | Peter Eisentraut
This doesn't have a significant impact except that now SECURITY LABEL ON DOMAIN rejects types that are not domains. Reported-by: 高增琦 <[email protected]>
2017-08-02 | Make temporary tables use shared storage on datanodes | Pavan Deolasee
Since a temporary table may be accessed by multiple backends on a datanode, XL mostly treats such tables as regular tables. But the technique that was used to distinguish between temporary tables that may need shared storage and those which are accessed only by a single backend wasn't very foolproof. We were relying on global session activation to make that distinction. This clearly fails when a background process, such as the autovacuum process, tries to figure out whether a table is using local or shared storage. This was leading to various problems, such as the underlying file system objects for the table getting cleaned up without first discarding all references to the table from the shared buffers. We now make all temp tables use shared storage on the datanodes and thus simplify things. Only EXECUTE DIRECT does not set up a global session anyway, so I don't think this will have any meaningful impact on performance. This should fix the checkpoint failures during regression tests.
2017-08-02 | Revert test case added by commit 1e165d05fe06a9072867607886f818bc255507db. | Tom Lane
The buildfarm is still showing at least three distinct behaviors for a bad locale name in CREATE COLLATION. Although this test was helpful for getting the error reporting code into some usable shape, it doesn't seem worth carrying multiple expected-files in order to support the test in perpetuity. So pull it back out. Discussion: https://postgr.es/m/CAKKotZS-wcDcofXDCH=sidiuajE+nqHn2CGjLLX78anyDmi3gQ@mail.gmail.com
2017-08-01 | Second try at getting useful errors out of newlocale/_create_locale. | Tom Lane
The early buildfarm returns for commit 1e165d05f are pretty awful: not only does Windows not return a useful error, but it looks like a lot of Unix-ish platforms don't either. Given the number of different errnos seen so far, guess that what's really going on is that some newlocale() implementations fail to set errno at all. Hence, let's try zeroing errno just before newlocale() and then if it's still zero report as though it's ENOENT. That should cover the Windows case too. It's clear that we'll have to drop the regression test case, unless we want to maintain a separate expected-file for platforms without HAVE_LOCALE_T. But I'll leave it there awhile longer to see if this actually improves matters or not. Discussion: https://postgr.es/m/CAKKotZS-wcDcofXDCH=sidiuajE+nqHn2CGjLLX78anyDmi3gQ@mail.gmail.com
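The errno-zeroing workaround as a POSIX C sketch. It assumes a platform with POSIX.1-2008 `newlocale()`, so it does not cover the Windows `_create_locale()` path, and `try_newlocale` is a hypothetical helper rather than PostgreSQL's function:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <locale.h>
#include <stddef.h>

/*
 * Try to create a locale_t for 'name'. On failure return (locale_t) 0
 * and set *err to errno, or to ENOENT if newlocale() left errno at zero.
 */
static locale_t
try_newlocale(const char *name, int *err)
{
    locale_t loc;

    assert(name != NULL && err != NULL);
    errno = 0;                  /* some implementations never set it */
    loc = newlocale(LC_COLLATE_MASK | LC_CTYPE_MASK, name, (locale_t) 0);
    if (loc == (locale_t) 0)
        *err = (errno != 0) ? errno : ENOENT;
    else
        *err = 0;
    return loc;
}
```

Zeroing errno first is what makes "still zero" a meaningful signal; without it, a stale errno from some earlier call could be reported instead.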
2017-08-01 | Suppress less info in regression tests using DROP CASCADE. | Tom Lane
DROP CASCADE doesn't currently promise to visit dependent objects in a fixed order, so when the regression tests use it, we typically need to suppress the details of which objects get dropped in order to have predictable test output. Traditionally we've done that by setting client_min_messages higher than NOTICE, but there's a better way: we can "\set VERBOSITY terse" in psql. That suppresses the DETAIL message with the object list, but we still get the basic notice telling how many objects were dropped. So at least the test case can verify that the expected number of objects were dropped. The VERBOSITY method was already in use in a few places, but run around and use it wherever it makes sense. Discussion: https://postgr.es/m/[email protected]
2017-08-01 | Try to deliver a sane message for _create_locale() failure on Windows. | Tom Lane
We were just printing errno, which is certainly not gonna work on Windows. Now, it's not entirely clear from Microsoft's documentation whether _create_locale() adheres to standard Windows error reporting conventions, but let's assume it does and try to map the GetLastError result to an errno. If this turns out not to work, probably the best thing to do will be to assume the error is always ENOENT on Windows. This is a longstanding bug, but given the lack of previous field complaints, I'm not excited about back-patching it. Per report from Murtuza Zabuawala. Discussion: https://postgr.es/m/CAKKotZS-wcDcofXDCH=sidiuajE+nqHn2CGjLLX78anyDmi3gQ@mail.gmail.com
2017-08-01 | Allow creation of C/POSIX collations without depending on libc behavior. | Tom Lane
Most of our collations code has special handling for the locale names "C" and "POSIX", allowing those collations to be used whether or not the system libraries think those locale names are valid, or indeed whether said libraries even have any locale support. But we missed handling things that way in CREATE COLLATION. This meant you couldn't clone the C/POSIX collations, nor explicitly define a new collation using those locale names, unless the libraries allow it. That's pretty pointless, as well as being a violation of pg_newlocale_from_collation's API specification. The practical effect of this change is quite limited: it allows creating such collations even on platforms that don't HAVE_LOCALE_T, and it allows making "POSIX" collation objects on Windows, which before this would only let you make "C" collation objects. Hence, even though this is a bug fix IMO, it doesn't seem worth the trouble to back-patch. In passing, suppress the DROP CASCADE detail messages at the end of the collation regression test. I'm surprised we've never been bit by message ordering issues there. Per report from Murtuza Zabuawala. Discussion: https://postgr.es/m/CAKKotZS-wcDcofXDCH=sidiuajE+nqHn2CGjLLX78anyDmi3gQ@mail.gmail.com
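The special case amounts to checking the locale name before consulting the C library at all. A sketch, assuming an exact, case-sensitive match; `is_builtin_c_locale` is a hypothetical name:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * True if 'locale' names the built-in C/POSIX behavior, which can be
 * handled without asking libc whether the name is valid.
 */
static bool
is_builtin_c_locale(const char *locale)
{
    assert(locale != NULL);
    return strcmp(locale, "C") == 0 || strcmp(locale, "POSIX") == 0;
}
```

A caller would take the built-in path when this returns true and fall back to newlocale() (or _create_locale() on Windows) only otherwise, which is what lets these collations exist even where libc has no usable locale support.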
2017-08-01 | Comment fix for partition_rbound_cmp(). | Dean Rasheed
This was an oversight in d363d42. Beena Emerson
2017-07-31 | Fix comment. | Tatsuo Ishii
XLByteToSeg and XLByteToPrevSeg calculate only a segment number. The definition of these macros was modified by commit dfda6ebaec6763090fb78b458a979b558c50b39b but the comment remained unchanged. Patch by Yugo Nagata. Back-patched to 9.3 and beyond.
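The arithmetic behind the two macros, sketched as C functions with hypothetical names and assuming the common 16MB segment size: the first divides the byte position by the segment size, while the "previous" variant subtracts one first, so an end position lying exactly on a segment boundary maps to the segment that precedes it.

```c
#include <assert.h>
#include <stdint.h>

/* Segment number containing byte position 'xlrp'. */
static uint64_t
byte_to_seg(uint64_t xlrp, uint64_t seg_size)
{
    assert(seg_size > 0);
    return xlrp / seg_size;
}

/* Segment number containing the byte just before 'xlrp'. */
static uint64_t
byte_to_prev_seg(uint64_t xlrp, uint64_t seg_size)
{
    assert(seg_size > 0 && xlrp > 0);
    return (xlrp - 1) / seg_size;
}
```

For a position exactly at 16MB, the first function reports segment 1 and the second reports segment 0; neither yields a file name or offset, only the segment number.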
2017-07-31 | Fix typo | Peter Eisentraut
Author: Masahiko Sawada <[email protected]>
2017-07-31 | Fix typo | Peter Eisentraut
Author: Etsuro Fujita <[email protected]>
2017-07-31 | Always use 2048 bit DH parameters for OpenSSL ephemeral DH ciphers. | Heikki Linnakangas
1024 bits is considered weak these days, but OpenSSL always passes 1024 as the key length to the tmp_dh callback. All the code to handle other key lengths is, in fact, dead. To remedy those issues:
* Only include hard-coded 2048-bit parameters.
* Set the parameters directly with SSL_CTX_set_tmp_dh(), without the callback.
* The name of the file containing the DH parameters is now a GUC. This replaces the old hardcoded "dh1024.pem" filename. (The files for other key lengths, dh512.pem, dh2048.pem, etc. were never actually used.)
This is not a new problem, but it doesn't seem worth the risk and churn to backport. If you care enough about the strength of the DH parameters on old versions, you can create custom DH parameters, with as many bits as you wish, and put them in the "dh1024.pem" file. Per report by Nicolas Guini and Damian Quiroga. Reviewed by Michael Paquier. Discussion: https://www.postgresql.org/message-id/CAMxBoUyjOOautVozN6ofzym828aNrDjuCcOTcCquxjwS-L2hGQ@mail.gmail.com
2017-07-31 | Tighten coding for non-composite case in plperl's return_next. | Tom Lane
Coverity complained about this code's practice of using scalar variables as single-element arrays. While that's really just nitpicking, it probably is more readable to declare them as arrays, so let's do that. A more important point is that the code was just blithely assuming that the result tupledesc has exactly one column; if it doesn't, we'd likely get a crash of some sort in tuplestore_putvalues. Since the tupledesc is manufactured outside of plperl, that seems like an uncomfortably long chain of assumptions. We can nail it down at little cost with a sanity check earlier in the function.
2017-07-31 | Fix function comment for dumpACL() | Stephen Frost
The comment for dumpACL() got neglected when initacls and initracls were added and the discussion of what 'racls' is wasn't very clear either. Per complaint from Tom.
2017-07-31 | Don't run ALTER ENUM in an autocommit block on remote nodes | Pavan Deolasee
Before PG 10, Postgres did not allow ALTER ENUM to be run inside a transaction block, so we used to run these commands in auto-commit mode on the remote nodes. Now that Postgres has removed the restriction, we also run the statements in a transaction block. This fixes regression failures in the 'enum' test case.
2017-07-31 | Copy distribution information correctly to ProjectSet path | Pavan Deolasee
ProjectSet is a new path type in PG 10 and we had failed to copy the distribution information to the path. This was resulting in failures in many regression test cases. The lack of distribution information prevented the distributed query planner from adding a Remote Subplan node on top of the plan, thus resulting in local execution of the plan. Since the underlying table is actually a distributed table, local execution fails to fetch any data. Fix this by properly copying distribution info. Several regression failures are fixed automatically with this patch.
2017-07-31 | Add missing comment in postgresql.conf. | Tatsuo Ishii
current_source requires a server restart for a new value to take effect. Per Yugo Nagata and Masahiko Sawada. Back-patched to 9.2 and beyond.
2017-07-31 | Add missing comment in postgresql.conf. | Tatsuo Ishii
dynamic_shared_memory_type requires a server restart for a new value to take effect. Per Yugo Nagata and Masahiko Sawada. Back-patched to 9.4 and beyond.
2017-07-31 | Add missing comment in postgresql.conf. | Tatsuo Ishii
max_logical_replication_workers requires a server restart for a new value to take effect. Per Yugo Nagata. Minor editing by me.
2017-07-31 | Partially accept plan changes in updatable_views | Tomas Vondra
Upstream commit 215b43cdc8d6b4a1700886a39df1ee735cb0274d significantly reworked planning of leaky functions. In practice that change means we no longer have to push leaky functions into a subquery, which greatly simplifies some plans, including the two in this patch. This commit accepts the plans only partially, though. It uses the plans from upstream, and adds a Remote Subquery Scan node at the top, so we accept the general plan shape change. But there are a few additional differences that need further evaluation, particularly in target lists (Postgres-XL generating more entries than upstream) and SubPlans (Postgres-XL only generating one subplan, while upstream generates two alternative ones).
2017-07-31 | Accept aggregation plan changes in xc_remote tests | Tomas Vondra
The plans changed mainly due to abandoning the custom two-phase aggregation implementation and using the upstream parallel aggregation. That means we have stopped showing schema name in target lists, so instead of
    Output: pg_catalog.avg((avg(xcrem_employee.salary)))
the EXPLAIN now shows
    Output: avg(xcrem_employee.salary)
and we also do projection at the scan nodes, so the target list only shows the necessary subset of columns. A somewhat surprising change is that the plans switch from distributed aggregate plans like this one
    ->  Aggregate
          ->  Remote Subquery Scan
                ->  Aggregate
                      ->  Seq Scan
to always performing simple (non-distributed) aggregation like this
    ->  Aggregate
          ->  Remote Subquery Scan
                ->  Seq Scan
This happens due to create_grouping_paths() relying on the consider_parallel flag when setting try_distributed_aggregate, disabling distributed aggregation when consider_parallel=false. Both affected plans are however for UPDATE queries, and PostgreSQL disables parallelism for queries that do writes, so we end up with try_distributed_aggregate=false. We should probably enable distributed aggregates in these cases, but we can't ignore consider_parallel entirely, as we likely need some of the checks. We will probably end up with a consider_distributed flag, set in a similar way to consider_parallel, but that's more an enhancement than a bug fix.
2017-07-31 | Reject SQL functions containing utility statements | Tomas Vondra
The check was not effective for the same reason as 5a54abb7acd, that is, not accounting for XL wrapping the original command into RawStmt. Fix that by checking parsetree->stmt, and also add an assert checking that we actually got a RawStmt in the first place.
2017-07-31 | Produce proper error message for COPY (SELECT INTO) | Tomas Vondra
Produce the right error message for COPY (SELECT INTO) queries, that is
    ERROR:  COPY (SELECT INTO) is not supported
instead of the incorrect
    ERROR:  COPY query must have a RETURNING clause
The root cause is that the check in BeginCopy() was testing raw_query, but XL wraps the original command in RawStmt, so we should be checking raw_query->stmt instead.
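A reduced model of the RawStmt pitfall: a check that inspects the top-level node sees the wrapper, not the SELECT. The node types below are hypothetical simplifications, not PostgreSQL's parsenodes.h definitions; each struct embeds the common header as its first member so the casts are well-defined.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified node types. */
typedef enum NodeTagSketch { T_RawStmtSketch, T_SelectStmtSketch } NodeTagSketch;

typedef struct NodeSketch { NodeTagSketch type; } NodeSketch;

typedef struct SelectStmtSketch
{
    NodeSketch node;
    bool       has_into;     /* SELECT ... INTO target present */
} SelectStmtSketch;

typedef struct RawStmtSketch
{
    NodeSketch  node;
    NodeSketch *stmt;        /* the wrapped statement */
} RawStmtSketch;

/* Look through a RawStmt wrapper, if present. */
static NodeSketch *
unwrap_raw(NodeSketch *n)
{
    assert(n != NULL);
    if (n->type == T_RawStmtSketch)
        return ((RawStmtSketch *) n)->stmt;
    return n;
}

static bool
is_select_into(NodeSketch *query)
{
    NodeSketch *stmt = unwrap_raw(query);

    return stmt->type == T_SelectStmtSketch &&
        ((SelectStmtSketch *) stmt)->has_into;
}
```

Testing the wrapper's own tag would never match the SELECT, which is how the more specific error message was being skipped.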
2017-07-31 | Add explicit VACUUM to inet test to actually do IOS | Tomas Vondra
Some of the queries in the inet test are meant to exercise Index Only Scans. Postgres-XL was not, however, picking those plans due to stale stats on the coordinator (reltuples and relpages in pg_class). On plain PostgreSQL the tests work fine, as CREATE INDEX also updates statistics stored in the pg_class catalog. For example, this
    CREATE TABLE t (a INT);
    INSERT INTO t SELECT i FROM generate_series(1,1000) s(i);
    SELECT relpages, reltuples FROM pg_class WHERE relname = 't';
    CREATE INDEX ON t(a);
    SELECT relpages, reltuples FROM pg_class WHERE relname = 't';
will show zeroes before the CREATE INDEX command, and accurate values after it completes. On Postgres-XL that is not the case, and we will return zeroes even after the CREATE INDEX command. To actually update the statistics we need to fetch information from the datanodes the way VACUUM does it. Fixed by adding an explicit VACUUM call right after the CREATE INDEX, to fetch the stats from the datanodes and update the coordinator catalogs.
2017-07-31 | Tweak the query plan check in join regression test | Tomas Vondra
The test expects the plan to use Index Scan, but with 1000 rows the differences are very small. With two data nodes, we however compute the estimates as if the tables had 500 rows, making the cost difference even smaller. Fixed by increasing the total number of rows to 2000, which means each datanode has about 1000 and uses the same cost estimates as upstream.
2017-07-31 | Remove extra snprintf call in pg_tablespace_databases | Tomas Vondra
The XL code did two function calls in the else branch, about like this:
    else
        /* Postgres-XC tablespaces also include node name in path */
        sprintf(fctx->location, "pg_tblspc/%u/%s_%s", tablespaceOid,
                TABLESPACE_VERSION_DIRECTORY, PGXCNodeName);
        fctx->location = psprintf("pg_tblspc/%u/%s_%s", tablespaceOid,
                                  TABLESPACE_VERSION_DIRECTORY, PGXCNodeName);
which is wrong, as only the first call is actually in the else branch; the second call is executed unconditionally. In fact, the two calls attempt to construct the same location string, but the sprintf call assumes the 'fctx->location' string is already allocated. It is not, so it's likely to cause a segfault. Fixed by removing the sprintf() call, keeping just the psprintf() one. Noticed thanks to GCC 6.3 complaining about incorrect indentation. Backpatch to XL 9.5.
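The fix keeps psprintf(), which sizes, allocates, and formats in one step, instead of sprintf() into a pointer that was never allocated. A sketch of that pattern using malloc() and vsnprintf(); `alloc_printf` is a hypothetical name, and PostgreSQL's own psprintf() allocates with palloc() and handles errors differently:

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Format into a freshly allocated buffer of exactly the right size.
 * Returns NULL on formatting or allocation failure; caller must free().
 */
static char *
alloc_printf(const char *fmt, ...)
{
    va_list ap;
    int     len;
    char   *buf;

    assert(fmt != NULL);

    /* First pass: measure the formatted length without writing. */
    va_start(ap, fmt);
    len = vsnprintf(NULL, 0, fmt, ap);
    va_end(ap);
    if (len < 0)
        return NULL;

    buf = malloc((size_t) len + 1);
    if (buf == NULL)
        return NULL;

    /* Second pass: format into the buffer (va_list restarted). */
    va_start(ap, fmt);
    vsnprintf(buf, (size_t) len + 1, fmt, ap);
    va_end(ap);
    assert(strlen(buf) == (size_t) len);
    return buf;
}
```

The va_list is restarted for the second pass because a va_list cannot be reused after vsnprintf() has consumed it.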
2017-07-31 | Fix confusing indentation in gtm_client.c | Tomas Vondra
GCC 6.3 complains that the indentation in gtm_sync_standby() is somewhat confusing, as it might mislead people to think that a command is part of an if branch. So fix that by removing the unnecessary indentation.
2017-07-31 | Refactor the construction of distributed grouping paths | Tomas Vondra
The code generating distributed grouping paths was originally structured like this:
    if (try_distributed_aggregation)
    {
        ...
    }
    if (can_sort && try_distributed_aggregation)
    {
        ...
    }
    if (can_hash && try_distributed_aggregation)
    {
        ...
    }
It's refactored like this, to resemble the upstream part of the code:
    if (try_distributed_aggregation)
    {
        ...
        if (can_sort)
        {
            ...
        }
        if (can_hash)
        {
            ...
        }
    }
2017-07-31 | Accept plan change in xc_groupby regression test | Tomas Vondra
The plan changed in two ways. Firstly, the target lists changed due to abandoning the custom distributed aggregation and reusing the upstream partial aggregation code. That means we're not prefixing the aggregate with schema name, etc. The plan also switches from distributed aggregation to plain aggregation with all the work done on top of a remote query. This happens simply due to costing, as the tables are tiny and two-phase aggregation has some overhead. The original implementation (as in XL 9.5) distributed the aggregate unconditionally, ignoring the costing. Part of the problem is that the query groups by two columns from two different tables, resulting in overestimation of the number of groups. That means the optimizer thinks distributing the aggregation would not reduce the number of rows, which increases the cost estimate as each row requires network transfer and the finalize aggregate also depends on the number of input rows. We could make the tables larger and the optimizer would eventually switch to distributed aggregate. For example this seems to do the trick:
    insert into xc_groupby_tab1 select 1, mod(i,1000) from generate_series(1,20000) s(i);
    insert into xc_groupby_tab2 select 1, mod(i,1000) from generate_series(1,20000) s(i);
But it does not seem worth it, considering it's just a workaround for the estimation issue and the increased duration. And we already have other regression tests testing plausible queries benefiting from distributed aggregation. So just accept the plan change.
2017-07-30 | Move ExecProcNode from dispatch to function pointer based model. | Andres Freund
This allows us to add stack-depth checks the first time an executor node is called, and skip that overhead on following calls. Additionally it yields a nice speedup. While it'd probably have been a good idea to have that check all along, it has become more important after the new expression evaluation framework in b8d7f053c5c2bf2a7e - there's no stack depth check in common paths anymore now. We previously relied on ExecEvalExpr() being executed somewhere. We should move towards that model for further routines, but as this is required for v10, it seems better to only do the necessary (which already is quite large). Author: Andres Freund, Tom Lane Reported-By: Julien Rouhaud Discussion: https://postgr.es/m/[email protected] https://postgr.es/m/[email protected]
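The first-call trick can be sketched with a self-replacing function pointer. The names below are hypothetical and the stack check is a counting stand-in; the sketch only shows the dispatch pattern, not PostgreSQL's executor:

```c
#include <assert.h>
#include <stddef.h>

typedef struct PlanStateSketch PlanStateSketch;
typedef int (*ExecProcNodeFn) (PlanStateSketch *node);

struct PlanStateSketch
{
    ExecProcNodeFn exec;       /* what callers invoke */
    ExecProcNodeFn exec_real;  /* the node type's actual routine */
    int            next_row;   /* demo state for the "real" routine */
};

static int stack_checks = 0;   /* counts how often the check ran */

static void
check_stack_depth_sketch(void)
{
    stack_checks++;            /* a real check would measure stack usage */
}

/* Installed initially: run the check once, then step out of the way. */
static int
exec_first(PlanStateSketch *node)
{
    check_stack_depth_sketch();
    node->exec = node->exec_real;
    return node->exec(node);
}

/* Example "real" node routine: returns 0, 1, 2, ... */
static int
exec_counter(PlanStateSketch *node)
{
    return node->next_row++;
}

static void
init_node(PlanStateSketch *node, ExecProcNodeFn real)
{
    assert(node != NULL && real != NULL);
    node->exec = exec_first;
    node->exec_real = real;
    node->next_row = 0;
}

/* Thin wrapper, analogous to an inline ExecProcNode(). */
static int
exec_proc_node(PlanStateSketch *node)
{
    return node->exec(node);
}
```

After the first call `node->exec` points straight at the node's routine, so later calls cost one indirect call and no check, replacing a switch over node types.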
2017-07-30 | Move interrupt checking from ExecProcNode() to executor nodes. | Andres Freund
In a followup commit ExecProcNode(), and especially the large switch it contains, will largely be replaced by a function pointer directly to the correct node. The node functions will then get invoked by a thin inline function wrapper. To avoid having to include miscadmin.h (for CHECK_FOR_INTERRUPTS()) in headers, move the interrupt checks into the individual executor routines. While looking through all executor nodes, I noticed a number of arguably missing interrupt checks, and added these too. Author: Andres Freund, Tom Lane Reviewed-By: Tom Lane Discussion: https://postgr.es/m/[email protected]