summaryrefslogtreecommitdiff
path: root/src/backend/utils/adt/tsquery_rewrite.c
AgeCommit message (Collapse)Author
2025-01-01Update copyright for 2025Bruce Momjian
Backpatch-through: 13
2024-01-04Update copyright for 2024Bruce Momjian
Reported-by: Michael Paquier Discussion: https://postgr.es/m/[email protected] Backpatch-through: 12
2024-01-02tsquery: fix typo "rewrited" -> "rewritten"Robert Haas
Dagfinn Ilmari MannsÃ¥ker, reviewed by Shubham Khanna. Discussion: http://postgr.es/m/[email protected]
2023-01-02Update copyright for 2023Bruce Momjian
Backpatch-through: 11
2022-01-08Update copyright for 2022Bruce Momjian
Backpatch-through: 10
2021-01-02Update copyright for 2021Bruce Momjian
Backpatch-through: 9.5
2020-01-01Update copyrights for 2020Bruce Momjian
Backpatch-through: update all files in master, backpatch legal files through 9.4
2019-01-02Update copyright for 2019Bruce Momjian
Backpatch-through: certain files through 9.4
2018-01-03Update copyright for 2018Bruce Momjian
Backpatch-through: certain files through 9.3
2017-03-12Use wrappers of PG_DETOAST_DATUM_PACKED() more.Noah Misch
This makes almost all core code follow the policy introduced in the previous commit. Specific decisions: - Text search support functions with char* and length arguments, such as prsstart and lexize, may receive unaligned strings. I doubt maintainers of non-core text search code will notice. - Use plain VARDATA() on values detoasted or synthesized earlier in the same function. Use VARDATA_ANY() on varlenas sourced outside the function, even if they happen to always have four-byte headers. As an exception, retain the universal practice of using VARDATA() on return values of SendFunctionCall(). - Retain PG_GETARG_BYTEA_P() in pageinspect. (Page images are too large for a one-byte header, so this misses no optimization.) Sites that do not call get_page_from_raw() typically need the four-byte alignment. - For now, do not change btree_gist. Its use of four-byte headers in memory is partly entangled with storage of 4-byte headers inside GBT_VARKEY, on disk. - For now, do not change gtrgm_consistent() or gtrgm_distance(). They incorporate the varlena header into a cache, and there are multiple credible implementation strategies to consider.
2017-01-03Update copyright via script for 2017Bruce Momjian
2016-12-11Prevent crash when ts_rewrite() replaces a non-top-level subtree with null.Tom Lane
When ts_rewrite()'s replacement argument is an empty tsquery, it's supposed to simplify any operator nodes whose operand(s) become NULL; but it failed to do that reliably, because dropvoidsubtree() only examined the top level of the result tree. Rather than make a second recursive pass, let's just give the responsibility to dofindsubquery() to simplify while it's doing the main replacement pass. Per report from Andreas Seltenreich. Artur Zakirov, with some cosmetic changes by me. Back-patch to all supported branches. Discussion: https://postgr.es/m/[email protected]
2016-10-30Fix nasty performance problem in tsquery_rewrite().Tom Lane
tsquery_rewrite() tries to find matches to subsets of AND/OR conditions; for example, in the query 'a | b | c' the substitution subquery 'a | c' should match and lead to replacement of the first and third items. That's fine, but the matching algorithm apparently takes about O(2^N) for an N-clause query (I say "apparently" because the code is also both unintelligible and uncommented). We could probably do better than that even without any extra assumptions --- but actually, we know that the subclauses are sorted, indeed are depending on that elsewhere in this very same function. So we can just scan the two lists a single time to detect matches, as though we were doing a merge join. Also do a re-flattening call (QTNTernary()) in tsquery_rewrite_query, just to make sure that the tree fits the expectations of the next search cycle. I didn't try to devise a test case for this, but I'm pretty sure that the oversight could have led to failure to match in some cases where a match would be expected. Improve comments, and also stick a CHECK_FOR_INTERRUPTS into dofindsubquery, just in case it's still too slow for somebody. Per report from Andreas Seltenreich. Back-patch to all supported branches. Discussion: <[email protected]>
2016-03-12Widen query numbers-of-tuples-processed counters to uint64.Tom Lane
This patch widens SPI_processed, EState's es_processed field, PortalData's portalPos field, FuncCallContext's call_cntr and max_calls fields, ExecutorRun's count argument, PortalRunFetch's result, and the max number of rows in a SPITupleTable to uint64, and deals with (I hope) all the ensuing fallout. Some of these values were declared uint32 before, and others "long". I also removed PortalData's posOverflow field, since that logic seems pretty useless given that portalPos is now always 64 bits. The user-visible results are that command tags for SELECT etc will correctly report tuple counts larger than 4G, as will plpgsql's GET GET DIAGNOSTICS ... ROW_COUNT command. Queries processing more tuples than that are still not exactly the norm, but they're becoming more common. Most values associated with FETCH/MOVE distances, such as PortalRun's count argument and the count argument of most SPI functions that have one, remain declared as "long". It's not clear whether it would be worth promoting those to int64; but it would definitely be a large dollop of additional API churn on top of this, and it would only help 32-bit platforms which seem relatively less likely to see any benefit. Andreas Scherbaum, reviewed by Christian Ullrich, additional hacking by me
2016-01-02Update copyright for 2016Bruce Momjian
Backpatch certain files through 9.1
2015-01-06Update copyright for 2015Bruce Momjian
Backpatch certain files through 9.0
2014-01-07Update copyright for 2014Bruce Momjian
Update all files in head, and files COPYRIGHT and legal.sgml in all back branches.
2013-05-29pgindent run for release 9.3Bruce Momjian
This is the first run of the Perl-based pgindent script. Also update pgindent instructions.
2013-01-01Update copyrights for 2013Bruce Momjian
Fully update git head, and update back branches in ./COPYRIGHT and legal.sgml files.
2012-01-01Update copyright notices for year 2012.Bruce Momjian
2011-09-01Remove unnecessary #include references, per pgrminclude script.Bruce Momjian
2011-01-01Stamp copyrights for year 2011.Bruce Momjian
2010-09-20Remove cvs keywords from all files.Magnus Hagander
2010-01-02Update copyright for the year 2010.Bruce Momjian
2009-07-28Fix incorrect cleanup of tsquery in ts_rewrite(). Per bug #4933 byTeodor Sigaev
Aaron Marcuse-Kubitza <[email protected]>
2009-07-16Make backend header files C++ safePeter Eisentraut
This alters various incidental uses of C++ key words to use other similar identifiers, so that a C++ compiler won't choke outright. You still (probably) need extern "C" { }; around the inclusion of backend headers. based on a patch by Kurt Harriman <[email protected]> Also add a script cpluspluscheck to check for C++ compatibility in the future. As of right now, this passes without error for me.
2009-01-07Fix executor/spi.h to follow our usual conventions for include files, ie,Tom Lane
not include postgres.h nor anything else it doesn't directly need. Add #includes to calling files as needed to compensate. Per my proposal of yesterday. This should be noted as a source code change in the 8.4 release notes, since it's likely to require changes in add-on modules.
2009-01-01Update copyright for 2009.Bruce Momjian
2008-03-25Simplify and standardize conversions between TEXT datums and ordinary CTom Lane
strings. This patch introduces four support functions cstring_to_text, cstring_to_text_with_len, text_to_cstring, and text_to_cstring_buffer, and two macros CStringGetTextDatum and TextDatumGetCString. A number of existing macros that provided variants on these themes were removed. Most of the places that need to make such conversions now require just one function or macro call, in place of the multiple notational layers that used to be needed. There are no longer any direct calls of textout or textin, and we got most of the places that were using handmade conversions via memcpy (there may be a few still lurking, though). This commit doesn't make any serious effort to eliminate transient memory leaks caused by detoasting toasted text objects before they reach text_to_cstring. We changed PG_GETARG_TEXT_P to PG_GETARG_TEXT_PP in a few places where it was easy, but much more could be done. Brendan Jurd and Tom Lane
2008-01-01Update copyrights in source tree to 2008.Bruce Momjian
2007-11-15Re-run pgindent with updated list of typedefs. (Updated README shouldBruce Momjian
avoid this problem in the future.)
2007-11-15pgindent run for 8.3.Bruce Momjian
2007-11-13Resurrect the code for the rewrite(ARRAY[...]) aggregate function,Tom Lane
and put it into contrib/tsearch2 compatibility module.
2007-10-24Set read_only = TRUE while evaluating input queries for ts_rewrite()Tom Lane
and ts_stat(), per my recent suggestion. Also add a possibly-not-needed- but-can't-hurt check for NULL SPI_tuptable, before we try to dereference same.
2007-10-24Remove the aggregate form of ts_rewrite(), since it doesn't work as desiredTom Lane
if there are zero rows to aggregate over, and the API seems both conceptually and notationally ugly anyway. We should look for something that improves on the tsquery-and-text-SELECT version (which is also pretty ugly but at least it works...), but it seems that will take query infrastructure that doesn't exist today. (Hm, I wonder if there's anything in or near SQL2003 window functions that would help?) Per discussion.
2007-10-23Fix two-argument form of ts_rewrite() so it actually works for cases whereTom Lane
a later rewrite rule should change a subtree modified by an earlier one. Per my gripe of a few days ago.
2007-09-07Improvements from Heikki Linnakangas <[email protected]>Teodor Sigaev
- change the alignment requirement of lexemes in TSVector slightly. Lexeme strings were always padded to 2-byte aligned length to make sure that if there's position array (uint16[]) it has the right alignment. The patch changes that so that the padding is not done when there's no positions. That makes the storage of tsvectors without positions slightly more compact. - added some #include "miscadmin.h" lines I missed in the earlier when I added calls to check_stack_depth(). - Reimplement the send/recv functions, and added a comment above them describing the on-wire format. The CRC is now recalculated in tsquery as well per previous discussion.
2007-09-07Improving various checks by Heikki Linnakangas <[email protected]>Teodor Sigaev
- add code to check that the query tree is well-formed. It was indeed possible to send malformed queries in binary mode, which produced all kinds of strange results. - make the left-field a uint32. There's no reason to arbitrarily limit it to 16-bits, and it won't increase the disk/memory footprint either now that QueryOperator and QueryOperand are separate structs. - add check_stack_depth() call to all recursive functions I found. Some of them might have a natural limit so that you can't force arbitrarily deep recursions, but check_stack_depth() is cheap enough that seems best to just stick it into anything that might be a problem.
2007-09-07Refactoring by Heikki Linnakangas <[email protected]> withTeodor Sigaev
small editorization by me - Brake the QueryItem struct into QueryOperator and QueryOperand. Type was really the only common field between them. QueryItem still exists, and is used in the TSQuery struct as before, but it's now a union of the two. Many other changes fell from that, like separation of pushval_asis function into pushValue, pushOperator and pushStop. - Moved some structs that were for internal use only from header files to the right .c-files. - Moved tsvector parser to a new tsvector_parser.c file. Parser code was about half of the size of tsvector.c, it's also used from tsquery.c, and it has some data structures of its own, so it seems better to separate it. Cleaned up the API so that TSVectorParserState is not accessed from outside tsvector_parser.c. - Separated enumerations (#defines, really) used for QueryItem.type field and as return codes from gettoken_query. It was just accidental code sharing. - Removed ParseQueryNode struct used internally by makepol and friends. push*-functions now construct QueryItems directly. - Changed int4 variables to just ints for variables like "i" or "array size", where the storage-size was not significant.
2007-08-21Tsearch2 functionality migrates to core. The bulk of this work is byTom Lane
Oleg Bartunov and Teodor Sigaev, but I did a lot of editorializing, so anything that's broken is probably my fault. Documentation is nonexistent as yet, but let's land the patch so we can get some portability testing done.