summaryrefslogtreecommitdiff
path: root/src/backend/utils/adt/like_match.c
AgeCommit message (Collapse)Author
2025-01-01Update copyright for 2025Bruce Momjian
Backpatch-through: 13
2024-12-04Fix dead codePeter Eisentraut
from commit 85b7efa1cdd per Coverity report
2024-11-27Support LIKE with nondeterministic collationsPeter Eisentraut
This allows for example using LIKE with case-insensitive collations. There was previously no internal implementation of this, so it was met with a not-supported error. This adds the internal implementation and removes the error. The implementation follows the specification of the SQL standard for this. Unlike with deterministic collations, the LIKE matching cannot go character by character but has to go substring by substring. For example, if we are matching against LIKE 'foo%bar', we can't start by looking for an 'f', then an 'o', but instead with have to find something that matches 'foo'. This is because the collation could consider substrings of different lengths to be equal. This is all internal to MatchText() in like_match.c. The changes in GenericMatchText() in like.c just pass through the locale information to MatchText(), which was previously not needed. This matches exactly Generic_Text_IC_like() below. ILIKE is not affected. (It's unclear whether ILIKE makes sense under nondeterministic collations.) This also updates match_pattern_prefix() in like_support.c to support optimizing the case of an exact pattern with nondeterministic collations. This was already alluded to in the previous code. (includes documentation examples from Daniel Vérité and test cases from Paul A Jungwirth) Reviewed-by: Jian He <[email protected]> Discussion: https://www.postgresql.org/message-id/flat/[email protected]
2024-09-13Remove separate locale_is_c argumentsPeter Eisentraut
Since e9931bfb751, ctype_is_c is part of pg_locale_t. Some functions passed a pg_locale_t and a bool argument separately. This can now be combined into one argument. Since some callers call MatchText() with locale 0, it is a bit confusing whether this is all correct. But it is the case that only callers that pass a non-zero locale object to MatchText() end up checking locale->ctype_is_c. To make that flow a bit more understandable, add the locale argument to MATCH_LOWER() and GETCHAR() in like_match.c, instead of implicitly taking it from the outer scope. Reviewed-by: Jeff Davis <[email protected]> Discussion: https://www.postgresql.org/message-id/[email protected]
2024-01-04Update copyright for 2024Bruce Momjian
Reported-by: Michael Paquier Discussion: https://postgr.es/m/[email protected] Backpatch-through: 12
2023-01-02Update copyright for 2023Bruce Momjian
Backpatch-through: 11
2022-01-08Update copyright for 2022Bruce Momjian
Backpatch-through: 10
2021-01-02Update copyright for 2021Bruce Momjian
Backpatch-through: 9.5
2020-01-01Update copyrights for 2020Bruce Momjian
Backpatch-through: update all files in master, backpatch legal files through 9.4
2019-07-29Fix inconsistencies and typos in the treeMichael Paquier
This is numbered take 8, and addresses again a set of issues with code comments, variable names and unreferenced variables. Author: Alexander Lakhin Discussion: https://postgr.es/m/[email protected]
2019-02-08Add some const decorationsPeter Eisentraut
These mainly help understanding the function signatures better.
2019-01-02Update copyright for 2019Bruce Momjian
Backpatch-through: certain files through 9.4
2018-01-03Update copyright for 2018Bruce Momjian
Backpatch-through: certain files through 9.3
2017-06-21Phase 3 of pgindent updates.Tom Lane
Don't move parenthesized lines to the left, even if that means they flow past the right margin. By default, BSD indent lines up statement continuation lines that are within parentheses so that they start just to the right of the preceding left parenthesis. However, traditionally, if that resulted in the continuation line extending to the right of the desired right margin, then indent would push it left just far enough to not overrun the margin, if it could do so without making the continuation line start to the left of the current statement indent. That makes for a weird mix of indentations unless one has been completely rigid about never violating the 80-column limit. This behavior has been pretty universally panned by Postgres developers. Hence, disable it with indent's new -lpl switch, so that parenthesized lines are always lined up with the preceding left paren. This patch is much less interesting than the first round of indent changes, but also bulkier, so I thought it best to separate the effects. Discussion: https://postgr.es/m/[email protected] Discussion: https://postgr.es/m/[email protected]
2017-06-21Phase 2 of pgindent updates.Tom Lane
Change pg_bsd_indent to follow upstream rules for placement of comments to the right of code, and remove pgindent hack that caused comments following #endif to not obey the general rule. Commit e3860ffa4dd0dad0dd9eea4be9cc1412373a8c89 wasn't actually using the published version of pg_bsd_indent, but a hacked-up version that tried to minimize the amount of movement of comments to the right of code. The situation of interest is where such a comment has to be moved to the right of its default placement at column 33 because there's code there. BSD indent has always moved right in units of tab stops in such cases --- but in the previous incarnation, indent was working in 8-space tab stops, while now it knows we use 4-space tabs. So the net result is that in about half the cases, such comments are placed one tab stop left of before. This is better all around: it leaves more room on the line for comment text, and it means that in such cases the comment uniformly starts at the next 4-space tab stop after the code, rather than sometimes one and sometimes two tabs after. Also, ensure that comments following #endif are indented the same as comments following other preprocessor commands such as #else. That inconsistency turns out to have been self-inflicted damage from a poorly-thought-through post-indent "fixup" in pgindent. This patch is much less interesting than the first round of indent changes, but also bulkier, so I thought it best to separate the effects. Discussion: https://postgr.es/m/[email protected] Discussion: https://postgr.es/m/[email protected]
2017-06-21Initial pgindent run with pg_bsd_indent version 2.0.Tom Lane
The new indent version includes numerous fixes thanks to Piotr Stefaniak. The main changes visible in this commit are: * Nicer formatting of function-pointer declarations. * No longer unexpectedly removes spaces in expressions using casts, sizeof, or offsetof. * No longer wants to add a space in "struct structname *varname", as well as some similar cases for const- or volatile-qualified pointers. * Declarations using PG_USED_FOR_ASSERTS_ONLY are formatted more nicely. * Fixes bug where comments following declarations were sometimes placed with no space separating them from the code. * Fixes some odd decisions for comments following case labels. * Fixes some cases where comments following code were indented to less than the expected column 33. On the less good side, it now tends to put more whitespace around typedef names that are not listed in typedefs.list. This might encourage us to put more effort into typedef name collection; it's not really a bug in indent itself. There are more changes coming after this round, having to do with comment indentation and alignment of lines appearing within parentheses. I wanted to limit the size of the diffs to something that could be reviewed without one's eyes completely glazing over, so it seemed better to split up the changes as much as practical. Discussion: https://postgr.es/m/[email protected] Discussion: https://postgr.es/m/[email protected]
2017-01-03Update copyright via script for 2017Bruce Momjian
2016-01-02Update copyright for 2016Bruce Momjian
Backpatch certain files through 9.1
2015-10-02Add recursion depth protection to LIKE matching.Tom Lane
Since MatchText() recurses, it could in principle be driven to stack overflow, although quite a long pattern would be needed.
2015-01-06Update copyright for 2015Bruce Momjian
Backpatch certain files through 9.0
2014-01-07Update copyright for 2014Bruce Momjian
Update all files in head, and files COPYRIGHT and legal.sgml in all back branches.
2013-04-20Clean up references to SQL92Peter Eisentraut
In most cases, these were just references to the SQL standard in general. In a few cases, a contrast was made between SQL92 and later standards -- those have been kept unchanged.
2013-01-01Update copyrights for 2013Bruce Momjian
Fully update git head, and update back branches in ./COPYRIGHT and legal.sgml files.
2012-01-01Update copyright notices for year 2012.Bruce Momjian
2011-04-09Fix ILIKE to honor collation when working in single-byte encodings.Tom Lane
The original collation patch only fixed the multi-byte code path. This change also ensures that ILIKE's idea of the case-folding rules is exactly the same as str_tolower's.
2011-01-01Stamp copyrights for year 2011.Bruce Momjian
2010-09-20Remove cvs keywords from all files.Magnus Hagander
2010-07-06pgindent run for 9.0, second runBruce Momjian
2010-05-28Fix oversight in the previous patch that made LIKE throw error for \ at theTom Lane
end of the pattern: the code path that handles \ just after % should throw error too. As in the previous patch, not back-patching for fear of breaking apps that worked before.
2010-05-28Rewrite LIKE's %-followed-by-_ optimization so it really works (this timeTom Lane
for sure ;-)). It now also optimizes more cases, such as %_%_. Improve comments too. Per bug #5478. In passing, also rename the TCHAR macro to GETCHAR, because pgindent is messing with the formatting of the former (apparently it now thinks TCHAR is a typedef name). Back-patch to 8.3, where the bug was introduced.
2010-01-02Update copyright for the year 2010.Bruce Momjian
2009-06-118.4 pgindent run, with new combined Linux/FreeBSD/MinGW typedef listBruce Momjian
provided by Andrew.
2009-05-24Fix LIKE's special-case code for % followed by _. I'm not entirely sure thatTom Lane
this case is worth a special code path, but a special code path that gets the boundary condition wrong is definitely no good. Per bug #4821 from Andrew Gierth. In passing, clean up some minor code formatting issues (excess parentheses and blank lines in odd places). Back-patch to 8.3, where the bug was introduced.
2009-01-01Update copyright for 2009.Bruce Momjian
2008-09-27Compare escaped chars case insensitively for ILIKE - per gripe from TGL.Andrew Dunstan
2008-09-26Make LIKE throw an error if the escape character is at the end of the patternTom Lane
(ie, has nothing to quote), rather than silently ignoring the character as has been our historical behavior. This is required by SQL spec and should help reduce the sort of user confusion seen in bug #4436. Per discussion. This is not so much a bug fix as a definitional change, and it could break existing applications; so not back-patched. It might deserve being mentioned as an incompatibility in the 8.4 release notes.
2008-03-01Fix unportable usages of tolower(). On signed-char machines, it is necessaryTom Lane
to explicitly cast the output back to char before comparing it to a char value, else we get the wrong result for high-bit-set characters. Found by Rolf Jentsch. Also, fix several places where <ctype.h> functions were being called without casting the argument to unsigned char; this is likewise unportable, but we keep making that mistake :-(. These found by buildfarm member salamander, which I will desperately miss if it ever goes belly-up.
2008-01-01Update copyrights in source tree to 2008.Bruce Momjian
2007-11-15pgindent run for 8.3.Bruce Momjian
2007-09-22Go back to using a separate method for doing ILIKE for single byteAndrew Dunstan
character encodings that doesn't involve calling lower(). This should cure the performance regression in this case complained of by Guillaume Smet. It still leaves the horrid performance for multi-byte encodings introduced in 8.2, but there's no obvious solution for that in sight.
2007-09-21Fix regex, LIKE, and some other second-rank text-manipulation functionsTom Lane
to not cause needless copying of text datums that have 1-byte headers. Greg Stark, in response to performance gripe from Guillaume Smet and ITAGAKI Takahiro.
2007-06-02Improve efficiency of LIKE/ILIKE code, especially for multi-byte charsets,Andrew Dunstan
and most especially for UTF8. Remove unnecessary special cases for bytea processing and single-byte charset ILIKE. a ILIKE b is now processed as lower(a) LIKE lower(b) in all cases. The code is now considerably simpler. All comparisons are now performed byte-wise, and the text and pattern are also advanced byte-wise where it is safe to do so - essentially where a wildcard is not being matched. Andrew Dunstan, from an original patch by ITAGAKI Takahiro, with ideas from Tom Lane and Mark Mielke.
2007-02-27Replace direct assignments to VARATT_SIZEP(x) with SET_VARSIZE(x, len).Tom Lane
Get rid of VARATT_SIZE and VARATT_DATA, which were simply redundant with VARSIZE and VARDATA, and as a consequence almost no code was using the longer names. Rename the length fields of struct varlena and various derived structures to catch anyplace that was accessing them directly; and clean up various places so caught. In itself this patch doesn't change any behavior at all, but it is necessary infrastructure if we hope to play any games with the representation of varlena headers. Greg Stark and Tom Lane
2007-01-05Update CVS HEAD for 2007 copyright. Back branches are typically notBruce Momjian
back-stamped for this.
2006-03-05Update copyright for 2006. Update scripts.Bruce Momjian
2005-10-15Standard pgindent run for 8.1.Bruce Momjian
2005-09-24Suppress signed-vs-unsigned-char warnings.Tom Lane
2004-12-31Tag appropriate files for rc3PostgreSQL Daemon
Also performed an initial run through of upgrading our Copyright date to extend to 2005 ... first run here was very simple ... change everything where: grep 1996-2004 && the word 'Copyright' ... scanned through the generated list with 'less' first, and after, to make sure that I only picked up the right entries ...
2004-08-29Update copyright to 2004.Bruce Momjian
2003-11-29$Header: -> $PostgreSQL Changes ...PostgreSQL Daemon