summaryrefslogtreecommitdiff
path: root/src/backend/regex
AgeCommit message (Collapse)Author
2011-08-26Add markers for skips.Bruce Momjian
2011-06-09Pgindent run before 9.1 beta2.Bruce Momjian
2011-04-10Insert dummy "break"s to silence compiler complaints.Tom Lane
Apparently some compilers dislike a case label with nothing after it. Per buildfarm.
2011-04-10Teach regular expression operators to honor collations.Tom Lane
This involves getting the character classification and case-folding functions in the regex library to use the collations infrastructure. Most of this work had been done already in connection with the upper/lower and LIKE logic, so it was a simple matter of transposition. While at it, split out these functions into a separate source file regc_pg_locale.c, so that they can be correctly labeled with the Postgres project's license rather than the Scriptics license. These functions are 100% Postgres-written code whereas what remains in regc_locale.c is still mostly not ours, so lumping them both under the same copyright notice was getting more and more misleading.
2011-04-10pgindent run before PG 9.1 beta 1.Bruce Momjian
2010-10-29Fix comparisons of pointers with zero to compare with NULL instead.Tom Lane
Per C standard, these are semantically the same thing; but saying NULL when you mean NULL is good for readability. Marti Raudsepp, per results of INRIA's Coccinelle.
2010-09-20Remove cvs keywords from all files.Magnus Hagander
2010-08-02Tweak a couple of macros in the regex code to suppress compiler warningsTom Lane
from "clang". The VERR changes make an assignment unconditional, which is probably easier to read/understand anyway, and one can hardly argue that it's worth shaving cycles off the case of reporting another error when one has already been detected. The INSIST change limits where that macro can be used, but not in a way that creates a problem for any existing call.
2010-02-26pgindent run for 9.0Bruce Momjian
2010-02-01Change regexp engine's ccondissect/crevdissect routines to perform DFATom Lane
matching before recursing instead of after. The DFA match eliminates unworkable midpoint choices a lot faster than the recursive check, in most cases, so doing it first can speed things up; particularly in pathological cases such as recently exhibited by Michael Glaesemann. In addition, apply some cosmetic changes that were applied upstream (in the Tcl project) at the same time, in order to sync with upstream version 1.15 of regexec.c. Upstream apparently intends to backpatch this, so I will too. The pathological behavior could be unpleasant if encountered in the field, which seems to justify any risk of introducing new bugs. Tom Lane, reviewed by Donal K. Fellows of Tcl project
2010-01-30Fix some comments that got mangled by pgindent.Tom Lane
2009-12-01Teach the regular expression functions to do case-insensitive matching andTom Lane
locale-dependent character classification properly when the database encoding is UTF8. The previous coding worked okay in single-byte encodings, or in any case for ASCII characters, but failed entirely on multibyte characters. The fix assumes that the <wctype.h> functions use Unicode code points as the wchar representation for Unicode, ie, wchar matches pg_wchar. This is only a partial solution, since we're still stupid about non-ASCII characters in multibyte encodings other than UTF8. The practical effect of that is limited, however, since those cases are generally Far Eastern glyphs for which concepts like case-folding don't apply anyway. Certainly all or nearly all of the field reports of problems have been about UTF8. A more general solution would require switching to the platform's wchar representation for all regex operations; which is possible but would have substantial disadvantages. Let's try this and see if it's sufficient in practice.
2009-06-118.4 pgindent run, with new combined Linux/FreeBSD/MinGW typedef listBruce Momjian
provided by Andrew.
2008-02-19Refactor backend makefiles to remove lots of duplicate codePeter Eisentraut
2008-02-14Sync our regex code with upstream changes since last time we did this, whichTom Lane
was Tcl 8.4.8. The main changes are to remove the never-fully-implemented code for multi-character collating elements, and to const-ify some stuff a bit more fully. In combination with the recent security patch, this commit brings us into line with Tcl 8.5.0. Note that I didn't make any effort to duplicate a lot of cosmetic changes that they made to bring their copy into line with their own style guidelines, such as adding braces around single-line IF bodies. Most of those we either had done already (such as ANSI-fication of function headers) or there is no point because pgindent would undo the change anyway.
2008-01-03Fix assorted security-grade bugs in the regex engine. All of these problemsTom Lane
are shared with Tcl, since it's their code to begin with, and the patches have been copied from Tcl 8.5.0. Problems: CVE-2007-4769: Inadequate check on the range of backref numbers allows crash due to out-of-bounds read. CVE-2007-4772: Infinite loop in regex optimizer for pattern '($|^)*'. CVE-2007-6067: Very slow optimizer cleanup for regex with a large NFA representation, as well as crash if we encounter an out-of-memory condition during NFA construction. Part of the response to CVE-2007-6067 is to put a limit on the number of states in the NFA representation of a regex. This seems needed even though the within-the-code problems have been corrected, since otherwise the code could try to use very large amounts of memory for a suitably-crafted regex, leading to potential DOS by driving the system into swap, activating a kernel OOM killer, etc. Although there are certainly plenty of ways to drive the system into effective DOS with poorly-written SQL queries, these problems seem worth treating as security issues because many applications might accept regex search patterns from untrustworthy sources. Thanks to Will Drewry of Google for reporting these problems. Patches by Will Drewry and Tom Lane. Security: CVE-2007-4769, CVE-2007-4772, CVE-2007-6067
2007-11-15pgindent run for 8.3.Bruce Momjian
2007-10-22Add a useless return statement to suppress a warning seen with someTom Lane
versions of gcc (I'm seeing it with Apple's gcc 4.0.1). I think the reason we did not see this before was that the assert() macros in the regex code were all no-ops till recently.
2007-10-06Make dumpcolors() have tolerable performance when using 32-bit chr,Tom Lane
as we do (and upstream Tcl doesn't). The loop limit might be subject to negotiation if anyone ever tries to do regex debugging in Far Eastern languages, but for now 1000 seems plenty. CHR_MAX was right out :-(
2007-10-06Adjust some regex debugging printouts to not give wrong-format-widthTom Lane
warnings on a 64-bit machine. Noted while chasing a recent regex bug report.
2007-02-01Wording cleanup for error messages. Also change can't -> cannot.Bruce Momjian
Standard English uses "may", "can", and "might" in different ways: may - permission, "You may borrow my rake." can - ability, "I can lift that log." might - possibility, "It might rain today." Unfortunately, in conversational English, their use is often mixed, as in, "You may use this variable to do X", when in fact, "can" is a better choice. Similarly, "It may crash" is better stated, "It might crash".
2005-11-22Re-run pgindent, fixing a problem where comment lines after a blankBruce Momjian
comment line where output as too long, and update typedefs for /lib directory. Also fix case where identifiers were used as variable names in the backend, but as typedefs in ecpg (favor the backend for indenting). Backpatch to 8.1.X.
2005-10-15Standard pgindent run for 8.1.Bruce Momjian
2005-09-24Clean up possibly-uninitialized-variable warnings reported by gcc 4.x.Tom Lane
2005-07-10I made the patch that implements regexp_replace again.Bruce Momjian
The specification of this function is as follows. regexp_replace(source text, pattern text, replacement text, [flags text]) returns text Replace string that matches to regular expression in source text to replacement text. - pattern is regular expression pattern. - replacement is replace string that can use '\1'-'\9', and '\&'. '\1'-'\9': back reference to the n'th subexpression. '\&' : entire matched string. - flags can use the following values: g: global (replace all) i: ignore case When the flags is not specified, case sensitive, replace the first instance only. Atsushi Ogawa
2005-05-25Add parentheses to macros when args are used in computations. WithoutBruce Momjian
them, the executation behavior could be unexpected.
2004-11-24Install Tcl regex fixes to sync our regex engine with Tcl 8.4.8 (up fromTom Lane
8.4.1). This corrects some curious regex bugs, though not the greediness issue I was hoping to find a solution for :-(
2004-05-07Solve the 'Turkish problem' with undesirable locale behavior for caseTom Lane
conversion of basic ASCII letters. Remove all uses of strcasecmp and strncasecmp in favor of new functions pg_strcasecmp and pg_strncasecmp; remove most but not all direct uses of toupper and tolower in favor of pg_toupper and pg_tolower. These functions use the same notions of case folding already developed for identifier case conversion. I left the straight locale-based folding in place for situations where we are just manipulating user data and not trying to match it to built-in strings --- for example, the SQL upper() function is still locale dependent. Perhaps this will prove not to be what's wanted, but at the moment we can initdb and pass regression tests in Turkish locale.
2003-11-29$Header: -> $PostgreSQL Changes ...PostgreSQL Daemon
2003-09-29Fix broken definition of :print: character class, per Bruno Wolff.Tom Lane
Also, make :alnum: character class directly dependent on isalnum() rather than guessing.
2003-08-08Another pgindent run with updated typedefs.Bruce Momjian
2003-08-04pgindent run.Bruce Momjian
2003-02-05Replace regular expression package with Henry Spencer's latest versionTom Lane
(extracted from Tcl 8.4.1 release, as Henry still hasn't got round to making it a separate library). This solves a performance problem for multibyte, as well as upgrading our regexp support to match recent Tcl and nearly match recent Perl.
2002-11-08This patch removes a bunch of superfluous #include directives: ifBruce Momjian
postgres.h or c.h includes a system header (such as stdio.h or stdlib.h), there's no need to specifically include it in any of the .c files in the backend. Neil Conway
2002-09-16Remove retest Makefile entry because it does not compile.Bruce Momjian
2002-09-04pgindent run.Bruce Momjian
2002-09-03Remove all traces of multibyte and locale options. Clean up commentsPeter Eisentraut
referring to "multibyte" where it really means character encoding.
2002-09-02Remove sys/types.h in files that include postgres.h, and hence c.h,Bruce Momjian
because c.h has sys/types.h.
2002-08-29Remove #ifdef MULTIBYTE per hackers list discussion.Tatsuo Ishii
2002-06-11Implement SQL99 OVERLAY(). Allows substitution of a substring in a string.Thomas G. Lockhart
Implement SQL99 SIMILAR TO as a synonym for our existing operator "~". Implement SQL99 regular expression SUBSTRING(string FROM pat FOR escape). Extend the definition to make the FOR clause optional. Define textregexsubstr() to actually implement this feature. Update the regression test to include these new string features. All tests pass. Rename the regular expression support routines from "pg95_xxx" to "pg_xxx". Define CREATE CHARACTER SET in the parser per SQL99. No implementation yet.
2002-05-05Fix code to work when isalpha and friends are macros, not functions.Tom Lane
2002-04-24[ Patch comments in three pieces.]Bruce Momjian
Attached is a pacth against 7.2 which adds locale awareness to the character classes of the regular expression engine. ... > > I still think the xdigit class could be handled the same way the digit > > class is (by enumeration rather than using the isxdigit function). That > > saves you a cicle, and I don't think there's any loss. > > In fact, I will email you when I apply the original patch. I miss that case :-(. Here is the pached patch. ... Here is a patch which addresses Tatsuo's concerns (it does return an static struct instead of constructing it).
2001-11-05New pgindent run with fixes suggested by Tom. Patch manually reviewed,Bruce Momjian
initdb/regression tests pass.
2001-10-28Another pgindent run. Fixes enum indenting, and improves #endifBruce Momjian
spacing. Also adds space for one-line comments.
2001-10-25pgindent run on all C files. Java run to follow. initdb/regressionBruce Momjian
tests pass.
2001-10-25Add do { ... } while (0) to more bad macros.Bruce Momjian
2001-10-04Add dependency for regexec.cTatsuo Ishii
2001-03-22pgindent run. Make it all clean.Bruce Momjian
2001-03-19Make regular-expression error messages a tad less obscure,Tom Lane
per gripe from Josh Berkus.
2001-02-13Clean up portability problems in regexp package: change all routineTom Lane
definitions from K&R to ANSI C style, and fix broken assumption that int and long are the same datatype. This repairs problems observed on Alpha with regexps having between 32 and 63 states.