author | Tom Lane | 2015-08-05 01:09:12 +0000 |
---|---|---|
committer | Tom Lane | 2015-08-05 01:09:40 +0000 |
commit | dacbdda1092e20507249bade076c859993f5e837 | |
tree | eb74e6141d2ac77e25539e3403c045c389b97f0d | |
parent | 270a877cca21cf0252ae7c81dd085ae61233ab56 |
Docs: add an explicit example about controlling overall greediness of REs.
Per discussion of bug #13538.
-rw-r--r-- | doc/src/sgml/func.sgml | 29 |
1 files changed, 28 insertions, 1 deletions
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index f6da1c2ec45..48ddb317b3c 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -4951,10 +4951,37 @@ SELECT SUBSTRING('XY1234Z', 'Y*?([0-9]{1,3})');
     The quantifiers <literal>{1,1}</> and <literal>{1,1}?</>
     can be used to force greediness or non-greediness, respectively,
     on a subexpression or a whole RE.
+    This is useful when you need the whole RE to have a greediness attribute
+    different from what's deduced from its elements. As an example,
+    suppose that we are trying to separate a string containing some digits
+    into the digits and the parts before and after them. We might try to
+    do that like this:
+<screen>
+SELECT regexp_matches('abc01234xyz', '(.*)(\d+)(.*)');
+<lineannotation>Result: </lineannotation><computeroutput>{abc0123,4,xyz}</computeroutput>
+</screen>
+    That didn't work: the first <literal>.*</> is greedy so
+    it <quote>eats</> as much as it can, leaving the <literal>\d+</> to
+    match at the last possible place, the last digit. We might try to fix
+    that by making it non-greedy:
+<screen>
+SELECT regexp_matches('abc01234xyz', '(.*?)(\d+)(.*)');
+<lineannotation>Result: </lineannotation><computeroutput>{abc,0,""}</computeroutput>
+</screen>
+    That didn't work either, because now the RE as a whole is non-greedy
+    and so it ends the overall match as soon as possible. We can get what
+    we want by forcing the RE as a whole to be greedy:
+<screen>
+SELECT regexp_matches('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}');
+<lineannotation>Result: </lineannotation><computeroutput>{abc,01234,xyz}</computeroutput>
+</screen>
+    Controlling the RE's overall greediness separately from its components'
+    greediness allows great flexibility in handling variable-length patterns.
    </para>
    <para>
-    Match lengths are measured in characters, not collating elements.
+    When deciding what is a longer or shorter match,
+    match lengths are measured in characters, not collating elements.
     An empty string is considered longer than no match at all.
     For example: <literal>bb*</>
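The rules this patch relies on can be modeled with a small brute-force matcher. The sketch below is a hypothetical Python illustration, not PostgreSQL's implementation (which compiles REs to automata). It handles only a flat sequence of parenthesized quantified atoms, and the names `match`, `PLAIN`, `LAZY`, `any_char`, `digit` and the `whole_greedy` parameter are invented for this example. Following the matching rules described in the PostgreSQL regular-expression documentation, it picks the earliest starting position, lets the whole RE's greediness choose the longest or shortest overall match, and only then lets each subexpression, left to right, take as much or as little as its own greediness allows within that total. By default the whole RE inherits the greediness of its first atom. Passing `whole_greedy=True` stands in for wrapping the RE in `(?:...){1,1}`, and `whole_greedy=False` for `(?:...){1,1}?`.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical, simplified component: one parenthesized quantified atom such
# as (.*), (.*?) or (\d+), described as
# (per-character test, minimum run length, greedy?).
Component = Tuple[Callable[[str], bool], int, bool]


def any_char(ch: str) -> bool:
    return True  # models "."


def digit(ch: str) -> bool:
    return "0" <= ch <= "9"  # models "\d" for ASCII digits only


def _splits(s: str, pos: int, comps: Sequence[Component]) -> List[List[int]]:
    """Every way to give each component a run length, starting at s[pos]."""
    if not comps:
        return [[]]
    ok, min_len, _ = comps[0]
    results: List[List[int]] = []
    length = 0
    while True:
        if length >= min_len:
            for rest in _splits(s, pos + length, comps[1:]):
                results.append([length] + rest)
        if pos + length < len(s) and ok(s[pos + length]):
            length += 1
        else:
            break
    return results


def match(s: str, comps: Sequence[Component],
          whole_greedy: Optional[bool] = None) -> Optional[List[str]]:
    """Brute-force model of the documented ARE matching rules (sketch only).

    whole_greedy=None deduces the overall greediness from the elements:
    the branch takes the greediness of its first quantified atom.
    """
    if whole_greedy is None:
        whole_greedy = comps[0][2]
    for start in range(len(s) + 1):  # the earliest starting point wins
        cands = _splits(s, start, comps)
        if not cands:
            continue
        # 1. The whole RE's greediness fixes the total match length.
        total = (max if whole_greedy else min)(sum(c) for c in cands)
        cands = [c for c in cands if sum(c) == total]
        # 2. Subexpressions, left to right, pick lengths within that total.
        for i, (_, _, greedy) in enumerate(comps):
            best = (max if greedy else min)(c[i] for c in cands)
            cands = [c for c in cands if c[i] == best]
        groups, pos = [], start
        for n in cands[0]:
            groups.append(s[pos:pos + n])
            pos += n
        return groups
    return None


PLAIN = [(any_char, 0, True), (digit, 1, True), (any_char, 0, True)]  # (.*)(\d+)(.*)
LAZY = [(any_char, 0, False), (digit, 1, True), (any_char, 0, True)]  # (.*?)(\d+)(.*)

print(match("abc01234xyz", PLAIN))                    # ['abc0123', '4', 'xyz']
print(match("abc01234xyz", LAZY))                     # ['abc', '0', '']
print(match("abc01234xyz", LAZY, whole_greedy=True))  # ['abc', '01234', 'xyz']
```

Under these assumptions the model reproduces the three results quoted in the patch. The point of the two-stage selection is that the total match length is settled before any subexpression gets a say: that is why `(.*?)` alone shortens the whole match, while the `{1,1}` wrapper restores a greedy total and leaves `.*?` to shorten only its own group.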