[ruby-core:85622] [Ruby trunk Bug#14488] Unicode characters prevent [[:punct:]] character class from matching certain characters in subsequent matches

From: pbrinichlanglois@...
Date: 2018-02-18 23:02:41 UTC
List: ruby-core #85622
Issue #14488 has been reported by patbl (Patrick Brinich-Langlois).

----------------------------------------
Bug #14488: Unicode characters prevent [[:punct:]] character class from matching certain characters in subsequent matches 
https://bugs.ruby-lang.org/issues/14488

* Author: patbl (Patrick Brinich-Langlois)
* Status: Open
* Priority: Normal
* Assignee: 
* Target version: 
* ruby -v: 2.4.3
* Backport: 2.3: UNKNOWN, 2.4: UNKNOWN, 2.5: UNKNOWN
----------------------------------------
In 2.3.5, `[[:punct:]]` doesn't match `+`. In 2.5.0, it does. In 2.4.3, it matches `+` at first, but stops matching it after you match against a string containing one or more Unicode characters. I would expect 2.4.3 to behave like 2.5.0.

~~~ ruby
puts RUBY_VERSION
p %w[+ 辿 +].grep(/[[:punct:]]/)
~~~

~~~
2.3.5
[]

2.4.3
["+"]

2.5.0
["+", "+"]
~~~

One of the commenters [here](https://stackoverflow.com/questions/48700038/why-do-unicode-characters-prevent-the-punct-character-class-from-matching#comment84405133_48700038) noticed that this behavior may be related to [this issue](https://github.com/k-takata/Onigmo/issues/42). It seems that the characters in `[$+<=>^|~]` are affected by the bug, but other punctuation characters aren't (though I tested only a handful of them).
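
Since only a handful of characters were tested by hand, here is a minimal sketch for checking every ASCII punctuation character at once. The names `ASCII_PUNCT` and `punct_matches` are hypothetical helpers, not part of Ruby. The sketch assumes the problem is triggered by matching the same regexp object against a non-ASCII string first, as in the reproduction above, and it uses `"\u00e9"` as an arbitrary trigger. The output depends on the Ruby version, so none is shown.

```ruby
# All 32 printable ASCII characters that are neither letters nor digits,
# encoded as UTF-8 so they share the encoding of the trigger string.
ASCII_PUNCT = (0x21..0x7e).map { |n| n.chr(Encoding::UTF_8) }
                          .reject { |c| c =~ /[A-Za-z0-9]/ }

# Hypothetical helper: builds a fresh regexp, optionally matches it against
# a trigger string first, then returns the characters it still matches.
def punct_matches(trigger = nil)
  re = Regexp.new("[[:punct:]]")
  re =~ trigger if trigger          # result is ignored; only the side effect matters
  ASCII_PUNCT.select { |c| re =~ c }
end

before = punct_matches              # no non-ASCII match beforehand
after  = punct_matches("\u00e9")    # same check after a non-ASCII match

puts RUBY_VERSION
puts "matched before:  #{before.join}"
puts "no longer match: #{(before - after).join}"
```

If the second list is non-empty on a given version, those are the characters that stop matching after the non-ASCII match.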



-- 
https://bugs.ruby-lang.org/

Unsubscribe: <mailto:ruby-core-request@ruby-lang.org?subject=unsubscribe>
<http://lists.ruby-lang.org/cgi-bin/mailman/options/ruby-core>
