Description
Original report by Marcin Wojnarski (Bitbucket: mwojnars, GitHub: mwojnars).
Hi,
Currently, duplicate group names are not allowed. For example, this code raises an exception because group "a" is defined twice:
>>> regex.match(r'(?<a>here)? or (?<a>here)?', "here or here")
error: duplicate group
I suspect this design is a legacy of the standard 're' module, which didn't allow multiple values per group, so it was natural to reject duplicate group names too. But the 'regex' module can capture repeated values, so it would be natural for it to also accept duplicate group names and merge the values extracted from all same-named groups into one list.
This enhancement would allow parsing loose formats, where a given value may appear in any of several places in the text and the regex must have a group in each of these places. Usually we would expect only one place to match (the groups are optional, as in the regex above), but we can't say in advance which one, and for convenience we'd like to use the same name for all of them to avoid manually merging several groups afterwards. In other use cases, more than one group may match and we want to extract all the matched values as a single list.
I think this enhancement would fit very well with the concept of repeated captures that's already present in 'regex'.
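To make the "manual merging" concrete, here is a minimal sketch of the workaround needed today, written against the standard 're' module so it runs anywhere. The group names a1 and a2 and the helper merged_values are hypothetical, chosen only for illustration; they stand in for the single name "a" that this proposal would allow.

```python
import re

# Distinct names are required today; a1 and a2 are hypothetical stand-ins
# for the single group name "a" from the example above.
PATTERN = re.compile(r'(?P<a1>here)? or (?P<a2>here)?')


def merged_values(match, prefix):
    """Return the values of all matched groups whose name starts with prefix.

    Groups are visited in the order they appear in the pattern, and groups
    that did not take part in the match (value None) are skipped.
    """
    groupindex = match.re.groupindex
    names = sorted(groupindex, key=groupindex.get)
    return [match.group(name) for name in names
            if name.startswith(prefix) and match.group(name) is not None]


m = PATTERN.match("here or here")
print(merged_values(m, "a"))   # ['here', 'here']

m = PATTERN.match(" or here")
print(merged_values(m, "a"))   # ['here']
```

With duplicate names accepted, the helper and the naming convention would be unnecessary: both places could simply be called "a" and their values read back as one list.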
Do any other regex implementations have something like this?
I don't know.