Problem
You need to split a string into fields, but the delimiters aren’t consistent throughout the string.
Solution
There are several ways to split a string on multiple delimiters in Python. The simplest is the built-in split() method of strings, but it is meant for simple cases in which the separator is a single fixed string or plain whitespace.
re.split() is more flexible than the normal `split()` method because the separator is a regular expression pattern rather than a fixed string.
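To make the limitation concrete, here is a minimal sketch that applies the plain split() method to the same sample string used in the examples of this recipe. The variable names by_comma and by_whitespace are illustrative only.

```python
tennis_greats = 'Roger-federer, Rafael nadal, Novak Djokovic,Andy murray'

# Splitting on one fixed separator leaves the other delimiters in the fields.
by_comma = tennis_greats.split(',')
print(by_comma)
# ['Roger-federer', ' Rafael nadal', ' Novak Djokovic', 'Andy murray']

# With no argument, split() breaks on runs of whitespace only.
by_whitespace = tennis_greats.split()
print(by_whitespace)
# ['Roger-federer,', 'Rafael', 'nadal,', 'Novak', 'Djokovic,Andy', 'murray']
```

In both cases the hyphen, and either the commas or the spaces, remain inside the fields, which is why a pattern-based split is needed here.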
With re.split() you can specify multiple patterns for the separator. In the examples in this recipe, the separator is either a hyphen (-), a comma (,), or a whitespace character, followed by any amount of extra whitespace. For details on pattern syntax, see the documentation for Python's re module.
Whenever that pattern is found, the entire match becomes the delimiter between the fields that are on either side of the match.
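Because the entire match acts as one delimiter, the trailing \s* in the pattern matters. The following sketch, using a shortened sample string chosen only for illustration, compares the pattern with and without it.

```python
import re

sample = 'Roger-federer, Rafael nadal'

# Without \s*, the comma and the space after it are two separate delimiters,
# so an empty string appears between them.
without_trailing = re.split(r'[-,\s]', sample)
print(without_trailing)  # ['Roger', 'federer', '', 'Rafael', 'nadal']

# With \s*, the comma plus the following space form a single delimiter.
with_trailing = re.split(r'[-,\s]\s*', sample)
print(with_trailing)  # ['Roger', 'federer', 'Rafael', 'nadal']
```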
Extract only the text between the delimiters (no delimiters).
Example
import re

tennis_greats = 'Roger-federer, Rafael nadal, Novak Djokovic,Andy murray'

# -----------------------------------------------------------------------------
# Scenario 1 - Output the players
# Input - String with multiple delimiters (hyphen, comma, whitespace)
# Code - Specify the delimiters in a character class []
# -----------------------------------------------------------------------------
players = re.split(r'[-,\s]\s*', tennis_greats)
print(f"The output is - {players}")
Output
The output is - ['Roger', 'federer', 'Rafael', 'nadal', 'Novak', 'Djokovic', 'Andy', 'murray']
Extract the text between the delimiters along with the delimiters themselves.
Example
import re

tennis_greats = 'Roger-federer, Rafael nadal, Novak Djokovic,Andy murray'

# -----------------------------------------------------------------------------
# Scenario 2 - Output the players and the delimiters
# Input - String with multiple delimiters (hyphen, comma, whitespace)
# Code - Specify the delimiters as alternatives separated by pipes (|)
#        inside a capturing group (), so the matched text is kept
# -----------------------------------------------------------------------------
players = re.split(r'(-|,|\s)\s*', tennis_greats)
print(f"The output is - {players}")
Output
The output is - ['Roger', '-', 'federer', ',', 'Rafael', ' ', 'nadal', ',', 'Novak', ' ', 'Djokovic', ',', 'Andy', ' ', 'murray']
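One possible follow-up, sketched here as an illustration rather than as part of the original recipe: once the delimiters are captured, the fields sit at the even positions of the result and the delimiters at the odd positions, so the two can be separated or joined back together. The names fields, values, delimiters, rebuilt and no_capture are hypothetical. A non-capturing group (?:...) is also shown for the case where grouping is wanted but the delimiters are not.

```python
import re

tennis_greats = 'Roger-federer, Rafael nadal, Novak Djokovic,Andy murray'

fields = re.split(r'(-|,|\s)\s*', tennis_greats)
values = fields[::2]               # text between the delimiters
delimiters = fields[1::2] + ['']   # pad so both lists have equal length

# Rejoin each value with the delimiter that followed it. Whitespace matched
# by \s* sits outside the capturing group, so it is not restored.
rebuilt = ''.join(v + d for v, d in zip(values, delimiters))
print(rebuilt)  # Roger-federer,Rafael nadal,Novak Djokovic,Andy murray

# A non-capturing group keeps the alternation but drops the delimiters.
no_capture = re.split(r'(?:-|,|\s)\s*', tennis_greats)
print(no_capture)
# ['Roger', 'federer', 'Rafael', 'nadal', 'Novak', 'Djokovic', 'Andy', 'murray']
```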