Split on Successions of Newline Characters Using Python Regular Expression



Python's built-in splitlines() method and the split() method with \n as the delimiter are sufficient for splitting a string on individual newline characters. However, both treat every newline as a separate delimiter, so a run of consecutive newlines produces empty strings in the result. This article explores different approaches to splitting strings on sequences of newline characters using Python's regular expressions.
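To make the difference concrete, here is a minimal sketch comparing the built-in methods with a regular expression. The sample string and the variable names are made up for illustration.

```python
import re

# Hypothetical sample text with a run of three newlines in the middle.
text = "first\nsecond\n\n\nthird"

# Both built-ins treat every newline as its own delimiter,
# so the run of newlines leaves empty strings in the result.
by_split = text.split("\n")
by_splitlines = text.splitlines()

# A regular expression can treat the whole run as one delimiter.
by_regex = re.split(r"\n+", text)

print(by_split)       # ['first', 'second', '', '', 'third']
print(by_splitlines)  # ['first', 'second', '', '', 'third']
print(by_regex)       # ['first', 'second', 'third']
```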

Splitting on One or More Newlines

The re.split() function splits a string wherever a regular expression matches. We'll use the pattern \n+, which matches one or more consecutive newlines. re.split() finds each such run, splits the string there, and returns a list of the resulting pieces.

Example

The following code splits the text string using runs of one or more newline characters (\n+) as delimiters. The three consecutive newlines between the second and third lines are treated as a single delimiter, so the resulting list contains no empty strings.

import re

text = "This is the first line.\nThis is the second line.\n\n\nThis is the third line."
result = re.split(r'\n+', text)
print(result)

Following is the output of the above code:

['This is the first line.', 'This is the second line.', 'This is the third line.']
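One detail worth knowing: if the string begins or ends with newlines, re.split() returns an empty string at that end of the list. The sketch below, using a made-up sample string, shows two possible ways to deal with that. Which one fits depends on whether empty pieces are meaningful in your data.

```python
import re

# Hypothetical input with leading and trailing newlines.
text = "\n\nfirst block\n\nsecond block\n"

# The delimiter matches at the very start and end,
# so the result has an empty string on each side.
raw = re.split(r"\n+", text)
print(raw)       # ['', 'first block', 'second block', '']

# Option 1: drop empty pieces after splitting.
filtered = [part for part in raw if part]
print(filtered)  # ['first block', 'second block']

# Option 2: remove surrounding newlines before splitting.
stripped = re.split(r"\n+", text.strip("\n"))
print(stripped)  # ['first block', 'second block']
```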

Splitting with a Maximum Number of Splits

The re.split() function accepts a maxsplit argument, which limits the number of splits performed. This is useful when you only want to split the string a certain number of times from the beginning and leave the remaining portion intact. Splits are performed from left to right, so the unsplit remainder is always the last element of the result. The default value, 0, means there is no limit.

Example

Let's assume we want to extract the first two text blocks from a larger string, leaving the rest as a single, combined block. Setting maxsplit to 2 will achieve this. The re.split() function will only perform two splits, resulting in a list containing the first two extracted blocks and the remaining portion of the string.

import re

text = "This is the first line.\nThis is the second line.\nThis is the third line.\nThis is the fourth line."
result = re.split(r'\n+', text, maxsplit=2)
print(result)

Following is the output of the above code:

['This is the first line.', 'This is the second line.', 'This is the third line.\nThis is the fourth line.']
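A common use of maxsplit is separating the first block from everything after it. The following is a small sketch of that idea with maxsplit=1. The sample text and the names head and rest are assumptions for illustration, and the unpacking only works when the string contains at least one newline.

```python
import re

# Hypothetical text: a title, a blank line, then a two-line body.
text = "Title line\n\nBody line one.\nBody line two."

# maxsplit=1 performs a single split at the first run of newlines,
# so the result always has exactly two elements for this input.
head, rest = re.split(r"\n+", text, maxsplit=1)

print(head)        # Title line
print(repr(rest))  # 'Body line one.\nBody line two.'
```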

Splitting on Any Newline Character (Including Windows and Mac)

Different operating systems represent line endings differently. Unix-like systems (including macOS) use \n, Windows uses \r\n, and classic Mac OS (before OS X) used \r. To handle all of these, we can use the character class [\r\n]+ in our regular expression. This pattern matches one or more occurrences of either a carriage return (\r) or a newline (\n), so a Windows \r\n pair is consumed as a single delimiter.

Example

The following example splits a string that mixes Windows (\r\n), Unix (\n), and classic Mac (\r) line endings.

import re

text = "This is the first line.\r\nThis is the second line.\nThis is the third line.\rThis is the fourth line."
result = re.split(r'[\r\n]+', text)
print(result)

Following is the output of the above code:

['This is the first line.', 'This is the second line.', 'This is the third line.', 'This is the fourth line.']
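Because [\r\n]+ matches a whole run of line-ending characters, it also collapses blank lines. If blank lines need to be preserved, one possible alternative is the alternation \r\n|\r|\n, which matches exactly one line ending at a time. This pattern is a variation on the one above, and the sample string is made up for illustration.

```python
import re

# Hypothetical text: a Windows-style blank line after "one",
# then a classic Mac ending and a Unix ending.
text = "one\r\n\r\ntwo\rthree\nfour"

# The character class consumes the whole run, so the blank line vanishes.
collapsed = re.split(r"[\r\n]+", text)
print(collapsed)  # ['one', 'two', 'three', 'four']

# The alternation matches one line ending per split. \r\n is listed
# first so a Windows pair is not split into two separate delimiters.
per_line = re.split(r"\r\n|\r|\n", text)
print(per_line)   # ['one', '', 'two', 'three', 'four']
```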
Updated on: 2025-04-21T11:04:58+05:30
