14 Python Regex Finditer Function

The document discusses the Python regex finditer() function. finditer() returns an iterator that yields MatchObject instances for all non-overlapping matches of a regex pattern in a string. It scans the string left-to-right. An example uses finditer() to find all occurrences of "the" in an HTML page and prints each match along with the following word and character position.

Uploaded by

ArvindSharma
Copyright
© All Rights Reserved

Python regex `finditer` function

Python regex finditer() function explained with examples.

WE'LL COVER THE FOLLOWING

• Python string finditer
• Syntax
• Example 1

Python string finditer #

finditer() is a function in the re module. It returns an iterator that
yields MatchObject instances (called match objects in Python 3) for all
non-overlapping matches of the RE pattern in string.
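
As a minimal sketch of that behavior, the snippet below runs finditer() on a made-up sample string (the text and the pattern are illustrative assumptions, not part of the original example). Each item the iterator yields is a match object, which exposes methods such as group() and span().

```python
import re

text = "cat bat rat"  # hypothetical sample text

# finditer() returns an iterator; each item it yields is a match object.
matches = re.finditer(r"\w+at", text)

found = []
for m in matches:
    # group() is the matched text, span() is its (start, end) offsets.
    found.append((m.group(), m.span()))
    print(m.group(), m.span())
```

Nothing is collected up front here: the matches are produced one at a time as the loop asks for them.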

Syntax #

re.finditer(pattern, string, flags=0)

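Here the string is scanned left-to-right, and matches are returned in the order
found. Empty matches are included in the result; according to the Python 2
documentation, an empty match is left out if it touches the beginning of
another match, a qualification that the current Python 3 documentation no
longer makes.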
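
The scanning rules can be tried out with a short sketch like the one below. The strings and patterns are made up for illustration. Because the handling of empty matches has differed between Python versions, the last part only checks that an empty match is reported at all, rather than asserting the full list.

```python
import re

# Non-overlapping: "aa" is found twice in "aaaa", never at offset 1.
pair_spans = [m.span() for m in re.finditer(r"aa", "aaaa")]
print(pair_spans)  # [(0, 2), (2, 4)]

# Matches come back in left-to-right order of their start offsets.
starts = [m.start() for m in re.finditer(r"[ab]", "b-a-b")]
print(starts)  # [0, 2, 4]

# A pattern that can match the empty string yields empty matches too.
# Here r"\d*" matches the empty string at offset 0, before the "a".
digit_spans = [m.span() for m in re.finditer(r"\d*", "a1")]
has_empty = any(start == end for start, end in digit_spans)
print(has_empty)  # True
```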

Example 1 #
Here is a simple example that demonstrates the use of finditer(). It reads in a
page of HTML text, finds all occurrences of the word “the”, and prints “the”
together with the word that follows it. It also prints the character position
of each match using the MatchObject’s start() method.

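import re
from urllib.request import urlopen  # Python 3 replacement for urllib2

# Download the page and decode the bytes so a str pattern can be applied.
html = urlopen('https://docs.python.org/2/library/re.html').read().decode('utf-8')
# Match "the" plus the following word, ignoring case.
pattern = r'\b(the\s+\w+)\s+'
regex = re.compile(pattern, re.IGNORECASE)
for match in regex.finditer(html):
    # start() is the character offset of the match within html.
    print("%s: %s" % (match.start(), match.group(1)))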

Because finditer() returns an iterator of match objects rather than a list of
tuples, you can loop over it directly and do some computation for each match.
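
One detail worth keeping in mind when looping over the results: the object returned by finditer() is a one-shot iterator. The sketch below, on a made-up string, wraps it in list() so the matches can be reused, and then shows that the original iterator has nothing left to give.

```python
import re

# Hypothetical sample text; finditer() returns a lazy iterator over it.
it = re.finditer(r"\d+", "10 apples, 20 pears, 30 plums")

# Materialize the matches so they can be traversed more than once.
matches = list(it)
numbers = [int(m.group()) for m in matches]
total = sum(numbers)
print(numbers, total)  # [10, 20, 30] 60

# The original iterator is now exhausted: a second pass yields nothing.
leftover = list(it)
print(leftover)  # []
```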

Expected output (the exact positions depend on the page content at the time it
is fetched):

3261: The Python
4210: the backslash
4451: the same
4474: the same
4651: the pattern
4679: the regular
4930: The solution
5937: The functions
6301: the standard
and so on...
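
To try the same pattern without network access, a sketch along these lines may help. The sample sentence is a hypothetical stand-in for the downloaded page; the pattern and flag are the ones used in the example.

```python
import re

# Hypothetical stand-in for the downloaded page text.
html = "The quick fox jumped over the lazy dog near the old barn."

regex = re.compile(r"\b(the\s+\w+)\s+", re.IGNORECASE)

# Collect (position, phrase) pairs, then print them in the example's format.
results = [(m.start(), m.group(1)) for m in regex.finditer(html)]
for position, phrase in results:
    print("%s: %s" % (position, phrase))
# 0: The quick
# 26: the lazy
# 44: the old
```

Note that the trailing \s+ in the pattern requires whitespace after the second word, so a phrase followed directly by punctuation or by the end of the string is not reported.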
