Module 4 - Regular Expressions

pattern matching

Uploaded by

archanabk.research

Regular Expressions

In computing, a regular expression, also referred to as “regex” or “regexp”, provides a concise and flexible means for matching strings of text, such as particular characters, words, or patterns of characters.
A regular expression is written in a formal language that can be interpreted by a regular expression processor.
REGULAR EXPRESSIONS

output would be:

hello, how are you?
how about you?
REGULAR EXPRESSIONS

Regular expressions make use of special characters with specific meanings. In the following example, we make use of the caret (^) symbol, which indicates the beginning of the line.
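A minimal sketch of the caret anchor. The sample lines are hypothetical stand-ins for lines read from a file; only `re.search` and the `^` metacharacter come from the slides.

```python
import re

# Hypothetical sample lines (assumption: the slides read them from a file).
lines = [
    "how are you?",
    "hello, how are you?",
    "how about you?",
]

# '^how' matches only when 'how' appears at the very start of the line.
starts_with_how = [line for line in lines if re.search(r"^how", line)]
print(starts_with_how)
```

Without the caret, the pattern `how` would match all three lines, because `re.search` looks for the pattern anywhere in the string.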
Character Matching in Regular Expressions
Python provides a set of metacharacters for matching search strings. The table shows the details of a few important metacharacters.
Table: List of Important Metacharacters
Table: Examples for Regular Expressions
Example

• The most commonly used metacharacter is the dot (.), which matches any single character except a newline.

• Consider the following example, where the regular expression searches for lines that start with I, followed by any two characters (each represented by a dot), and then the character m.
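The description above corresponds to a pattern such as `^I..m`. The sketch below assumes a few hypothetical sample lines to show which ones qualify.

```python
import re

# Hypothetical sample lines for illustration only.
lines = ["I am fine", "I'm here", "Item one", "It is me"]

# '^I..m': I at the start, then any two characters, then m.
pattern = r"^I..m"
dot_matches = [line for line in lines if re.search(pattern, line)]
print(dot_matches)
```

Note that a space counts as "any character", which is why `I am fine` qualifies.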
Example

If we don’t know the exact number of characters between two characters (or strings), we can use the dot and + symbols together: .+ matches one or more of any character.
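As a rough illustration, the pattern `^F.+:` accepts any line that starts with F and has at least one character before a colon. The pattern and the sample lines are assumptions chosen for this sketch.

```python
import re

# Hypothetical sample lines for illustration only.
lines = ["From: someone", "F:", "Fax number: 123", "To: you"]

# '^F.+:': F at the start, one or more of any character, then a colon.
pattern = r"^F.+:"
plus_matches = [line for line in lines if re.search(pattern, line)]
print(plus_matches)
```

The line `F:` is rejected because `.+` needs at least one character between the F and the colon.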
Pattern to extract lines starting with the word From (or from) and ending with edu.

import re

# Lines must start with From or from and end with edu.
pattern = '^[Ff]rom.*edu$'

fhand = open('mbox-short.txt')
for line in fhand:
    line = line.rstrip()
    if re.search(pattern, line):
        print(line)
Pattern to extract lines ending with any digit

Replace the pattern with the following string; the rest of the program remains the same.

pattern = '[0-9]$'
Character matching in regular expressions
Search for lines that start with From and have an at sign
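One possible pattern for this search is `^From.+@`. Both the pattern and the sample lines (including the example.edu address) are assumptions made for this sketch.

```python
import re

# Hypothetical sample lines; the address is a placeholder.
lines = [
    "From: user@example.edu",
    "From the desk of Bob",
    "To: user@example.edu",
]

# '^From.+@': From at the start, then at least one character, then an at sign.
pattern = r"^From.+@"
from_at_matches = [line for line in lines if re.search(pattern, line)]
print(from_at_matches)
```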
The findall() Method

• search() returns a match object for the first match in the searched string, or None if there is no match.

• findall() returns the strings of every match in the searched string.

• findall() does not return a match object but a list of strings.

• Each string in the list is a piece of the searched text that matched the regular expression.
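The difference between the two functions can be sketched as follows. The text and the phone-number pattern are hypothetical and serve only to contrast the return values.

```python
import re

# Hypothetical text containing two phone-like numbers.
text = "Call 415-555-1011 or 212-555-0000"
pattern = r"\d{3}-\d{3}-\d{4}"

# search() stops at the first match and returns a match object.
first = re.search(pattern, text)
first_text = first.group()

# findall() returns every non-overlapping match as a list of strings.
all_texts = re.findall(pattern, text)

print(first_text)
print(all_texts)
```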
The findall() Method

• If the regular expression contains more than one group, findall() returns a list of tuples. With exactly one group, it returns a list of the strings matched by that group.

• Each tuple represents a found match.

• Its items are the matched strings for each group in the regex.
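A short sketch of how groups change the result of `findall()`. The text and patterns are hypothetical.

```python
import re

# Hypothetical text containing two phone-like numbers.
text = "Call 415-555-1011 or 212-555-0000"

# Two groups: each match becomes a tuple with one item per group.
two_groups = re.findall(r"(\d{3})-(\d{3}-\d{4})", text)

# One group: each match becomes the string captured by that group.
one_group = re.findall(r"(\d{3})-\d{3}-\d{4}", text)

print(two_groups)
print(one_group)
```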
Extracting data using regular expressions
Search for lines that have an @ sign between characters
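A common way to express "an @ sign between characters" is `\S+@\S+`, which requires at least one non-whitespace character on each side. The sample text and addresses below are hypothetical.

```python
import re

# Hypothetical text; the addresses are placeholders.
text = "Mail from alice@example.com to bob@example.org at noon"

# '\S+@\S+': one or more non-whitespace characters on both sides of the @.
addresses = re.findall(r"\S+@\S+", text)
print(addresses)

# An @ surrounded by spaces does not qualify.
no_addresses = re.findall(r"\S+@\S+", "Meeting @ 2PM")
print(no_addresses)
```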
Combining searching and extracting
Example
Search for lines that start with 'X', followed by any non-whitespace characters and ':', followed by a space and any number. The number can include a decimal point. Then print the number if it is greater than zero.
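The search described above can be sketched with a capturing group around the number. The header lines are hypothetical, and the sketch reads "greater than zero" literally by converting the captured text to a float.

```python
import re

# Hypothetical header lines for illustration only.
lines = [
    "X-DSPAM-Confidence: 0.8475",
    "X-DSPAM-Probability: 0.0000",
    "X-Sieve: CMU Sieve 2.3",
    "Subject: hello",
]

# X at the start, non-whitespace characters, ': ', then digits and dots (captured).
pattern = r"^X\S*: ([0-9.]+)"

positive_numbers = []
for line in lines:
    found = re.findall(pattern, line)
    # Keep the number only when the line matched and the value is above zero.
    if found and float(found[0]) > 0:
        positive_numbers.append(float(found[0]))

print(positive_numbers)
```

The parentheses tell `findall()` to return only the number, not the whole matched line prefix.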
Example
Search for lines that start with 'Details: rev=' followed by numbers and '.'. Then print the number if it is greater than zero.
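A literal reading of this description might look like the sketch below. The sample lines and the exact pattern are assumptions; real log lines may carry other text between 'Details:' and 'rev=', in which case the pattern would need adjusting.

```python
import re

# Hypothetical lines that follow the description literally.
lines = ["Details: rev=39772", "Details: rev=0", "Details: none"]

# Capture the digits and dots that follow 'Details: rev=' at the start of the line.
pattern = r"^Details: rev=([0-9.]+)"

revisions = []
for line in lines:
    found = re.findall(pattern, line)
    if found and float(found[0]) > 0:
        revisions.append(found[0])

print(revisions)
```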
Example
Search for lines that start with From and a character, followed by a two-digit number between 00 and 99, followed by ':'. Then print the number if it is greater than zero.
Escape character
Start with an uppercase letter and end with a digit

pattern = '^[A-Z].*[0-9]$'

Here, the line must start with a capital letter, followed by zero or more characters, and must end with a digit.
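A quick check of this pattern against a few hypothetical lines shows which conditions cause a line to be rejected.

```python
import re

pattern = '^[A-Z].*[0-9]$'

# Hypothetical sample lines for illustration only.
lines = ["Room 101", "room 101", "Room one", "A1", "7"]

upper_to_digit = [line for line in lines if re.search(pattern, line)]
print(upper_to_digit)
```

`A1` qualifies because `.*` may match zero characters, while `7` fails because the first character must be an uppercase letter.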

The file mbox-short.txt has lines like:

From [email protected] Sat Jan 5 09:14:16 2008

Here, we would like to extract only the hour, 09. That is, we want only the two digits representing the hour. Hence, we need to modify our expression as:

x = re.findall('^From .* ([0-9][0-9]):', line)

Here, [0-9][0-9] indicates that exactly two digits must appear. An alternative way of writing this is:

x = re.findall('^From .* ([0-9]{2}):', line)
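Both forms can be tried on a single line to confirm that they extract the same value. The address in the sample line is a hypothetical placeholder; the rest follows the format shown above.

```python
import re

# Sample line in the mbox format; the address is a placeholder.
line = "From user@example.edu Sat Jan  5 09:14:16 2008"

# Two digits written out explicitly.
hour_explicit = re.findall('^From .* ([0-9][0-9]):', line)

# The same requirement written with the {2} repetition count.
hour_counted = re.findall('^From .* ([0-9]{2}):', line)

print(hour_explicit)
print(hour_counted)
```

Only `09` is returned because it is the only two-digit run that is preceded by a space and followed by a colon.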

Unix/Linux Users
Support for searching files using regular expressions is built into the Unix operating system.

There is a command-line program built into Unix, grep (expanded here as Generalized Regular Expression Parser; the name derives from the ed editor command g/re/p), that behaves similarly to the search() function.

Note that grep may not support the non-blank character \S, so we use the set notation [^ ], which matches any character other than a space.
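The same set notation also works in Python, which makes it easy to compare with `\S`. The sample line below is hypothetical.

```python
import re

# Hypothetical sample line; the address is a placeholder.
line = "From: user@example.edu today"

# \S matches any non-whitespace character.
with_shorthand = re.findall(r"^From: (\S+)", line)

# [^ ] matches any character that is not a space.
with_set = re.findall(r"^From: ([^ ]+)", line)

print(with_shorthand)
print(with_set)
```

The two are not strictly equivalent: `[^ ]` still matches tabs and other whitespace characters, whereas `\S` excludes them.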
Matching Zero or More with the Star

• The * (called the star or asterisk) means “match zero or more”.

• The group that precedes the star can occur any number of times in the text.

• It can be completely absent or repeated over and over again.
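A small sketch of the star applied to a group. The pattern and the test words are hypothetical.

```python
import re

# (wo)* allows the group 'wo' to appear zero or more times.
pattern = r"Bat(wo)*man"

# Hypothetical test words for illustration only.
words = ["Batman", "Batwoman", "Batwowowoman", "Batwman"]

star_results = {word: re.fullmatch(pattern, word) is not None for word in words}
print(star_results)
```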

Matching One or More with the Plus

• The + (or plus) means “match one or more”.

• The group preceding a plus must appear at least once.

• The group preceding a plus is not optional.
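For comparison with the star, the sketch below applies the plus to a group. The pattern and the test words are hypothetical.

```python
import re

# (wo)+ requires the group 'wo' to appear at least once.
pattern = r"Bat(wo)+man"

# Hypothetical test words for illustration only.
words = ["Batman", "Batwoman", "Batwowowoman"]

plus_results = {word: re.fullmatch(pattern, word) is not None for word in words}
print(plus_results)
```

`Batman` is rejected here because the group is no longer optional.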
Character classes
• Used for shortening regular expressions

• The character class [0-5] will match only the digits 0 to 5
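A brief sketch of the `[0-5]` class in use. The sample text is hypothetical.

```python
import re

# Hypothetical sample text containing digits inside and outside the range.
text = "Route 66 exit 12, mile 305"

# [0-5] matches a single digit from 0 through 5; 6 through 9 are skipped.
low_digits = re.findall(r"[0-5]", text)
print(low_digits)
```

The class is shorthand for the longer alternation `(0|1|2|3|4|5)`.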
