Module 4 - Regular Expressions1

patterns

Uploaded by

archanabk.research

Regular expressions


In computing, a regular expression, also referred to as “regex” or “regexp”, provides a concise and flexible means for matching strings of text, such as particular characters, words, or patterns of characters.
A regular expression is written in a formal language that can be interpreted by a regular expression processor.
REGULAR EXPRESSIONS

output would be:

hello, how are you?
how about you?
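A minimal sketch of a program that could print lines like the ones above. The input lines and the search word how are assumptions made for illustration; they are not taken from the slides.

```python
import re

# Hypothetical input lines (assumed for this sketch).
lines = ["hello, how are you?", "I am fine.", "how about you?"]

# Keep only the lines in which the pattern 'how' occurs somewhere.
matched = [line for line in lines if re.search('how', line)]

for line in matched:
    print(line)
```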
REGULAR EXPRESSIONS

Regular expressions make use of special characters with specific meanings. In the following example, we make use of the caret (^) symbol, which indicates the beginning of the line.
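As a minimal sketch of the caret anchor, assuming a few made-up sample lines, the pattern '^how' matches only the lines that begin with how:

```python
import re

# Hypothetical sample lines (assumed for this sketch).
lines = ["hello, how are you?", "how about you?", "show me how"]

# '^how' matches only when 'how' appears at the very start of the line.
starts_with_how = [line for line in lines if re.search('^how', line)]

print(starts_with_how)
```

The first and third lines contain how, but not at the start, so only the second line is kept.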
Character Matching in Regular Expressions
Python provides a set of metacharacters for matching search strings. The table shows the details of a few important metacharacters.
Table: List of Important Meta-Characters
Table: Examples for Regular Expressions
Example

• The most commonly used metacharacter is the dot (.), which matches any single character.

• Consider the following example, where the regular expression searches for lines that start with I, followed by any two characters (each dot stands for any one character), followed by the character m.
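The pattern described above can be sketched as '^I..m'. The sample lines are assumptions chosen to show both matching and non-matching cases.

```python
import re

# Hypothetical sample lines (assumed for this sketch).
lines = ["I am fine", "Item list", "Import re", "It is me"]

# '^I..m': an I at the start, then any two characters, then an m.
matched = [line for line in lines if re.search('^I..m', line)]

print(matched)
```

"I am fine" matches because the space and the a count as the two arbitrary characters; "Import re" does not, because its fourth character is o rather than m.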
Example

If we don’t know the exact number of characters between two characters (or strings), we can use the dot and + symbols together: .+ matches one or more of any character.
Pattern to extract lines starting with the word From (or from) and ending with edu.

import re

# The line must start with From or from and end with edu.
pattern = '^[Ff]rom.*edu$'
fhand = open('mbox-short.txt')
for line in fhand:
    line = line.rstrip()  # drop trailing whitespace, including the newline
    if re.search(pattern, line):
        print(line)
Pattern to extract lines ending with any digit

Replace the pattern with the following string; the rest of the program remains the same.

pattern = '[0-9]$'
Character matching in regular expressions
Search for lines that start with From and have an at sign
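One way to sketch this search is the pattern '^From.+@'. Both the exact pattern and the sample lines (with placeholder addresses) are assumptions, since the slide text gives only the description.

```python
import re

# Hypothetical sample lines; the addresses are placeholders.
lines = [
    "From: alice@example.edu",
    "From the desk of the dean",
    "To: bob@example.edu",
    "From@",
]

# '^From' anchors at the start; '.+' needs at least one character before the '@'.
matched = [line for line in lines if re.search('^From.+@', line)]

print(matched)
```

The last line is rejected because .+ requires at least one character between From and the at sign.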
The findall() Method

• search() returns a match object for the first match in the searched string.

• The findall() method returns the strings of every match in the searched string.

• findall() does not return a match object but a list of strings.

• Each string in the list is a piece of the searched text that matched the regular expression.
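A short sketch of the difference between the two calls, assuming a made-up string that contains two phone numbers:

```python
import re

# Hypothetical text with two phone numbers (assumed for this sketch).
text = 'Cell: 415-555-9999 Work: 212-555-0000'
pattern = r'\d{3}-\d{3}-\d{4}'

# search() stops at the first match and returns a match object.
first = re.search(pattern, text)
print(first.group())

# findall() returns every match as a plain list of strings.
all_numbers = re.findall(pattern, text)
print(all_numbers)
```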
The findall() Method

• If the regular expression contains more than one group, findall() returns a list of tuples (with exactly one group, it returns a list of the strings matched by that group).

• Each tuple represents a found match.

• Its items are the matched strings for each group in the regex.
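The effect of groups on the result can be sketched as follows; the phone-number text is an assumption made for illustration.

```python
import re

# Hypothetical text with two phone numbers (assumed for this sketch).
text = 'Cell: 415-555-9999 Work: 212-555-0000'

# Three groups: each match becomes a tuple with one item per group.
parts = re.findall(r'(\d{3})-(\d{3})-(\d{4})', text)
print(parts)

# One group: each match becomes just the string captured by that group.
area_codes = re.findall(r'(\d{3})-\d{3}-\d{4}', text)
print(area_codes)
```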
Extracting data using regular expressions
Search for lines that have an @ sign between characters
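A minimal sketch of this extraction, assuming the pattern \S+@\S+ (one or more non-whitespace characters on each side of the at sign) and a made-up sentence with placeholder addresses:

```python
import re

# Hypothetical text; the addresses are placeholders.
text = 'A message from alice@example.edu to bob@example.org about meeting @2PM'

# \S+ requires at least one non-whitespace character on each side of '@'.
addresses = re.findall(r'\S+@\S+', text)

print(addresses)
```

The trailing @2PM is not returned because there is no non-whitespace character directly before its at sign.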
Combining searching and extracting
Example
Search for lines that start with 'X', followed by any non-whitespace characters and ':', followed by a space and any number. The number can include a decimal point. Then print the number if it is greater than zero.
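The description above can be sketched with the pattern '^X\S*: ([0-9.]+)'. The pattern, the sample header lines, and the reading of "greater than zero" as a test on the numeric value are all assumptions.

```python
import re

# Hypothetical header lines (assumed for this sketch).
lines = [
    "X-DSPAM-Confidence: 0.8475",
    "X-DSPAM-Probability: 0.0000",
    "X-Sieve: CMU Sieve 2.3",
    "Subject: hello",
]

positive = []
for line in lines:
    # The parentheses select only the number for extraction.
    for text in re.findall(r'^X\S*: ([0-9.]+)', line):
        value = float(text)  # safe here; odd input such as '1.2.3' would raise
        if value > 0:
            positive.append(value)

print(positive)
```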
Example
Search for lines that start with 'Details: rev=' followed by numbers and '.'. Then print the number if it is greater than zero.
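A hedged sketch of this search, assuming the pattern '^Details:.*rev=([0-9.]+)', which allows other text between Details: and rev=. The sample lines and the URL are placeholders.

```python
import re

# Hypothetical lines; the URL is a placeholder.
lines = [
    "Details: http://example.org/viewsvn/?view=rev&rev=39772",
    "Details: no revision here",
    "New Revision: 39772",
]

revisions = []
for line in lines:
    # Capture the digits (and dots) that follow 'rev='.
    for text in re.findall(r'^Details:.*rev=([0-9.]+)', line):
        number = int(text)  # the sample value has no decimal point
        if number > 0:
            revisions.append(number)

print(revisions)
```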
Example
Search for lines that start with From and a character, followed by a two-digit number between 00 and 99, followed by ':'. Then print the number if it is greater than zero.
Escape character
Using Not

pattern = '^[^a-z0-9]+'
Here, the first ^ indicates that the match must start at the beginning of the line. The ^ inside the square brackets means: match any single character that is not listed in the brackets. Hence, the line must start with one or more characters other than lowercase letters and digits.
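The two roles of ^ can be checked with a short sketch; the sample lines are assumptions.

```python
import re

# Hypothetical sample lines (assumed for this sketch).
lines = ["Hello world", "hello world", "42 is the answer", "#hashtag"]

# Outer ^ anchors at the line start; [^a-z0-9] excludes lowercase letters and digits.
pattern = '^[^a-z0-9]+'
matched = [line for line in lines if re.search(pattern, line)]

print(matched)
```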
Start with an uppercase letter and end with a digit

pattern = '^[A-Z].*[0-9]$'

Here, the line must start with a capital letter, followed by zero or more characters, and must end with a digit.
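A quick check of this pattern against a few made-up lines (the lines are assumptions):

```python
import re

# Hypothetical sample lines (assumed for this sketch).
lines = ["Room 101", "room 101", "Room one", "A1"]

pattern = '^[A-Z].*[0-9]$'
matched = [line for line in lines if re.search(pattern, line)]

print(matched)
```

"A1" matches because .* is allowed to match zero characters.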

The file mbox-short.txt has lines like:

From [email protected] Sat Jan 5 09:14:16 2008

Here, we would like to extract only the hour 09. That is, we would like
only two digits representing hour. Hence, we need to modify our
expression as:

x = re.findall('^From .* ([0-9][0-9]):', line)

Here, [0-9][0-9] matches exactly two consecutive digits.
An equivalent way of writing this is:

x = re.findall('^From .* ([0-9]{2}):', line)
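Both forms can be compared in a small self-contained sketch. The line below imitates the mbox-short.txt format, and its address is a placeholder.

```python
import re

# Hypothetical line in the mbox style; the address is a placeholder.
line = 'From user@example.edu Sat Jan 5 09:14:16 2008'

# Two explicit digit classes.
hour_a = re.findall('^From .* ([0-9][0-9]):', line)

# The same thing written with the {2} repetition count.
hour_b = re.findall('^From .* ([0-9]{2}):', line)

print(hour_a, hour_b)
```

Only 09 is captured: the pattern requires a space before the two digits and a colon after them, which rules out 14 and 16.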

Escape Character
import re

x = 'We just received $10.00 for cookies.'
y = re.findall(r'\$[0-9.]+', x)  # \$ matches a literal dollar sign
print(y)

Output:

['$10.00']

Here, we want to extract only the price $10.00. Because the $ symbol is a metacharacter, we need to put \ before it, so that $ is treated as part of the string to match and not as a metacharacter.
Unix/Linux Users
Support for searching files using regular expressions was built into the Unix operating system.

There is a command-line program built into Unix, grep (the name comes from the ed command g/re/p, "globally search for a regular expression and print"), that behaves much like the search() function.

Note that grep does not necessarily support the non-whitespace shorthand \S, so we use [^ ] instead, meaning any character other than a space.
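To make the comparison with search() concrete, here is a rough Python sketch of grep-like filtering. The helper name grep and the sample lines are hypothetical, and the real grep command has many options that this sketch does not model.

```python
import re

def grep(pattern, lines):
    """Hypothetical helper: return the lines that contain a match for pattern."""
    return [line for line in lines if re.search(pattern, line)]

# Hypothetical sample lines; the addresses are placeholders.
lines = ["From: alice@example.edu", "Subject: lunch", "From: bob@example.org"]

# [^ ]+ stands in for \S+: one or more characters that are not a space.
hits = grep('^From: [^ ]+@', lines)

print(hits)
```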
Matching Zero or More with the Star

• The * (called the star or asterisk) means “match zero or more”.

• The group that precedes the star can occur any number of times in the text.

• It can be completely absent or repeated over and over again.
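A small sketch of the star applied to a group; the pattern Bat(wo)*man and the sample strings are chosen here purely for illustration.

```python
import re

pattern = r'Bat(wo)*man'

# The group (wo) may be absent, appear once, or repeat several times.
zero = re.search(pattern, 'The Adventures of Batman').group()
one = re.search(pattern, 'The Adventures of Batwoman').group()
many = re.search(pattern, 'The Adventures of Batwowowoman').group()

print(zero, one, many)
```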

Matching One or More with the Plus

• The + (or plus) means “match one or more”.

• The group preceding a plus must appear at least once.

• The group preceding a plus is not optional.
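The same kind of sketch with a plus shows that the group is now required; the pattern Bat(wo)+man and the sample strings are illustrative assumptions.

```python
import re

pattern = r'Bat(wo)+man'

# With +, the group (wo) must occur at least once.
once = re.search(pattern, 'The Adventures of Batwoman').group()
several = re.search(pattern, 'The Adventures of Batwowowoman').group()

# No 'wo' at all: search() finds nothing and returns None.
missing = re.search(pattern, 'The Adventures of Batman')

print(once, several, missing)
```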
Character classes
• Character classes are used for shortening regular expressions.

• The character class [0-5] matches only the digits 0 to 5.
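As a brief sketch, the class [0-5] can be tried on a made-up string that mixes letters and digits:

```python
import re

# Hypothetical text mixing letters and digits (assumed for this sketch).
text = 'a1b7c5d9e0'

# [0-5] matches a single digit from 0 to 5; 7 and 9 are skipped.
low_digits = re.findall('[0-5]', text)

print(low_digits)
```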

