Lecture19 12PM

The document provides an overview of regular expressions (regex) and finite state machines (FSM), highlighting their importance in pattern matching for web applications and programming languages. It explains the concepts of formal languages, automata, and the equivalence between regex and FSM, along with practical applications and utilities in Unix. Additionally, it covers regex grammar, character classes, and basic commands for text processing languages like awk and sed.

Uploaded by

harendraj2ee

Regular Expressions

with a brief intro to FSM

15-123
Systems Skills in C and Unix
Case for regular expressions
• Many web applications require pattern matching
– Look for <a href> tags to find links
– Token search
• A regular expression is
– A pattern that defines a class of strings
– Written in a special syntax that represents the class
• E.g., *.c describes any name that ends with .c (this is shell wildcard notation; the equivalent regex is .*\.c$)
Formal Languages
• A formal language consists of
– An alphabet
– A formal grammar
• The formal grammar defines
– The strings that belong to the language
• Formal languages with formal semantics provide the rules used to specify the semantics of programming languages
Automaton
• An automaton (plural: automata) is a machine that can recognize the valid strings of a formal language.
• A finite automaton is a mathematical model of a finite state machine (FSM), an abstract model on which all modern computers are built.
Automaton
• An FSM is a machine that consists of a finite set of states and a transition table.
• The FSM can be in any one of the states and can move from one state to another according to the rules given by a transition function.
Example
What does this machine represent? Describe the kind of strings it will accept.
Exercise
• Draw an FSM that accepts any string with an even number of A’s. Assume the alphabet is {A,B}.
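One possible answer to the exercise can be sketched in code rather than as a drawing. The state names "even" and "odd" and the function name accepts_even_as are illustrative choices, not part of the lecture; the machine starts in "even", which is also the only accepting state.

```python
# Two-state FSM over the alphabet {A, B}.
# The transition table maps (current state, input symbol) to the next state.
TRANSITIONS = {
    ("even", "A"): "odd",
    ("even", "B"): "even",
    ("odd", "A"): "even",
    ("odd", "B"): "odd",
}

def accepts_even_as(text):
    """Return True if text contains an even number of A's."""
    state = "even"  # start state: zero A's seen so far
    for symbol in text:
        if (state, symbol) not in TRANSITIONS:
            raise ValueError(f"symbol {symbol!r} is not in the alphabet {{A, B}}")
        state = TRANSITIONS[(state, symbol)]
    return state == "even"  # accept only in the "even" state

print(accepts_even_as("ABBA"))  # True: two A's
print(accepts_even_as("BAB"))   # False: one A
```

Reading a B never changes the state, so only the parity of the A's decides acceptance.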
Build an FSM
• Stream: “Ilovecatsandmorecatsandbigcats”
• Pattern: “cat”
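A minimal sketch of such a machine follows, assuming the state is simply the number of pattern characters matched so far. The names next_state and find_cat are hypothetical. The mismatch rule used here (fall back to state 1 on a "c", otherwise to state 0) is only valid for patterns like "cat" whose first character does not repeat inside the pattern; a general matcher needs a fuller transition table.

```python
PATTERN = "cat"

def next_state(state, ch):
    """Transition function: state counts the pattern characters matched so far."""
    if ch == PATTERN[state]:
        return state + 1
    # On a mismatch, a 'c' can still begin a new match; anything else resets.
    return 1 if ch == PATTERN[0] else 0

def find_cat(stream):
    """Return the start index of every occurrence of PATTERN in stream."""
    matches = []
    state = 0
    for i, ch in enumerate(stream):
        state = next_state(state, ch)
        if state == len(PATTERN):               # accepting state reached
            matches.append(i - len(PATTERN) + 1)
            state = 0                           # restart the search
    return matches

print(find_cat("Ilovecatsandmorecatsandbigcats"))  # [5, 16, 26]
```

The stream is read once, one character at a time, which is what makes the FSM view attractive for scanning long inputs.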
Regular Expressions
Regex versus FSM
• Regular expressions and FSMs are equivalent concepts.
• A regular expression is a pattern that can be recognized by an FSM.
• Regex is an example of how good theory leads to good programs
Regular Expression
• A regex defines a class of patterns
– E.g., patterns that end with a “*”
• Regex utilities in Unix
– grep, awk, sed
• Applications
– Pattern matching (DNA)
– Web searches
Regex Engine
• Software that processes a string to find regex matches.
• Regex engines are usually part of a larger piece of software
– grep, awk, sed, PHP, Python, Perl, Java, etc.
• We can write our own regex engine that recognizes all occurrences of “caa” in a string
– See democode folder
• Different regex engines may not be compatible with each other
– Perl 5 is a popular one to learn
Regex machines
• Perl can do a “decent” job with simple regexes
• But it can fail in cases where expressions can be of the form ____________
• One of the best regex machines was written in C by Ken Thompson in the 1970s
– 400 lines of C code
– Superior to Perl, Python, and other implementations when working with real-world applications
Unix grep utility
The grep command
Simple grep examples
• grep "<a href" guna.html > output.txt
• ls | grep "guna"
• grep 'regex' filename
• man grep
– For more info
Regex grammar
Regular Expression Grammar
• Regex grammar defines a set of rules for finding patterns. Grammar categories:
– Alternation
– Grouping
– Quantification
Regular Expression Grammar
• Alternation
– The vertical bar separates alternatives: a choice among two or more options.
– The notation a | b | c indicates that we can choose a or b or c as part of the string.
– Another example: “(c|s)at” describes the strings “cat” or “sat”.
Regular Expression Grammar
• Grouping
– Parentheses define the scope and precedence of operators.
– In the example above, (c|s) indicates that the string can begin with either c or s, which must be immediately followed by “at”.
Regular Expression Grammar
• Quantification
– Quantification is the notation used to define how many times a symbol may appear in the string.
• The most common quantifiers are
– ?, * and +
– The “?” indicates zero or one occurrence of the previous expression.
– The “*” indicates zero or more occurrences of the previous expression.
– The “+” indicates one or more occurrences of the previous expression.
Examples of *, ? , +
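The three grammar categories can be tried out with a short sketch. It uses Python's re module rather than grep, so the syntax is Python's; the helper name matches and the sample patterns are illustrative assumptions. re.fullmatch requires the whole string to match the pattern.

```python
import re

def matches(pattern, text):
    """True if the whole text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

# Alternation and grouping: (c|s) picks one letter, "at" must follow.
print(matches(r"(c|s)at", "cat"))     # True
print(matches(r"(c|s)at", "sat"))     # True
print(matches(r"(c|s)at", "bat"))     # False

# Quantifiers apply to the expression immediately before them.
print(matches(r"colou?r", "color"))   # True: ? allows zero "u"
print(matches(r"colou?r", "colour"))  # True: ? allows one "u"
print(matches(r"ab*c", "ac"))         # True: * allows zero "b"
print(matches(r"ab*c", "abbbc"))      # True: * allows many "b"
print(matches(r"ab+c", "ac"))         # False: + needs at least one "b"
print(matches(r"ab+c", "abc"))        # True
```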
Other facts
• . matches any single character
• .* matches any string (including the empty string)
• [a-zA-Z]* matches any string of alphabetic characters
• [ag].* matches any string that starts with a or g
• [a-d].* matches any string that starts with a, b, c, or d
• ^(ab) matches any string that begins with ab. In general, to match all lines that begin with a given string, use ^string
• (ab)$ matches any string that ends with ab
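A small sketch of these facts, again using Python's re module as a stand-in for grep. The helper name found and the sample strings are assumptions made for illustration. re.search looks for a match anywhere in the text, much as grep does within a line, so the anchors ^ and $ are what tie a pattern to the start or end.

```python
import re

def found(pattern, text):
    """True if the pattern matches somewhere in text."""
    return re.search(pattern, text) is not None

# Character class with * and both anchors: the whole string must be letters.
print(found(r"^[a-zA-Z]*$", "Hello"))   # True
print(found(r"^[a-zA-Z]*$", "Hello1"))  # False

# "Starts with a or g" needs the ^ anchor when searching inside a line.
print(found(r"^[ag].*", "grep"))        # True
print(found(r"^[ag].*", "bag"))         # False
print(found(r"[ag].*", "bag"))          # True: matches "ag" inside the word

# Begins with / ends with
print(found(r"^ab", "abacus"))          # True
print(found(r"^ab", "cab"))             # False
print(found(r"ab$", "cab"))             # True
print(found(r"ab$", "abc"))             # False
```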
Finding non-matches
• To exclude a class of characters
– [^class]
– E.g., [^0-9] matches any character that is not a digit

Group Matches
– grep '<h\([1-4]\)>.*h\([1-3]\)>' filename
• What patterns match?
– grep 'h\([1-4]\).*h\1' filename
• Back-reference
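The back-reference idea can be sketched in Python, where groups are written with plain parentheses instead of grep's backslashed ones. The pattern below is an adaptation rather than the slide's exact expression: the closing "</" and the names HEADING and has_matching_heading are assumptions added for the example.

```python
import re

# ([1-4]) captures the heading level; \1 must repeat the same digit.
HEADING = re.compile(r"<h([1-4])>.*</h\1>")

def has_matching_heading(line):
    """True if line contains an opening heading tag closed at the same level."""
    return HEADING.search(line) is not None

print(has_matching_heading("<h1>Title</h1>"))  # True: both tags use level 1
print(has_matching_heading("<h2>Title</h3>"))  # False: levels differ
```

Without the back-reference, a pattern such as <h[1-4]>.*</h[1-4]> would accept the mismatched second line as well.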
Character Classes
• \d digit [0-9]
• \D non-digit [^0-9]
• \w word character [0-9a-z_A-Z]
• \W non-word character [^0-9a-z_A-Z]
• \s a whitespace character [ \t\n\r\f]
• \S a non-whitespace character [^ \t\n\r\f]
More regex notation
• {n,m} – match at least n but not more than m times
• {n,} – match at least n times
• {n} – match exactly n times
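The shorthand classes and the counted quantifiers combine naturally, as in this sketch. It assumes Python's re module with the re.ASCII flag, so that \d and \w are limited to the ASCII sets listed on the slide; the helper name matches and the sample strings are illustrative.

```python
import re

def matches(pattern, text):
    """True if the whole text matches; re.ASCII keeps \\d and \\w ASCII-only."""
    return re.fullmatch(pattern, text, re.ASCII) is not None

print(matches(r"\d{3}", "123"))           # True: exactly three digits
print(matches(r"\d{3}", "1234"))          # False: four digits
print(matches(r"\d{2,4}", "12"))          # True: between two and four digits
print(matches(r"\d{2,4}", "12345"))       # False: five digits
print(matches(r"\w{2,}", "a"))            # False: fewer than two word characters
print(matches(r"\w{2,}", "ab_1"))         # True: at least two word characters
print(matches(r"\S+\s\S+", "two words"))  # True: one whitespace between tokens
```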

More examples of regex
• Find all files that begin with “guna”
• Find all files that do not begin with “guna”
• Find all files that end with “guna”
• Find all directories in the current folder. Write them to an external file.
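The first three exercises come down to anchored patterns. The sketch below applies them to a made-up list of names instead of a real directory listing; the list NAMES and the function select are hypothetical, and the invert argument plays the role of a "does not match" filter.

```python
import re

# Hypothetical file names standing in for the output of ls.
NAMES = ["guna.txt", "notes_guna", "myguna.c", "gunawardena", "readme"]

def select(names, pattern, invert=False):
    """Keep names that match pattern, or those that do not when invert is True."""
    return [n for n in names if (re.search(pattern, n) is not None) != invert]

print(select(NAMES, r"^guna"))               # ['guna.txt', 'gunawardena']
print(select(NAMES, r"^guna", invert=True))  # ['notes_guna', 'myguna.c', 'readme']
print(select(NAMES, r"guna$"))               # ['notes_guna']
```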
Exercise
• An email address must begin with an alpha character and can have any combination of alpha characters and characters from {0..9, %, _, +, -}, followed by @ and a domain name {alphanumeric}, followed by {.} and any token from the set {edu, com, us, org, net}. Write a regex to describe this.
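One possible solution, written in Python syntax, is sketched below. It follows the exercise's simplified rules literally (for example, the domain is a single alphanumeric token with no dots), so it is not a validator for real-world addresses. The names EMAIL and is_valid_email and the sample addresses are made up for the example.

```python
import re

EMAIL = re.compile(
    r"[A-Za-z]"                # must begin with an alpha character
    r"[A-Za-z0-9%_+-]*"        # then any mix of alpha, digits, %, _, +, -
    r"@[A-Za-z0-9]+"           # @ followed by an alphanumeric domain name
    r"\.(edu|com|us|org|net)"  # a literal dot and one of the allowed tokens
)

def is_valid_email(address):
    """True if the whole address fits the exercise's rules."""
    return EMAIL.fullmatch(address) is not None

print(is_valid_email("guna@cmu.edu"))          # True
print(is_valid_email("guna+lab@example.org"))  # True
print(is_valid_email("1guna@cmu.edu"))         # False: starts with a digit
print(is_valid_email("guna@cmu.gov"))          # False: gov is not in the set
```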
Summarized Facts about regex
• Two regular expressions may be concatenated; the resulting regular expression matches any string formed by concatenating two substrings that respectively match the concatenated subexpressions.
• Two regular expressions may be joined by the infix operator |; the resulting regular expression matches any string matching either subexpression.
Summarized Facts about regex
• Repetition takes precedence over concatenation, which in turn takes precedence over alternation. A whole subexpression may be enclosed in parentheses to override these precedence rules.
• The backreference \n, where n is a single digit, matches the substring previously matched by the nth parenthesized subexpression of the regular expression.
• In basic regular expressions the metacharacters ?, +, {, |, (, and ) lose their special meaning; instead use the backslashed versions \?, \+, \{, \|, \(, and \).
Text Processing Languages
• awk
– Text processing language
– awk '/pattern/' somefile
– awk '{if ($3 < 1980) print $3, " ",$5,$6,$7,$8}' somefile
• sed
– A stream editor
– sed s/moon/sun/ < moon.txt > sun.txt
• Perl
– A powerful scripting language
– We will discuss this next
Basics of sed
sed basics
• sed is a stream editor
• > sed 's/guna/foo/' filename
– Replaces guna with foo
• first occurrence on each line
– output sent to stdout; the file itself is not modified
• > sed 's/guna/foo/g' filename
– Globally replaces guna with foo (every occurrence on each line)
• If the pattern contains special characters {.*[]^$\ }
– Precede each with \
– e.g., sed 's/guna\[me\.him\]/foobar/g' filename
sed basics
• Replacing more than one token
– sed -e 's/guna/foo/g' -e 's/color/colour/g' filename
• What if / is part of the string to replace?
– Replace all afs/andrew with afs/cs
– Solution: any character immediately following s is the delimiter
– sed 's#afs/andrew#afs/cs#g' filename
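The substitution behaviors above can be mirrored with Python's re.sub, as a rough analogue rather than a replacement for sed. The sample line is invented for the example. Passing count=1 corresponds to sed's default of replacing only the first occurrence on a line, and omitting it corresponds to the g flag.

```python
import re

line = "guna met guna at afs/andrew"  # hypothetical input line

# Like sed 's/guna/foo/': only the first occurrence is replaced.
first_only = re.sub(r"guna", "foo", line, count=1)

# Like sed 's/guna/foo/g': every occurrence is replaced.
everywhere = re.sub(r"guna", "foo", line)

# Special characters are escaped with a backslash, as in the sed example.
escaped = re.sub(r"guna\[me\.him\]", "foobar", "x guna[me.him] y")

# Python has no delimiter character, so "/" needs no special handling here.
path = re.sub(r"afs/andrew", "afs/cs", line)

print(first_only)  # foo met guna at afs/andrew
print(everywhere)  # foo met foo at afs/andrew
print(escaped)     # x foobar y
print(path)        # guna met guna at afs/cs
```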
Basics of awk
Basics of awk
• Uses
– Use information from text files to create reports
– Translating files from one format to another
– Adding functionality to “vi”
– Mathematical operations on numeric files
• awk also has a basic interpreted programming
language
• Basic commands
– General form:
• awk '<search pattern> {<program actions>}'
– awk '/guna/' file -- prints all lines with guna
– awk '/guna/ {print $1,$2,$3}' file
– awk -F',' '{if ($5=="MCS") print $2}' roster.txt
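To make the field numbering in the last command concrete, here is a rough Python equivalent. The contents of ROSTER are invented, since the layout of roster.txt is not given in the slides; the sketch only assumes comma-separated lines with the program in the fifth field and a name in the second. The function name names_in_program is likewise hypothetical.

```python
# Hypothetical roster data: id, name, login, year, program.
ROSTER = """1,Ann Lee,ann,2011,MCS
2,Bob Roy,bob,2012,ECE
3,Cy Kim,cy,2011,MCS
"""

def names_in_program(text, program):
    """Mimic awk -F',' '{if ($5==program) print $2}' on the given text."""
    result = []
    for line in text.splitlines():
        fields = line.split(",")  # -F',' splits each line on commas
        if len(fields) >= 5 and fields[4] == program:  # awk's $5 is fields[4]
            result.append(fields[1])                   # awk's $2 is fields[1]
    return result

print(names_in_program(ROSTER, "MCS"))  # ['Ann Lee', 'Cy Kim']
```

awk numbers fields from 1, while Python lists are indexed from 0, which is why $5 and $2 become fields[4] and fields[1].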
Exercises
• Download an index.html file from your favorite website
– use wget
• Change all URLs, for example, www.cnn.com to www.foxnews.com
– use sed
Coding Examples
