
DOC4

Regular expressions (regex) are patterns used to search and manipulate text strings, enhancing text processing capabilities. They come in two types: Basic Regular Expressions (BRE) and Extended Regular Expressions (ERE), each with specific meta characters and functionalities. Regex is commonly utilized in Unix/Linux commands like grep, sed, and awk for text searching and manipulation.

Regular Expression

Palani Karthikeyan
What are regular expressions?
● A regular expression is a pattern that describes a set of strings.
● Regular expressions are used to search and manipulate text based on patterns.
● “Regular expression” is often shortened to “regex” or “regexp”.
● Regexes enhance the ability to process text content meaningfully, especially when combined with other commands.
grep, sed, awk
● Regular expressions are usually passed to grep, sed and awk in the following formats:
● In grep: grep [options] 'regexp' [inputfile]
● In sed: sed [options] '/regexp/action' [inputfile]
● In awk: awk [options] '/regexp/ {action}' [inputfile]
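As a rough illustration of the three command forms, the sketch below runs one pattern through each tool. The sample lines held in `sample` are invented for this example and are not part of the original slides.

```shell
# Hypothetical sample input, invented for this illustration.
sample='alice:admin
bob:user
carol:admin'

# grep: print the lines that match the regexp.
grep_out=$(printf '%s\n' "$sample" | grep 'admin$')

# sed: -n suppresses the default output; the p action prints matching lines.
sed_out=$(printf '%s\n' "$sample" | sed -n '/admin$/p')

# awk: run the action on matching lines; -F: splits fields on ":".
awk_out=$(printf '%s\n' "$sample" | awk -F: '/admin$/ {print $1}')

printf '%s\n' "$grep_out" "$sed_out" "$awk_out"
```

Here grep and sed print the whole matching lines, while the awk action prints only the first field of each matching line.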
BRE & ERE
● Unix/Linux tools support two flavors of regular expression:
● Basic Regular Expression – BRE
● Extended Regular Expression – ERE
BRE
● BRE uses the following metacharacters:
● . (dot) = match any single character
● ^ = match the expression at the start of a line, as in ^PATTERN
● $ = match the expression at the end of a line, as in PATTERN$
● \ (backslash) = turn off the special meaning of the next character, as in \^
● [ ] (brackets) = match any one of the enclosed characters
● [^ ] = match any one character except those enclosed in [ ]
● * (asterisk) = match zero or more of the preceding character or expression
● ^PATTERN$ = match only lines that consist of PATTERN and nothing else
● - inside [ ] = character range, as in [A-Z] [0-9] [a-z] [A-Za-z0-9]
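The BRE metacharacters can be tried out with plain grep, which uses BRE by default. This is a minimal sketch; the word list in `lines` is invented for the illustration.

```shell
# Hypothetical sample lines, invented for this illustration.
lines='cat
cot
cut
ct
a.b
axb'

# . matches any single character: c, one character, then t.
dot_out=$(printf '%s\n' "$lines" | grep 'c.t')            # cat, cot, cut

# ^ and $ anchor the pattern; [ao] accepts either a or o.
anchored_out=$(printf '%s\n' "$lines" | grep '^c[ao]t$')  # cat, cot

# [^ao] accepts any one character except a or o.
negated_out=$(printf '%s\n' "$lines" | grep '^c[^ao]t$')  # cut

# * allows zero or more of the preceding item, so "ct" matches too.
star_out=$(printf '%s\n' "$lines" | grep '^c[a-z]*t$')    # cat, cot, cut, ct

# \ turns off the special meaning of the dot.
literal_out=$(printf '%s\n' "$lines" | grep 'a\.b')       # a.b

printf '%s\n' "$dot_out" "$anchored_out" "$negated_out" "$star_out" "$literal_out"
```

Without the backslash, the pattern a.b would match both "a.b" and "axb", because the dot stands for any character.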
ERE
● ERE additionally uses the following metacharacters:
● ? means that the preceding item is optional and is matched at most once.
● + means that the preceding item is matched one or more times.
● {n} means that the preceding item is matched exactly n times.
● {n,} means that the preceding item is matched n or more times.
● {n,m} means that the preceding item is matched at least n times, but not more than m times.
● {,m} means that the preceding item is matched at most m times (a GNU extension; POSIX does not define it).
● | (alternation) means that the pattern matches if the expression on either side of it matches; a line containing either one is a match.
● ( ) (grouping) groups several patterns so that they behave as one.
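A short sketch of the ERE metacharacters, using grep -E. The word list in `words` is made up for the example.

```shell
# Hypothetical sample words, invented for this illustration.
words='color
colour
colouur
gray
grey
green'

# ? makes the preceding "u" optional.
opt_out=$(printf '%s\n' "$words" | grep -E '^colou?r$')       # color, colour

# + requires one or more "u".
plus_out=$(printf '%s\n' "$words" | grep -E '^colou+r$')      # colour, colouur

# {2} requires exactly two "u".
exact_out=$(printf '%s\n' "$words" | grep -E '^colou{2}r$')   # colouur

# ( ) groups the alternatives so that ^ and $ apply to both.
alt_out=$(printf '%s\n' "$words" | grep -E '^(gray|green)$')  # gray, green

printf '%s\n' "$opt_out" "$plus_out" "$exact_out" "$alt_out"
```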
ERE
● In general, ERE supports the following operations:
– Alternative Match Patterns
– Grouping Alternatives
– Quantifiers
Alternative Match Patterns

● An alternative match pattern lets you specify a series of alternatives for a pattern, using | to separate them.
● | (called alternation) is equivalent to an “or” in a regular expression.
● In Perl-style engines, alternatives are checked from left to right, so the first alternative that matches is the one that’s used. POSIX tools such as grep instead report the longest match that starts at the leftmost possible position.
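Alternation can be sketched with grep -E as follows. The log lines in `log` are hypothetical and exist only for this example.

```shell
# Hypothetical log lines, invented for this illustration.
log='INFO start
WARN disk almost full
ERROR disk failure
INFO done'

# A line matches if either alternative matches.
alt_out=$(printf '%s\n' "$log" | grep -E '^WARN|^ERROR')

# Any number of alternatives can be chained; -c counts the matching lines.
count=$(printf '%s\n' "$log" | grep -cE 'WARN|ERROR|INFO')

printf '%s\n' "$alt_out" "$count"
```

Because | has the lowest precedence, the first pattern reads as "^WARN or ^ERROR", which is why the anchor is repeated in each alternative.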
Grouping Alternatives

● Grouping “( )” allows parts of a regular expression to be treated as a single unit.
● Parts of a regular expression are grouped by enclosing them in parentheses.
● Grouping lets similar terms share their common characters, so that only the differences are spelled out.
● The pairs of parentheses are numbered from left to right by the positions of their left parentheses.
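The sketch below illustrates grouping and group numbering. It assumes a sed that accepts the -E option (GNU and BSD sed do); the input strings are invented for the example.

```shell
# Common characters are written once; only the differing letter is grouped.
group_out=$(printf '%s\n' 'gray' 'grey' 'griy' | grep -E '^gr(a|e)y$')

# Groups are referred to as \1, \2, \3 in the sed replacement text.
date_out=$(printf '%s\n' '2024-01-15' |
  sed -E 's/([0-9]{4})-([0-9]{2})-([0-9]{2})/\3.\2.\1/')

# Numbering follows the left parentheses: the outer group is 1,
# then the inner groups are 2 and 3.
nested_out=$(printf '%s\n' 'abc' | sed -E 's/((a)(b))c/1=\1 2=\2 3=\3/')

printf '%s\n' "$group_out" "$date_out" "$nested_out"
```

The last command prints "1=ab 2=a 3=b", which shows that the outer pair of parentheses is counted before the two pairs nested inside it.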
Quantifiers

● A quantifier says how many times something may match, instead of the default of matching just once.
● You can use a quantifier to specify that a pattern must match a specific number of times.
● Quantifiers in a regular expression are like loops in a program.
Quantifiers (Contd..)
Character   Description
*           Match 0 or more times: the item immediately to the left must match zero or more times for the expression to evaluate as true.
Example:
$var =~ /st*/ # matches strings such as “st”, “sttr”, “sts”, “star”, “son”, ...
The regexp “a*” matches zero or more “a” characters. Because zero occurrences are allowed, it matches every string, including strings that contain no “a”.
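The slide's example uses Perl syntax; the same behaviour can be sketched with grep -E. The word list in `words` is invented for this illustration.

```shell
# Hypothetical sample words, invented for this illustration.
words='st
sttr
star
son
xyz'

# "s" followed by zero or more "t": every line that contains an "s".
star_out=$(printf '%s\n' "$words" | grep -E 'st*')

# a* may match zero "a" characters, so every line matches, even "xyz".
all_count=$(printf '%s\n' "$words" | grep -cE 'a*')

printf '%s\n' "$star_out" "$all_count"
```

The count is 5, the total number of lines, because a pattern that can match the empty string matches everywhere.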
Quantifiers (Contd..)
Character   Description
+           Match 1 or more times: the item immediately to the left must match one or more times for the expression to evaluate as true.
Example:
$var =~ /st+/ # matches strings such as “st”, “sttr”, “sts”, “star”, but not “son”
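A minimal sketch of the + quantifier with grep -E, contrasted with *. The word list is invented for this illustration.

```shell
# Hypothetical sample words, invented for this illustration.
words='st
sttr
sts
star
son'

# "s" followed by at least one "t": "son" no longer qualifies.
plus_out=$(printf '%s\n' "$words" | grep -E 'st+')

# For comparison, st* also accepts "son" (zero "t" characters).
star_count=$(printf '%s\n' "$words" | grep -cE 'st*')
plus_count=$(printf '%s\n' "$words" | grep -cE 'st+')

printf '%s\n' "$plus_out" "$star_count" "$plus_count"
```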
Quantifiers (Contd..)
Character   Description
?           Match 1 or 0 times: the item immediately to the left must match zero or one time for the expression to evaluate as true.
Example:
$var =~ /st?r/ # matches “sr” or “str”
$var =~ /comm?a/ # matches “coma” or “comma”
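The ? quantifier can be checked with grep -E. In this sketch the word list is invented; note that st?r allows at most one "t" between "s" and "r", so neither "star" nor "sttr" matches.

```shell
# Hypothetical sample words, invented for this illustration.
words='sr
str
sttr
star
coma
comma
commma'

# "s", an optional "t", then "r".
opt_out=$(printf '%s\n' "$words" | grep -E 'st?r')      # sr, str

# "com", an optional second "m", then "a".
comma_out=$(printf '%s\n' "$words" | grep -E 'comm?a')  # coma, comma

printf '%s\n' "$opt_out" "$comma_out"
```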
Quantifiers (Contd..)
Character   Description
{}          Specifies how many times the item immediately to the left must match.
{n} - must match exactly n times
{n,} - must match at least n times
{n,m} - must match at least n times but not more than m times
Example:
$var =~ /mn{2,4}p/ # matches “mnnp”, “mnnnp”, “mnnnnp”
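The interval forms can be sketched with grep -E. The strings in `words` are invented for this illustration.

```shell
# Hypothetical sample strings, invented for this illustration.
words='mnp
mnnp
mnnnp
mnnnnp
mnnnnnp'

# Between two and four "n" characters.
range_out=$(printf '%s\n' "$words" | grep -E 'mn{2,4}p')  # mnnp, mnnnp, mnnnnp

# Exactly three.
exact_out=$(printf '%s\n' "$words" | grep -E 'mn{3}p')    # mnnnp

# Four or more.
least_out=$(printf '%s\n' "$words" | grep -E 'mn{4,}p')   # mnnnnp, mnnnnnp

printf '%s\n' "$range_out" "$exact_out" "$least_out"
```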
Making Quantifiers Less Greedy

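● To make quantifiers less greedy, that is, to match the minimum number of times possible, follow the quantifier with a ?
● Lazy quantifiers are a Perl-style feature; POSIX BRE and ERE do not define them.
● *? matches zero or more times, as few as possible.
● +? matches one or more times, as few as possible.
● ?? matches zero or one time, preferring zero.
● {n}? matches exactly n times (the same as {n}).
● {n,}? matches at least n times, as few as possible.
● {n,m}? matches at least n times but not more than m times, as few as possible.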
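The difference between greedy and less greedy matching can be sketched with a Perl one-liner called from the shell. This assumes perl is installed (the block skips itself otherwise); the sample text is invented for the illustration.

```shell
# Hypothetical sample text, invented for this illustration.
text='<b>one</b> and <b>two</b>'

if command -v perl >/dev/null 2>&1; then
  # Greedy: .+ extends as far as it can, up to the last </b>.
  greedy_out=$(printf '%s\n' "$text" | perl -ne 'print "$1\n" if /<b>(.+)<\/b>/')

  # Less greedy: .+? stops at the first </b> it can reach.
  lazy_out=$(printf '%s\n' "$text" | perl -ne 'print "$1\n" if /<b>(.+?)<\/b>/')

  printf 'greedy: %s\nlazy:   %s\n' "$greedy_out" "$lazy_out"
fi
```

The greedy version captures "one</b> and <b>two", while the less greedy version captures only "one".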
BRE vs ERE
● In basic regular expressions the metacharacters "?", "+", "{", "|", "(", and ")" lose their special meaning; instead use the backslashed versions "\?", "\+", "\{", "\|", "\(", and "\)".
● To enable ERE, use these options:
● grep -E
● sed -E (or sed -r in GNU sed)
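To make the contrast concrete, the sketch below writes the same pattern once in BRE and once in ERE. The word list is invented for this illustration.

```shell
# Hypothetical sample words, invented for this illustration.
words='abab
ab
aab
b'

# BRE: grouping and intervals need backslashes.
bre_out=$(printf '%s\n' "$words" | grep '\(ab\)\{2\}')

# ERE: the same pattern without backslashes.
ere_out=$(printf '%s\n' "$words" | grep -E '(ab){2}')

# In BRE an unescaped { is an ordinary character, so this pattern
# matches the literal text "a{2}" and not "aa".
literal_out=$(printf '%s\n' 'a{2}' 'aa' | grep 'a{2}')

printf '%s\n' "$bre_out" "$ere_out" "$literal_out"
```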
Examples using grep
● To display only the lines that start with the string "root":
● grep '^root' /etc/passwd
● root:x:0:0:root:/root:/bin/bash
● To see which accounts have no shell assigned, search for lines ending in ":":
● grep ':$' /etc/passwd
● news:x:9:13:news:/var/spool/news:
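The two searches can be reproduced without touching the real /etc/passwd. In this sketch the `passwd_sample` data is a stand-in; the operator line is invented to show why the anchor matters.

```shell
# Stand-in for /etc/passwd; the operator line is invented for this illustration.
passwd_sample='root:x:0:0:root:/root:/bin/bash
news:x:9:13:news:/var/spool/news:
operator:x:11:0:operator:/root:/sbin/nologin'

# ^ keeps only lines that start with "root"; the operator line
# contains "root" later on and is not selected.
root_out=$(printf '%s\n' "$passwd_sample" | grep '^root')

# $ keeps only lines whose last character is ":" (empty shell field).
noshell_out=$(printf '%s\n' "$passwd_sample" | grep ':$')

printf '%s\n' "$root_out" "$noshell_out"
```

Quoting the pattern keeps the shell from interpreting characters such as $ before grep sees them.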
Character classes
● To list the lines that contain either "y" or "f":
● grep '[yf]' /etc/group
● sys:x:3:root,bin,adm
● tty:x:5:
● mail:x:12:mail,postfix
● ftp:x:50:
● nobody:x:99:
● floppy:x:19:
● xfs:x:43:
● nfsnobody:x:65534:
● postfix:x:89:


dog matches the string "dog"
[dog] matches one character: a "d", an "o", or a "g"
[dog]* matches a string of zero or more characters from the set {"d", "o", "g"}
(dog|cat) matches the string "dog" or the string "cat"
dog.*cat matches the string "dog" followed by the string "cat" somewhere later in the string
x(dog|cat)x matches the string "dog" or the string "cat" between two "x"s
xx* matches a string of one or more "x"s
x+ matches a string of one or more "x"s
x(dog|cat)?x matches two "x"s with, optionally, the string "dog" or the string "cat" between them
[aeiou] matches a single vowel
[A-Z]+ matches a string of one or more uppercase letters
[az-]+ matches a string of one or more characters from the set of three characters "a", "z", "-"
[^a-z]+ matches a string of one or more characters that are not lowercase letters
"[a-z]" in flex matches exactly the five-character string "[a-z]"
[a-zA-Z][a-zA-Z0-9]* matches a letter optionally followed by letters or digits
[1-9][0-9]*|0 matches a non-negative integer with no leading zeros; the number zero is written as a single "0"
[+-]?[0-9]+ matches an integer with an optional sign (note that leading zeros are allowed)
([0-9].)* matches an even number of characters where every odd-numbered character is a digit
[+-]?[1-9][0-9]*|0 matches an integer with no leading zeros, optionally signed, or a single "0" (the sign applies only to the nonzero alternative)
[\^\+\-\:\*\]] matches one of the 6 characters: "^", "+", "-", ":", "*", "]" (in flex and Perl-style engines, where backslash escapes work inside brackets; POSIX bracket expressions treat the backslash literally)
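Several of these patterns can be checked against whole strings with a small helper. The function name `is_match` is hypothetical, introduced only for this sketch; it wraps the pattern in ^( )$ so that grep -E tests the entire string.

```shell
# is_match PATTERN STRING: succeed if the whole STRING matches the ERE PATTERN.
# The pattern is wrapped in ^( )$ so that alternation stays inside the anchors.
is_match() {
  printf '%s\n' "$2" | grep -qE "^($1)\$"
}

# Try the unsigned-integer pattern on a few candidate strings.
for s in 0 42 007 -5; do
  if is_match '[1-9][0-9]*|0' "$s"; then
    echo "$s: accepted"
  else
    echo "$s: rejected"
  fi
done
```

With this pattern, "0" and "42" are accepted, while "007" (leading zeros) and "-5" (sign not allowed) are rejected.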
Regex Snaps
Thank you
