
Regular Expressions

This document discusses regular expressions (regex or regexp), which are patterns used to match character combinations in strings. It covers basic regex patterns such as character sets, quantifiers, and anchors. It also discusses more advanced concepts such as grouping, alternation, backreferences, and substitution.

Uploaded by saisuraj1510
Copyright © All Rights Reserved

Regular Expressions

Regular Expressions
• A regular expression (regex or regexp) is, in theoretical computer science and formal language theory, a sequence of characters that defines a search pattern.

• In computer science, a regular expression (RE) is a language for specifying text search strings.


Regular Expressions
• String: any sequence of characters
– Letters, numbers, spaces, tabs, punctuation marks

• Regular Expression: a formula in algebraic notation for specifying a set of strings.

• Regular Expression Search
– Pattern: the regular expression specifying the set of strings we want to search for
– Corpus: the texts we want to search through
– E.g., the UNIX grep command
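A minimal sketch of a pattern-over-corpus search, assuming Python's standard `re` module as the regex engine (its syntax is close to, but not identical with, grep's). The helper name `grep` and the three-line corpus are hypothetical illustrations, not part of any real tool.

```python
import re

def grep(pattern, corpus):
    """Hypothetical helper: return the lines of corpus containing a match for pattern,
    roughly like running `grep PATTERN` over an in-memory text."""
    regex = re.compile(pattern)
    return [line for line in corpus.splitlines() if regex.search(line)]

# Hypothetical corpus: one string, one "document line" per text line
corpus = """the cat sat on the mat
a dog barked
The Cat slept"""

print(grep(r"cat", corpus))     # ['the cat sat on the mat']
print(grep(r"[Cc]at", corpus))  # ['the cat sat on the mat', 'The Cat slept']
```

As with grep, a line is reported if the pattern matches anywhere in it, which is why `re.search` (not `re.match`) is used.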
Regular Expressions
Basic Regular Expression Patterns

✓ /abc/ matches any string containing the substring abc

✓ Brackets [] specify a disjunction of characters, e.g. /[wW]oodchuck/ matches woodchuck or Woodchuck

✓ Brackets [] plus the dash - specify a range, e.g. /[0-9]/ (any digit), /[A-Z]/ (any uppercase letter)
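The three basic patterns can be tried out with Python's `re` module, assumed here as the regex engine; the sample sentence is made up. `re.findall` returns every non-overlapping match.

```python
import re

# Made-up sample text
text = "Woodchucks and woodchucks: Bob owns 3 abc blocks."

# /abc/: the literal substring abc
print(re.findall(r"abc", text))          # ['abc']

# /[wW]oodchuck/: a disjunction of characters (w or W), then oodchuck
print(re.findall(r"[wW]oodchuck", text))  # ['Woodchuck', 'woodchuck']

# /[0-9]/ and /[A-Z]/: character ranges written with the dash
print(re.findall(r"[0-9]", text))        # ['3']
print(re.findall(r"[A-Z]", text))        # ['W', 'B']
```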


Regular Expressions
Basic Regular Expression Patterns

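✓ The caret ^ as the first character inside brackets negates the set, e.g. /[^A-Z]/ (not an uppercase letter); anywhere else inside the brackets it just means ^, e.g. /[e^]/ (e or ^)

✓ The question mark ? marks the previous expression as optional, e.g. /colou?r/ matches color or colour

✓ The period . specifies any single character (in most engines, except a newline), e.g. /beg.n/ matches begin, began, begun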
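A short sketch of the caret, question mark, and period, again assuming Python's `re` module; the input strings are invented for illustration. `re.fullmatch` is used where the whole string must match.

```python
import re

# Caret first inside brackets: negation (anything but an uppercase letter)
print(re.findall(r"[^A-Z]", "aB1"))                    # ['a', '1']
# Caret elsewhere inside brackets: a literal ^
print(re.findall(r"[e^]", "x^e"))                      # ['^', 'e']

# ? makes the previous character optional
print(re.fullmatch(r"colou?r", "color") is not None)   # True
print(re.fullmatch(r"colou?r", "colour") is not None)  # True

# . matches any single character except a newline (Python's default)
print(re.findall(r"beg.n", "begin began begun beg\nn"))  # ['begin', 'began', 'begun']
```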


Regular Expressions

✓ Disjunction
/cat|dog/

✓ Grouping
/pupp(y|ies)/
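Disjunction and grouping in Python's `re` module (assumed engine; sample sentence invented). One Python-specific detail: when a pattern contains a group, `re.findall` returns the group's text rather than the whole match, so `re.finditer` is used to get full matches.

```python
import re

text = "a cat, a dog, a puppy and two puppies"

# Disjunction: cat or dog
print(re.findall(r"cat|dog", text))  # ['cat', 'dog']

# Grouping limits the scope of |: pupp followed by y or ies
print([m.group(0) for m in re.finditer(r"pupp(y|ies)", text)])  # ['puppy', 'puppies']

# Pitfall: findall reports only the captured group
print(re.findall(r"pupp(y|ies)", text))  # ['y', 'ies']
```

A non-capturing group, `pupp(?:y|ies)`, is another way to make `re.findall` return the whole match.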
Regular Expressions
Advanced Operators

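Some characters need to be backslashed (escaped) to be matched literally, e.g. \. (a period), \* (an asterisk), \? (a question mark), \$ (a dollar sign); \n and \t stand for a newline and a tab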
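A sketch of escaping in Python's `re` module (assumed engine; the sample text is made up). `re.escape` backslashes every special character in a literal string; the exact output shown assumes Python 3.7 or later, where ordinary characters such as digits are no longer escaped.

```python
import re

text = "Is it 3.5*2? Yes."

print(re.findall(r"\.", text))  # ['.', '.']  literal periods only
print(re.findall(r"\*", text))  # ['*']
print(re.findall(r"\?", text))  # ['?']

# re.escape builds a pattern that matches the given string literally
print(re.escape("3.5*2?"))      # 3\.5\*2\?
print(re.search(re.escape("3.5*2?"), text) is not None)  # True
```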


Regular Expressions
Advanced Operators

Regular expression operators for counting

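[a-z]*    zero or more lowercase letters
[0-9]+    one or more digits
BOOKS?    BOOK or BOOKS (zero or one S)
GO{2}D    exactly two O's: GOOD
GO{1,2}D  one or two O's: GOD or GOOD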
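The counting operators can be checked against whole strings with `re.fullmatch`, assuming Python's `re` module; `full` is a hypothetical convenience helper for this illustration.

```python
import re

def full(pattern, s):
    """Hypothetical helper: True if pattern matches the entire string s."""
    return re.fullmatch(pattern, s) is not None

print(full(r"[a-z]*", ""))         # True: * allows zero occurrences
print(full(r"[0-9]+", ""))         # False: + needs at least one
print(full(r"BOOKS?", "BOOK"))     # True: ? allows zero or one S
print(full(r"GO{2}D", "GOOD"))     # True: exactly two O's
print(full(r"GO{2}D", "GOD"))      # False
print(full(r"GO{1,2}D", "GOD"))    # True: one or two O's
print(full(r"GO{1,2}D", "GOOOD"))  # False: three O's is too many
```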
Operator Precedence Hierarchy

1. Parentheses ()
2. Counters * + ? {}
3. Sequences and anchors, e.g. the ^my end$
4. Disjunction |
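A sketch of what the hierarchy means in practice, assuming Python's `re` module: counters bind tighter than sequences, and sequences bind tighter than disjunction, so parentheses are needed to change the grouping. The test strings are invented.

```python
import re

def full(pattern, s):
    """True if pattern matches the entire string s."""
    return re.fullmatch(pattern, s) is not None

# Counter over sequence: /the*/ is "th" followed by zero or more e's
print(full(r"the*", "theee"))            # True
print(full(r"the*", "thethe"))           # False
# Parentheses come first: /(the)*/ repeats the whole sequence
print(full(r"(the)*", "thethe"))         # True

# Sequence over disjunction: /guppy|ies/ means "guppy" or "ies"
print(full(r"guppy|ies", "ies"))         # True
print(full(r"guppy|ies", "guppies"))     # False
print(full(r"gupp(y|ies)", "guppies"))   # True
```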
Regular Expressions
A Simple Example

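• To find the English article the


/the/ misses a capitalized The

/[tT]he/ also matches inside other words, e.g. other or theology

/\b[tT]he\b/ requires a word boundary on both sides

/[^a-zA-Z][tT]he[^a-zA-Z]/ requires a non-letter on both sides (so it still finds the_ or the25, which \b rejects, but it misses The at the very start of a line)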
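Running the four successive patterns over a few invented samples shows what each refinement fixes and what it still gets wrong; this assumes Python's `re` module, where `\b` treats letters, digits, and the underscore as word characters. `samples` and `hits` are hypothetical names for this illustration.

```python
import re

# Invented test strings
samples = ["The end", "other", "see the_end", "see the cat"]

def hits(pattern):
    """Samples in which pattern is found anywhere (re.search)."""
    return [s for s in samples if re.search(pattern, s)]

for p in [r"the", r"[tT]he", r"\b[tT]he\b", r"[^a-zA-Z][tT]he[^a-zA-Z]"]:
    print(f"{p:28}{hits(p)}")
```

The expected hits are `['other', 'see the_end', 'see the cat']`, then all four samples, then `['The end', 'see the cat']`, and finally `['see the_end', 'see the cat']`: the last pattern accepts the underscore case but loses "The" at the start of the string, because no character precedes it.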
Regular Expressions
A More Complex Example

“Any PC with more than 500 MHz and 32 Gb of disk space for less than $1000”

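/\$[0-9]+/ a dollar sign followed by digits (the $ is escaped, because a bare $ is the end-of-line anchor)

/\$[0-9]+\.[0-9][0-9]/ requires exactly two digits of cents

/(^|\W)\$[0-9]+(\.[0-9][0-9])?\b/ makes the cents optional; (^|\W) replaces a leading \b, which cannot match between a space and $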
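A sketch of the price patterns in Python's `re` module (assumed engine; the sample sentence is made up). The last pattern adds one extra pair of parentheses around the price itself, an addition for this example, so that the price can be pulled out without the preceding character that `(^|\W)` consumes.

```python
import re

text = "Any PC for less than $1000, or a used one for $799.99."

# Dollar sign (escaped) plus digits
print(re.findall(r"\$[0-9]+", text))              # ['$1000', '$799']

# Cents required
print(re.findall(r"\$[0-9]+\.[0-9][0-9]", text))  # ['$799.99']

# A leading \b fails: a space and $ are both non-word characters
print(re.findall(r"\b\$[0-9]+", text))            # []

# Optional cents; group 2 (added here) holds just the price
price = re.compile(r"(^|\W)(\$[0-9]+(\.[0-9][0-9])?)\b")
print([m.group(2) for m in price.finditer(text)])  # ['$1000', '$799.99']
```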
Regular Expressions
A More Complex Example
“Any PC with more than 500 MHz and 32 Gb of disk space for less than $1000”

/\b[0-9]+ *(MHz|[Mm]egahertz|GHz|[Gg]igahertz)\b/

/\b[0-9]+(\.[0-9]+)? *(Gb|[Gg]igabytes?)\b/

/\b(Mac|Macintosh|Apple)\b/

/\b[0-9]+ *(Mb|[Mm]egabytes?)\b/
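The four specification patterns applied to the query sentence, extended with an invented clause about a Macintosh so that every pattern has something to find; Python's `re` module is assumed. The ` *` lets the number and unit be separated by any number of spaces, including none.

```python
import re

# The slide's query plus a made-up clause
text = ("Any PC with more than 500 MHz and 32 Gb of disk space for less than $1000, "
        "or a Macintosh with 256 megabytes")

speed = re.compile(r"\b[0-9]+ *(MHz|[Mm]egahertz|GHz|[Gg]igahertz)\b")
disk = re.compile(r"\b[0-9]+(\.[0-9]+)? *(Gb|[Gg]igabytes?)\b")
vendor = re.compile(r"\b(Mac|Macintosh|Apple)\b")
memory = re.compile(r"\b[0-9]+ *(Mb|[Mm]egabytes?)\b")

for name, rx in [("speed", speed), ("disk", disk), ("vendor", vendor), ("memory", memory)]:
    print(name, [m.group(0) for m in rx.finditer(text)])
```

For "Macintosh", the engine first tries the alternative `Mac`, finds no word boundary after it, and backtracks to `Macintosh`, so the trailing `\b` is what keeps the shorter alternative from matching inside the longer word.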
Regular Expressions
Advanced Operators

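Aliases for common sets of characters

\d  any digit, [0-9]
\D  any non-digit, [^0-9]
\w  any alphanumeric character or underscore, [a-zA-Z0-9_]
\W  any non-word character, [^\w]
\s  whitespace (space, tab, newline, ...)
\S  non-whitespace, [^\s]

\d{3}    exactly three digits
\d{2,3}  two or three digits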
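A sketch of the aliases in Python's `re` module, assumed as the engine; the sample text is invented. Note that for `str` patterns Python 3 makes `\d`, `\w`, and `\s` Unicode-aware; passing `re.ASCII` restricts them to the ASCII sets listed above.

```python
import re

text = "Call 555-0123 or ext. 42, room 7."

print(re.findall(r"\d{3}", text))    # ['555', '012']: "3" and "42" are too short
print(re.findall(r"\d{2,3}", text))  # ['555', '012', '42']: greedy, up to three

# \w includes the underscore; \s covers spaces and tabs
print(re.findall(r"\w+", "a_b c-d"))  # ['a_b', 'c', 'd']
print(re.split(r"\s+", "a  b\tc"))    # ['a', 'b', 'c']
```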
Regular Expressions
Regular Expression Memory

“The bigger they were, the bigger they will be”,

NOT “The bigger they were, the faster they will be”

/[Tt]he (.*)er they were, the \1er they will be/

\1 must repeat exactly the text captured by the first parenthesized group
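A sketch of the backreference in Python's `re` module (assumed engine). `[Tt]he` is used so that the capitalized first word of the example sentences can match.

```python
import re

pattern = re.compile(r"[Tt]he (.*)er they were, the \1er they will be")

m = pattern.search("The bigger they were, the bigger they will be")
print(m.group(1))  # bigg: the text that \1 had to repeat

print(pattern.search("The bigger they were, the faster they will be"))  # None
```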


Regular Expressions
Regular Expression Memory

“The bigger they were, the bigger they were”,

NOT “The bigger they were, the bigger they will be”

/[Tt]he (.*)er they (.*), the \1er they \2/
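With two groups, `\1` and `\2` refer to them in order of their opening parentheses. A sketch assuming Python's `re` module:

```python
import re

pattern = re.compile(r"[Tt]he (.*)er they (.*), the \1er they \2")

m = pattern.search("The bigger they were, the bigger they were")
print(m.groups())  # ('bigg', 'were')

# \2 would have to repeat "were", but the sentence ends in "will be"
print(pattern.search("The bigger they were, the bigger they will be"))  # None
```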


Regular Expressions
Regular Expression Substitution

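s/regexp1/replacement/ replaces the text matched by regexp1 with replacement, which may refer back to captured groups as \1, \2, ...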

E.g. the 35 boxes → the <35> boxes

s/([0-9]+)/<\1>/
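The same substitution in Python's `re` module, assumed as the engine; `re.sub` takes the pattern, the replacement (which may use `\1`), and the input string. Unlike the `s///` command without a `g` flag, `re.sub` replaces every match by default; `count=1` limits it to the first.

```python
import re

print(re.sub(r"([0-9]+)", r"<\1>", "the 35 boxes"))
# the <35> boxes

print(re.sub(r"([0-9]+)", r"<\1>", "the 35 boxes and 7 crates"))
# the <35> boxes and <7> crates

print(re.sub(r"([0-9]+)", r"<\1>", "the 35 boxes and 7 crates", count=1))
# the <35> boxes and 7 crates
```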
