Lecture 06
Lecture 6: Regex
CSE 374: Intermediate Programming Concepts and Tools
Administrivia
Sorry, the Poll Everywhere poll closed over the weekend
A trailing i at the end of a regex (after the closing /) signifies a case-insensitive match
● /cal/i matches "Pascal", "California", "GCal", etc.
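With grep, the same effect comes from the -i option rather than a trailing i. The following is a minimal sketch, assuming a standard grep and a made-up word list:

```shell
# Sample words (made up for illustration), one per line.
words=$(printf '%s\n' Pascal California GCal calendar dog)

# -i makes the match case-insensitive, like the trailing i in /cal/i.
insensitive=$(printf '%s\n' "$words" | grep -i 'cal')

# Without -i, only lines containing lowercase "cal" match.
sensitive=$(printf '%s\n' "$words" | grep 'cal')

printf 'with -i:\n%s\n\nwithout -i:\n%s\n' "$insensitive" "$sensitive"
```

Without -i, "California" and "GCal" drop out because their "Cal" starts with a capital C.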
Quantifiers: *, +, ?
* means 0 or more occurrences
● /abc*/ matches "ab", "abc", "abcc", "abccc", ...
● /a(bc)*/ matches "a", "abc", "abcbc", "abcbcbc", ...
● /a.*a/ matches "aa", "aba", "a8qa", "a!?xyz__9a", ...
+ means 1 or more occurrences
● /Hi!+ there/ matches "Hi! there", "Hi!!! there!", ...
● /a(bc)+/ matches "abc", "abcbc", "abcbcbc", ...
? means 0 or 1 occurrences
● /a(bc)?/ matches only "a" or "abc"
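The three quantifiers can be compared side by side with grep. This is a sketch under a few assumptions: the candidate strings are made up, -E is used so that ( ), + and ? work without backslashes, and -x forces the whole line to match so that each quantifier's reach is visible.

```shell
# Candidate strings (made up for illustration), one per line.
candidates=$(printf '%s\n' a ab abc abcc abcbc abcb)

# * applies only to the c: "ab" followed by zero or more c's.
star=$(printf '%s\n' "$candidates" | grep -E -x 'abc*')          # ab, abc, abcc

# * applies to the group (bc): "a" followed by zero or more "bc".
group_star=$(printf '%s\n' "$candidates" | grep -E -x 'a(bc)*')  # a, abc, abcbc

# + requires at least one "bc".
group_plus=$(printf '%s\n' "$candidates" | grep -E -x 'a(bc)+')  # abc, abcbc

# ? allows at most one "bc".
group_opt=$(printf '%s\n' "$candidates" | grep -E -x 'a(bc)?')   # a, abc

printf '%s\n---\n%s\n---\n%s\n---\n%s\n' "$star" "$group_star" "$group_plus" "$group_opt"
```

Dropping -x would make grep report any line that contains a match anywhere, so "abcb" would then match all four patterns.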
Regex special characters
\ - escapes the following character
() - groups patterns for order of operations
. - matches any single character (exactly one)
• c.t matches {cat, cut, cota} (cota matches because it contains cot)
[] - contains literals to be matched, single or range
• [a-z] matches any one lowercase letter
| - or, enables multiple patterns to match against
• a|b matches {a} or {b}
^ - anchors to beginning of line
• ^// matches lines that start with //
$ - anchors to end of line
• ;$ matches lines that end with ;
* - matches 0 or more of the previous pattern (greedy match)
• a* matches {"", a, aa, aaa, …}
? - matches 0 or 1 of the previous pattern
• a? matches {"", a}
+ - matches one or more of the previous pattern
• a+ matches {a, aa, aaa, …}
{n} - matches exactly n repetitions of the preceding pattern
• a{3} matches {aaa}
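A few of these special characters can be tried out with grep. The sketch below uses made-up input, with -E for | and {n} and -x where the whole line should match; by default grep reports a line if the pattern matches anywhere in it.

```shell
# . matches one character; grep reports lines that contain a match anywhere.
dot=$(printf '%s\n' cat cut cota ct | grep 'c.t')               # cat, cut, cota

# Made-up lines of code for the anchor examples.
code=$(printf '%s\n' '// comment' 'int x = 1;' 'x++ // note')
comment_lines=$(printf '%s\n' "$code" | grep '^//')             # // comment
semicolon_lines=$(printf '%s\n' "$code" | grep ';$')            # int x = 1;

# | and {n} need -E; -x requires the whole line to match.
either=$(printf '%s\n' a b c | grep -E -x 'a|b')                # a, b
three_a=$(printf '%s\n' aa aaa aaaa | grep -E -x 'a{3}')        # aaa

printf '%s\n' "$dot" "$comment_lines" "$semicolon_lines" "$either" "$three_a"
```

Note that "ct" is rejected by c.t because the dot must consume exactly one character.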
Character ranges: [start-end]
Inside a character set, specify a range of characters with -
● /[a-z]/ matches any lowercase letter
● /[a-zA-Z0-9]/ matches any lowercase or uppercase letter or digit
Inside a character set, a literal - must be escaped (or placed first or last in the set) to be matched
● /[+\-]?[0-9]+/ matches an optional + or -, followed by at least one digit
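The signed-integer pattern can be checked with grep. One caveat for this sketch: in grep's POSIX bracket expressions a backslash inside [] is an ordinary character, so the - is placed last in the set instead of being escaped. The inputs are made up, and LC_ALL=C is set only to pin down how the 0-9 range is interpreted.

```shell
# Made-up inputs: three valid signed integers and two non-matches.
inputs=$(printf '%s\n' 42 +7 -13 abc 3-4)

# [+-]? is an optional sign (the - is last in the set, so it is literal);
# [0-9]+ is one or more digits; -x requires the whole line to match.
signed=$(printf '%s\n' "$inputs" | LC_ALL=C grep -E -x '[+-]?[0-9]+')

printf '%s\n' "$signed"   # 42, +7, -13
```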
Practice: Write a regex for Student ID numbers that are exactly 7 digits and start with a 1
-- Pass --
1234567
-- Fail --
7654321
123abcd
123
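^1[0-9]{6}$
(the ^ and $ anchors enforce "exactly 7 digits"; without them, 1[0-9]{6} also matches inside a longer number such as 12345678)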
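The practice cases can be run through grep -E to see which ones pass. In this sketch, 12345678 is an extra input added here (it is not one of the original cases) to show what the ^ and $ anchors change.

```shell
# Pass/fail cases from the exercise, plus one extra 8-digit input.
ids=$(printf '%s\n' 1234567 7654321 123abcd 123 12345678)

# Unanchored: matches any line that contains a 1 followed by six digits.
unanchored=$(printf '%s\n' "$ids" | LC_ALL=C grep -E '1[0-9]{6}')     # 1234567, 12345678

# Anchored: the whole line must be a 1 followed by exactly six digits.
anchored=$(printf '%s\n' "$ids" | LC_ALL=C grep -E '^1[0-9]{6}$')     # 1234567

printf 'unanchored:\n%s\n\nanchored:\n%s\n' "$unanchored" "$anchored"
```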
grep with options
grep [options] [pattern] [file]
- grep "^hello" file1 #Match all lines that start with 'hello'
- grep "done$" file1 #Match all lines that end with 'done'
- grep "[a-e]" file1 #Match all lines that contain any of the letters a-e
- grep "^ *[0-9]" file1 #Match all lines that start with a digit after
zero or more spaces. E.g.: " 1." or "2."
- Practice: match all lines that start with a capital letter and end with either a period or a comma
- Hint: .* matches any number of any character
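The commands above can be tried on a small sample file. This sketch assumes a temporary file with made-up contents standing in for file1; the last pattern is one possible answer to the capital-letter exercise, not necessarily the one intended in lecture.

```shell
# Hypothetical sample file standing in for file1.
file1=$(mktemp)
printf '%s\n' 'hello world' 'Hello, world.' ' 1. first' '2. second' \
    'all done' 'Wait,' 'no end' > "$file1"

starts_hello=$(grep '^hello' "$file1")      # hello world
ends_done=$(grep 'done$' "$file1")          # all done
numbered=$(grep '^ *[0-9]' "$file1")        # " 1. first" and "2. second"

# One possible answer: a capital letter at the start, anything in between,
# and a period or comma at the end. LC_ALL=C pins the meaning of A-Z.
sentences=$(LC_ALL=C grep '^[A-Z].*[.,]$' "$file1")   # Hello, world. / Wait,

printf '%s\n' "$starts_hello" "$ends_done" "$numbered" "$sentences"
rm -f "$file1"
```

Inside the brackets, the period needs no escaping because it has no special meaning within a character set.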
https://stackoverflow.com/questions/17130299/whats-the-difference-between-grep-e-and-grep-e
Grep regex demo
CSE 374 AU 20 - KASEY CHAMPION