Quick RegEx Notes

This document discusses regular expression (regex) atoms and syntax used to define patterns for matching text. It covers special characters, character classes, quantifiers, word/string boundaries, lookarounds, conditionals, and backreferences that can be used in regex to define complex matching rules in a concise way. It also provides examples of how regex can be used for text replacement purposes.

Uploaded by

Ankur8989

ATOMs:-

\ used to escape special (meta) characters so they match literally e.g \*, \+, \[, \]


\d Any digit (same as [0-9])
\D Any nondigit (same as [^0-9])
\w Any alphanumeric character (upper- or lower-case) or underscore (same as [a-zA-Z0-9_])
\W Any character that is not alphanumeric or underscore (same as [^a-zA-Z0-9_])
\s Any whitespace character (same as [ \f\n\r\t\v])
\S Any nonwhitespace character (same as [^ \f\n\r\t\v])
\Q....\E Anything between \Q and \E is matched literally, exactly as it is written
[\b] Backspace
\f Form feed
\n Line feed
\r Carriage return
\t Tab
\v Vertical tab
\cA represents Control+A
\x1A represents the character with hexadecimal code 1A
\o67 represents the character with octal code 67 (written \067 in most implementations, \o{67} in Perl/PCRE)
Note:- Windows uses the \r\n sequence for end-of-line while Unix uses only \n
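
A minimal Python sketch of the shorthand classes above, using the standard re module. Python's re does not support \Q...\E, so re.escape is used here as a stand-in for literal quoting; the sample strings are made up for illustration.

```python
import re

sample = "Order 66: ship_to=A1\tB2"

digits = re.findall(r"\d+", sample)   # runs of digits
words = re.findall(r"\w+", sample)    # runs of letters, digits and underscore
has_whitespace = re.search(r"\s", "A1\tB2") is not None  # a tab counts as whitespace

# re.escape backslash-escapes metacharacters, similar in spirit to \Q...\E
literal = re.escape("a+b*[c]")
matches_literal = re.fullmatch(literal, "a+b*[c]") is not None

print(digits, words, has_whitespace, matches_literal)
```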

. any single character (in most implementations, except line feed) e.g ... matches any three characters

[] any one character inside it e.g 1[ab?4-6] matches '1a', '1b', '1?', '14', '15', '16'

^ used inside brackets [] to match everything except the listed characters. ^ must be the first character inside []
e.g [^0-9a-z] matches any character except digits and lower-case letters
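
A short Python sketch of the bracket examples, assuming the standard re module. The candidate strings and the input to the negated set are illustrative only.

```python
import re

pattern = re.compile(r"1[ab?4-6]")
candidates = ["1a", "1b", "1?", "14", "15", "16", "ab", "17"]

# fullmatch requires the whole candidate to match the pattern
matched = [c for c in candidates if pattern.fullmatch(c)]

# Negated set: delete every character that is not a digit or lower-case letter
stripped = re.sub(r"[^0-9a-z]", "", "Ab-12_cD!")

print(matched, stripped)
```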

() to define a subexpression so that a repetition operator can be applied to it as a whole.
Subexpressions can be nested.
e.g zz(ab)+ matches zzab, zzabab, zzabababab

A subexpression value can be referenced later in the regex using \no.,
where no. denotes the position of the subexpression in the regex, counted by its opening bracket from the left.
e.g. ([ab]+)zz\1 will match azza, bbzzbb, aaazzaaa but not aazzbb
\no. can also be used in the replace string to denote the n-th subexpression value.
This is called back-referencing and is very useful in situations where matching the next characters depends on what has already been matched.
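
The back-reference example can be checked with a small Python sketch (standard re module; the dictionary name is only for illustration):

```python
import re

backref = re.compile(r"([ab]+)zz\1")

# \1 must repeat exactly the text captured by the first subexpression
results = {s: bool(backref.fullmatch(s))
           for s in ["azza", "bbzzbb", "aaazzaaa", "aazzbb"]}

# A repeated subexpression keeps only its last repetition as the captured value
repeat = re.fullmatch(r"zz(ab)+", "zzabab")
last_capture = repeat.group(1)

print(results, last_capture)
```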

Branching Atoms:-
| use this to branch atoms e.g ((ab)+)|(a+)|(b+)
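
A Python sketch of the branching example. The helper name branch and its labels are hypothetical; the groups are numbered by opening bracket, so ((ab)+) is group 1, (a+) is group 3 and (b+) is group 4.

```python
import re

pattern = re.compile(r"((ab)+)|(a+)|(b+)")

def branch(text):
    """Report which alternative matched the whole text, or None."""
    m = pattern.fullmatch(text)
    if m is None:
        return None
    if m.group(1) is not None:
        return "ab-repeat"
    if m.group(3) is not None:
        return "a-run"
    return "b-run"

print(branch("abab"), branch("aaa"), branch("bb"), branch("abb"))
```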

Specifying Repetition of Atom (used after an Atom):-


* 0 or more e.g a*
+ 1 or more e.g a+
? 0 or 1 e.g a?
{6} looks for exactly 6 occurrences e.g [abc]{3} matches aaa, bbb, ccc and also mixed runs such as abc, cab
{2,6} looks for between 2 and 6 occurrences
{3,} specifies at least 3 occurrences

*, + and {} are greedy operators and look for the longest match.
To make them lazy, so that they look for the shortest match, append the ? symbol, like *?, +?, {3,}?
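
A small Python sketch contrasting greedy and lazy repetition (standard re module; the tagged sample text is made up):

```python
import re

text = "<b>bold</b> and <i>italic</i>"

greedy = re.findall(r"<.+>", text)   # one match stretching to the last '>'
lazy = re.findall(r"<.+?>", text)    # stops at the first '>' each time

exact = re.fullmatch(r"[abc]{3}", "cab") is not None
ranged = re.findall(r"a{2,3}", "a aa aaaa")        # greedy: prefers 3 over 2
lazy_ranged = re.findall(r"a{2,3}?", "aaaa")       # lazy: prefers 2 over 3

print(greedy, lazy, exact, ranged, lazy_ranged)
```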

Specifying Position:-
Specifying Word Boundaries:-
\b used to match the start or end of a word. It matches only a position, not any character or whitespace
e.g cat\b matches only the 'cat' at the end of the word cataaacataacat.
\B used to specifically not match at a word boundary
e.g \Bcat\B matches only the 'cat' in the middle of the word cataaacataacat, not the ones at its start or end.
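
The word-boundary example can be verified by looking at match positions, as in this Python sketch (standard re module; variable names are illustrative):

```python
import re

word = "cataaacataacat"  # 'cat' occurs at offsets 0, 6 and 11

at_word_end = [m.start() for m in re.finditer(r"cat\b", word)]
inside_word = [m.start() for m in re.finditer(r"\Bcat\B", word)]
at_word_start = [m.start() for m in re.finditer(r"\bcat", word)]

print(at_word_end, inside_word, at_word_start)
```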

Specifying String Boundaries:-

^ matches start of string
$ matches end of string. E.g to list all rar files -- ls -R | grep "rar$"
(?m) starting a regular expression with this symbol turns on multiline mode.
In this mode ^ and $ also match at the start and end of each line within the string.
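
A Python sketch of the difference (?m) makes, assuming the standard re module and a made-up three-line listing:

```python
import re

listing = "a.rar\nnotes.txt\nb.rar"

# Without (?m), $ only matches at the end of the whole string
single = re.findall(r"\w+\.rar$", listing)

# With (?m), ^ and $ match at the start and end of every line
multi = re.findall(r"(?m)\w+\.rar$", listing)
line_starts = re.findall(r"(?m)^\w+", listing)

print(single, multi, line_starts)
```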

Look ahead and look behind:- (this is not supported by all implementations)
Positive look-ahead or look-behind looks after or before the regex for text that matches the specified pattern.
Whatever is looked ahead or behind is not included in the match.
(?=) positive look ahead e.g a+(?=@) matches the 'aaa' and the 'aa' that come just before an @ in aaa@nchf@aanbfaa@aa Note: @ is not included in the match
(?<=) positive look behind e.g (?<=@)a+ matches the two 'aa' that come just after an @ in aaa@nchf@aanbfaa@aa
Negative look-ahead or look-behind looks after or before the regex for text that does not match the specified pattern.
(?!) negative look ahead
(?<!) negative look behind
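
A Python sketch of the four lookarounds (standard re module). The two positive cases use the string from the notes; the two negative cases use made-up strings, since the notes give no example for them.

```python
import re

text = "aaa@nchf@aanbfaa@aa"

# Positive look-ahead / look-behind: the @ is checked but not consumed
ahead = [(m.start(), m.group()) for m in re.finditer(r"a+(?=@)", text)]
behind = [(m.start(), m.group()) for m in re.finditer(r"(?<=@)a+", text)]

# Negative look-ahead: whole numbers not followed by %
not_percent = re.findall(r"\b\d+\b(?!%)", "50% of 80")

# Negative look-behind: numbers not preceded by $
not_price = re.findall(r"(?<!\$)\b\d+", "cost $30 for 12 items")

print(ahead, behind, not_percent, not_price)
```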

Condition based check:- (this is not supported by all implementations)
syntax:- (?(backreference number)regex_if_backreference_found|regex_if_not_found).
This type of regex can be used to choose which sub-regex to use based on whether a previous subexpression was matched or not.
e.g (\d)?[a-z]*(?(1)@|!) matches 1ashdjha@, gasfdg! (the ? after (\d) makes the digit optional, so the condition can go either way)
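
A Python sketch of a conditional check; Python's re module is one implementation that supports this syntax. Note that the leading digit group is written as (\d)? here: the group has to be optional, otherwise it always takes part in the match and the "not found" branch can never be chosen.

```python
import re

# If group 1 (the digit) matched, require '@' at the end; otherwise require '!'
conditional = re.compile(r"(\d)?[a-z]*(?(1)@|!)")

checks = {s: bool(conditional.fullmatch(s))
          for s in ["1ashdjha@", "gasfdg!", "1ashdjha!", "gasfdg@"]}

print(checks)
```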

For Replacement:-
The complete matched string is represented by $& in some implementations.
A portion of the found string is represented by \no. e.g \1,
where the no. represents the n-th subexpression.
Where $& is not supported you can put the whole regex inside () brackets and then reference the whole string as \1
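
A Python sketch of replacement references. Python's re is an implementation without $&; it writes the whole match as \g<0> in the replace string, and the bracket workaround described above also works. The sample strings are illustrative.

```python
import re

# Whole match via \g<0> (Python's counterpart of $&)
whole = re.sub(r"\d+", r"[\g<0>]", "id 42 and 7")

# Same result by wrapping the regex in () and using \1
wrapped = re.sub(r"(\d+)", r"[\1]", "id 42 and 7")

# Reordering subexpressions with \1 and \2
swapped = re.sub(r"(\w+)@(\w+)", r"\2 at \1", "user@host")

print(whole, wrapped, swapped)
```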

Some regex implementations support the use of conversion operations via the
metacharacters listed below
\l Convert next character to lowercase
\u Convert next character to uppercase
\L Convert all characters up to \E to lowercase
\U Convert all characters up to \E to uppercase
\E Terminate \L or \U conversion
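
Python's re module is one implementation that does not support \l, \u, \L, \U in the replace string. As a rough equivalent, a function can be passed to re.sub instead of a replace string; the sketch below assumes that approach, with made-up sample text.

```python
import re

# Like \u on the first letter of every word
titled = re.sub(r"\b([a-z])(\w*)",
                lambda m: m.group(1).upper() + m.group(2),
                "quick regex notes")

# Like \U...\E around the whole match
shouted = re.sub(r"\b(permit|deny)\b",
                 lambda m: m.group(1).upper(),
                 "permit ip any; deny tcp")

print(titled, shouted)
```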

Examples (purpose, find string, replace string):-

purpose: to remove the initial line no. from an ACL
find string: "( *[0-9]+)( +)((permit)|(deny))"
replace string: "\3"

purpose: to remove a complete line and its line feed (Notepad++, extended mode only)
find string: "\r\n<exact key>"
replace string: NULL (leave empty)

purpose: to remove space between lines

purpose: to remove leading spaces

purpose: to remove trailing spaces
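
A Python sketch of these clean-up tasks. Only the ACL pattern comes from the notes; no find strings are given for removing blank lines, leading spaces or trailing spaces, so the three patterns used for them below are assumptions showing one common way to do it, and the sample ACL line is made up.

```python
import re

# From the notes: drop the leading line number, keep permit/deny (group 3)
acl_line = "  10 permit ip any any"
acl_clean = re.sub(r"( *[0-9]+)( +)((permit)|(deny))", r"\3", acl_line)

# Assumed patterns (not from the notes), all using multiline mode
blank_lines = r"(?m)^[ \t]*\r?\n"   # lines holding only spaces or tabs
leading = r"(?m)^[ \t]+"            # spaces or tabs at the start of a line
trailing = r"(?m)[ \t]+$"           # spaces or tabs at the end of a line

no_blank = re.sub(blank_lines, "", "a\n\n  \nb\n")
no_leading = re.sub(leading, "", "  a  \n  b\t\n")
no_trailing = re.sub(trailing, "", "  a  \n  b\t\n")

print(repr(acl_clean), repr(no_blank), repr(no_leading), repr(no_trailing))
```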
