www.netskills.ac.uk
Regular Expressions
Web Pages: Behaviour
CSC1014
Regular Expressions
A "standardised" pattern matching syntax for text
Define a pattern once, then test it against an input, then a second, then a third...
Can appear baffling at first!
Are actually pretty logical and (relatively) straightforward to use
/^[\w]+([\.\w-]*)?@[\w]+(\.[\w-]+)*(\.[a-z]{2,3})(\.[a-z]{2,3})*?$/i
something(.something)@something.xx(or.xxx)(.xx or .xxx)
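As a rough illustration of how the pattern above behaves, the sketch below runs it against a few sample strings. The sample addresses are hypothetical and chosen only for demonstration; the pattern itself is copied unchanged from the slide.

```javascript
// The e-mail pattern from the slide, copied unchanged.
const emailPattern = /^[\w]+([\.\w-]*)?@[\w]+(\.[\w-]+)*(\.[a-z]{2,3})(\.[a-z]{2,3})*?$/i;

// Hypothetical sample inputs (assumptions, not from the original slides).
const emailSamples = [
  "jane.doe@example.co.uk",
  "bob@example.com",
  "user@localhost",
  "not-an-email"
];

// test() returns true when the whole string fits the pattern.
const emailResults = emailSamples.map(function (s) {
  return emailPattern.test(s);
});

emailSamples.forEach(function (s, i) {
  console.log(s + " -> " + emailResults[i]);
});
```

The first two samples fit the something@something.xx shape; the last two fail because they lack a dotted suffix or an @ sign.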
JISC Netskills
Regular Expressions: Building patterns
Square brackets []
Match any one of the characters or ranges in the brackets
[ae] matches one of a or e
[a-z] matches any one of the lower case letters
[0-9] matches any one of the digits
Caret ^ negates a range (match anything but)
[^a-z] anything but the lower case letters
[^5-9] anything but the digits 5, 6, 7, 8, 9
Escape special characters with \
[\[\]] matches an opening or closing square bracket
[\.a-z] matches a single character: either a dot (.) or a lower case letter
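A minimal sketch of bracket expressions in JavaScript follows. The variable names and sample strings are illustrative assumptions. Each pattern is a single bracket expression, so it consumes exactly one character of the input.

```javascript
// Each pattern is one bracket expression, so it matches exactly one character.
const aOrE = /[ae]/;           // one of a or e
const notLower = /[^a-z]/;     // any one character that is not a lower case letter
const anyBracket = /[\[\]]/;   // an opening or closing square bracket (escaped with \)
const dotOrLower = /[\.a-z]/;  // a dot or a lower case letter

console.log(aOrE.test("bed"));         // contains an e
console.log(notLower.test("abc"));     // only lower case letters, so no match
console.log(notLower.test("abC"));     // the C is outside a-z
console.log(anyBracket.test("a[0]"));  // contains [ and ]
console.log(dotOrLower.test("."));     // a dot on its own is enough
```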
Regular Expressions: Meta-characters
Shorthand for common ranges
Meta-character  Matches                              Equivalent range
.               Any character (except a line break)  N/A
\d              A digit                              [0-9]
\D              A non-digit                          [^0-9]
\s              A whitespace character               [ \t\n\x0B\f\r]
\S              A non-whitespace character           [^\s]
\w              A word character                     [a-zA-Z0-9_]
\W              A non-word character                 [^a-zA-Z0-9_]
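The shorthand classes can be tried out with a short sketch like the one below. The sample string is a made-up assumption. The example leans on the g flag and String replace(), both covered later in the deck, to make every match visible.

```javascript
// Hypothetical sample input, chosen only for illustration.
const metaSample = "Room 42, Floor_3";

// Replace every match so the effect of each shorthand is visible.
const digitsMasked = metaSample.replace(/\d/g, "#");   // digits become #
const digitsOnly = metaSample.replace(/\D/g, "");      // non-digits removed
const spacesMarked = metaSample.replace(/\s/g, "_");   // whitespace becomes _
const wordCharsOnly = metaSample.replace(/\W/g, "");   // non-word characters removed

console.log(digitsMasked);
console.log(digitsOnly);
console.log(spacesMarked);
console.log(wordCharsOnly);
```

Note that the underscore survives the \W replacement because it counts as a word character.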
Regular Expressions: Quantifiers
Quantifier  Effect
[a-z]?      A letter, zero or one time
[a-z]*      A letter, zero or more times
[a-z]+      A letter, one or more times
[a-z]{n}    A letter, exactly n times
[a-z]{n,}   A letter, at least n times
[a-z]{n,m}  A letter, between n and m times
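The quantifiers in the table can be checked with a small sketch such as this one. The patterns are anchored with ^ and $ so that the whole string has to fit; the variable names and test words are assumptions made for the example.

```javascript
// Anchored with ^ and $ so the quantifier governs the whole string.
const zeroOrOne = /^colou?r$/;       // the u may appear zero or one time
const zeroOrMore = /^[a-z]*$/;       // any number of letters, including none
const oneOrMore = /^[a-z]+$/;        // at least one letter
const exactlyThree = /^[a-z]{3}$/;   // exactly 3 letters
const atLeastTwo = /^[a-z]{2,}$/;    // 2 or more letters
const twoToFour = /^[a-z]{2,4}$/;    // between 2 and 4 letters

console.log(zeroOrOne.test("color"), zeroOrOne.test("colour"));
console.log(zeroOrMore.test(""), oneOrMore.test(""));
console.log(exactlyThree.test("abc"), exactlyThree.test("abcd"));
console.log(atLeastTwo.test("a"), twoToFour.test("abcde"));
```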
Regular Expressions: Greediness
Quantifiers are greedy by default
They match as many times as possible (up to the end of the string if need be) before backtracking to complete the pattern
Try matching the opening <b> tag in...
some <b>bold</b> text
A simple pattern should work...
/<.+>/
But the quantifier + is greedy and keeps matching until it reaches the end of the string to find...
<b>bold</b> text
Then backtracks to find the end of the pattern, i.e. the last >, and finally matches...
<b>bold</b>
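The greedy behaviour described on this slide can be reproduced with a few lines of JavaScript. This is only a sketch; the variable names are assumptions.

```javascript
// The sample string and pattern from the slide.
const greedyInput = "some <b>bold</b> text";
const greedyPattern = /<.+>/;

// match() without the g flag returns an array whose first item is the matched text.
const greedyResult = greedyInput.match(greedyPattern)[0];

console.log(greedyResult);  // runs from the first < to the last >
```

The match covers both tags and the text between them, not just the opening tag.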
Regular Expressions: Laziness
Append ? to the quantifier to make it lazy
/<.+?>/
Match as few times as possible before backtracking to conclude pattern
Now it backtracks after each match to try to complete the pattern, meaning the match ends at the first > character
In "some <b>bold</b> text" the match is now...
<b>
http://www.regular-expressions.info/repeat.html
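A short sketch of the lazy version, using the same sample string, is shown below. The variable names are assumptions, and the second call adds the g flag (covered later in the deck) to collect every tag.

```javascript
const lazyInput = "some <b>bold</b> text";

// The ? after + makes the quantifier lazy: stop at the first > that completes the pattern.
const lazyFirst = lazyInput.match(/<.+?>/)[0];

// With the g flag, match() returns every lazy match in the string.
const lazyAll = lazyInput.match(/<.+?>/g);

console.log(lazyFirst);
console.log(lazyAll);
```

Here the first call yields only the opening tag, and the global version yields the opening and closing tags as separate matches.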
Regular Expressions: Anchors & flags
Anchors fix expression to start/end of string or boundaries between word/non-word characters
Anchor  Matches
^       The beginning of a string
$       The end of a string
\b      A word boundary
\B      A non-word boundary
Flags are appended to the end of an expression
Flag  Effect
i     Use case-insensitive matching
g     Global matching (instead of stopping at first match)
m     Multiline mode (^ and $ also match at the start and end of each line)
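Anchors and flags are easiest to see side by side. The sketch below uses made-up words and variable names (all assumptions) to show each anchor and each flag in turn.

```javascript
// Anchors
const startsWithCat = /^cat/;      // cat at the beginning of the string
const endsWithCat = /cat$/;        // cat at the end of the string
const wholeWordCat = /\bcat\b/;    // cat with a word boundary on both sides
const insideWordCat = /\Bcat\B/;   // cat with word characters on both sides

console.log(startsWithCat.test("catalog"), endsWithCat.test("bobcat"));
console.log(wholeWordCat.test("the cat sat"), wholeWordCat.test("concatenate"));
console.log(insideWordCat.test("concatenate"));

// Flags
const allCats = "Cat cat CAT".match(/cat/gi);   // i ignores case, g finds every match
const eachLine = "one\ntwo".match(/^\w+$/gm);   // m lets ^ and $ match on each line
const noMultiline = /^two$/.test("one\ntwo");   // without m, ^ is only the string start

console.log(allCats, eachLine, noMultiline);
```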
Regular Expressions: JavaScript
JavaScript supports regular expressions in a couple of ways:
via the RegExp object (more powerful)
via the String object (simpler, but fewer options)
The RegExp object is defined as a pattern to match
RegExp object methods use/test/apply the pattern where needed

    var pattern = /^[a-z]+$/i;          // creates a RegExp object
    if (pattern.test(someString)) {     // tests the string against the expression
        alert("Yay");
    } else {
        alert("Boo");
    }
Regular Expressions: JavaScript RegExp Methods
The JavaScript RegExp object has two methods
Method                         Purpose
thisPattern.exec(someString);  Return an array of info about the first match (or null if no match)
thisPattern.test(someString);  Return true or false depending on whether the string contains a match
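The difference between the two methods can be sketched as follows. The year-month pattern and the sample strings are hypothetical, introduced only to give exec() some groups to report.

```javascript
// Hypothetical pattern: a four-digit year, a hyphen, a two-digit month.
const yearMonth = /(\d{4})-(\d{2})/;

// exec() returns an array: the whole match first, then each bracketed group.
const execFound = yearMonth.exec("Due 2024-03 at noon");
console.log(execFound[0]);     // the whole match
console.log(execFound[1]);     // first group (year)
console.log(execFound[2]);     // second group (month)
console.log(execFound.index);  // position of the match in the string

// exec() returns null when nothing matches.
const execMissing = yearMonth.exec("no date here");
console.log(execMissing);

// test() only answers yes or no.
const testFound = yearMonth.test("Due 2024-03 at noon");
console.log(testFound);
```

test() is usually enough for validation; exec() is the one to reach for when the matched text itself is needed.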
Regular Expressions: JavaScript String Methods
The JavaScript String object can use regular expressions in three string-matching methods
Method                                 Purpose
someString.search(/^[a-z]+$/i);        Return position of first substring match (-1 if no match)
someString.replace(/^[a-z]+$/i,"X");   Replace the text matched by the expression with the string in the second parameter
someString.match(/^[a-z]+$/i);         Return an array containing the matches for the expression (all of them when the g flag is set; null if no match)
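A brief sketch of the three String methods in use is given below. The sample string and the [br]at pattern are assumptions picked to keep the output easy to follow.

```javascript
// Hypothetical sample input and pattern.
const rhyme = "cat bat rat";

const firstPosition = rhyme.search(/[br]at/);      // index of the first match
const noPosition = rhyme.search(/dog/);            // -1 when nothing matches
const replacedOnce = rhyme.replace(/[br]at/, "X"); // only the first match is replaced
const replacedAll = rhyme.replace(/[br]at/g, "X"); // g replaces every match
const allMatches = rhyme.match(/[br]at/g);         // g returns every match

console.log(firstPosition, noPosition);
console.log(replacedOnce);
console.log(replacedAll);
console.log(allMatches);
```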
Regular Expressions: JavaScript Form Validation
Check for no input, then check the user input against a letters-only pattern and act on the outcome

    if (thisForm.inputbox.value != "") {                // check for no input
        var pattern = /^[a-z]+$/i;                      // pattern for letters only
        if (pattern.test(thisForm.inputbox.value)) {    // check user input against pattern
            // SUCCESS :-)  (act on outcome)
        } else {
            // FAILURE :-(
        }
    }
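The same check can be pulled out into a function so that it can be tried without a form on the page. This is only a sketch: checkLettersOnly and its three return values are hypothetical names invented for the example, not part of the original slides.

```javascript
// Hypothetical helper: classifies a text value as "empty", "valid" or "invalid".
function checkLettersOnly(value) {
  // Check for no input first.
  if (value === "") {
    return "empty";
  }
  // Pattern for letters only, case-insensitive, covering the whole string.
  const lettersOnly = /^[a-z]+$/i;
  // Check the input against the pattern and report the outcome.
  return lettersOnly.test(value) ? "valid" : "invalid";
}

console.log(checkLettersOnly(""));
console.log(checkLettersOnly("Netskills"));
console.log(checkLettersOnly("CSC1014"));
```

In a page, the argument would typically be the value of a form field such as thisForm.inputbox.value.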
Regular Expressions: Testing Tools
Constructing regular expressions can be fiddly
Try and avoid doing it in live code!
Online testing tools are very useful: build the expression there, then copy/paste the final version into your code
Try to use a test tool built on the correct expression engine, i.e. JavaScript, PHP, Perl etc.
JavaScript-based
http://tools.netshiftmedia.com/regexlibrary
http://www.regular-expressions.info/javascriptexample.html
General purpose (PHP-based)
http://www.spaweditor.com/scripts/regex/
Regular Expressions: Reference & tutorials
http://www.regular-expressions.info/tutorial.html
http://www.regular-expressions.info/examples.html
http://www.regular-expressions.info/reference.html
http://lawrence.ecorp.net/inet/samples/regexp-intro.php
http://mochikit.com/examples/mochiregexp/index.html