
Your Oracle RX - Regular Expressions in an Oracle World

Regular expressions provide a concise and flexible means for identifying strings of text of interest. They are used to search strings and find matching patterns, which can range from simple to complex. Regular expressions are found in text editors, Unix utilities, programming languages, SQL, and Oracle Application Express. They use special characters to define patterns to match, and quantifiers to define repetition. Oracle supports regular expression functions like REGEXP_LIKE, REGEXP_INSTR, REGEXP_SUBSTR, and REGEXP_REPLACE to work with regular expressions in SQL and PL/SQL.

Copyright
© Attribution Non-Commercial (BY-NC)

Your Oracle RX -

Regular Expressions in an Oracle World

Gravenstein, Rebholtz
Definition
• Regular expressions provide a concise and flexible means for identifying strings of text of interest, such as particular characters, words, or patterns of characters
Regular Expressions
• Used to search strings to find matching patterns
• Match patterns can range from fairly simple to extremely complex
• It's much easier to understand your own expressions!
• The search for a match starts at the beginning of the string and stops when the first match is found
Found In
• Text editors
• Unix utilities
  - ed (text editor)
  - grep
• Programming languages
  - Perl
  - Tcl
• And since Oracle 10g
  - SQL
  - Application Express
Patterns
• Generally, characters and patterns represent themselves
• Special characters
  .    Matches any character
  \    Escapes special characters
  \n   New line
  \r   Carriage return
  \t   Tab
  \s   Whitespace character
Sample Patterns
• Pattern '800'
  - matches 800
• Pattern 'ORA'
  - matches ORA
• Pattern '.....-....'
  - matches
    44313-2323
    A4313-d3r3
    444-313-23234
Simple Repetition
• Quantifiers
  ?    Makes the previous item optional (0 or 1 times)
  +    Repeats the previous item 1 or more times
  *    Repeats the previous item 0 or more times


Quiz
• Given a 9-digit zip code, will the pattern match?
  '.....-....'   Yes
  '.?-.+'        Yes
  '.*-.+'        Yes
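The quiz can be checked directly in SQL. A minimal sketch, assuming '44313-2323' as a stand-in 9-digit zip code (the sample value and the column aliases are illustrative and not from the slides):

```sql
-- Each CASE returns 'Yes' when the pattern is found anywhere in the value.
SELECT CASE WHEN REGEXP_LIKE('44313-2323', '.....-....') THEN 'Yes' ELSE 'No' END AS five_dash_four,
       CASE WHEN REGEXP_LIKE('44313-2323', '.?-.+')      THEN 'Yes' ELSE 'No' END AS optional_then_plus,
       CASE WHEN REGEXP_LIKE('44313-2323', '.*-.+')      THEN 'Yes' ELSE 'No' END AS star_then_plus
  FROM dual;
```

All three columns come back as Yes, because an unanchored pattern only has to match somewhere inside the value.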
Quantifiers
• {count} defines an exact repetition count of the prior object
• A{5} matches AAAAA, and also matches within AAAAAAAA (the first five As)
• Zip code could be defined as .{5}-.{4}


More on Quantifiers …
• {m,n} defines a repetition range for the prior object
  - m is the minimum number of matches
  - n is the maximum number of matches
• A{1,5} matches A and AAA, and matches within AAAAAA (the first five As)
• {m,} defines a repetition count of m or more
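To see what each quantifier actually consumes, REGEXP_SUBSTR can return the matched text. A small sketch; the test strings and aliases are chosen here for illustration:

```sql
SELECT REGEXP_SUBSTR('AAAAAAAA', 'A{5}')   AS exact_five,   -- 'AAAAA': the first five of eight As
       REGEXP_SUBSTR('AAAAAA',   'A{1,5}') AS one_to_five,  -- 'AAAAA': greedy, stops at the maximum of five
       REGEXP_SUBSTR('AAAAAA',   'A{2,}')  AS two_or_more   -- 'AAAAAA': no upper limit
  FROM dual;
```

The longer strings still "match" because nothing in these patterns requires the match to cover the whole string.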


Sample Patterns
Does the pattern 216-.{3,}-.{4,4} match?
  216-588-5023        Yes
  216-5888888-5023    Yes
  011-216-588-5023    Yes: skip the 011 and you get the match
  011-216--58-5-23    Yes: . matches anything
  216-5888-888-5023   Yes: {3,} eats all characters up to the -5023
Anchors
• Anchors
  ^    Start of line
  $    End of line
Does ^.....-....$ match?
  44313-1234     Yes
  441313-1234    No
  4431--abcd     Yes
  44313-1234c    No
  a44313-1234    No
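The anchored pattern can be run against the five sample values in one query. This is a sketch only; the inline view and the column names val and matches are illustrative:

```sql
-- The five sample values are supplied through an inline view.
SELECT val,
       CASE WHEN REGEXP_LIKE(val, '^.....-....$') THEN 'Yes' ELSE 'No' END AS matches
  FROM (SELECT '44313-1234'  AS val FROM dual UNION ALL
        SELECT '441313-1234' FROM dual UNION ALL
        SELECT '4431--abcd'  FROM dual UNION ALL
        SELECT '44313-1234c' FROM dual UNION ALL
        SELECT 'a44313-1234' FROM dual);
```

Only 44313-1234 and 4431--abcd return Yes; with ^ and $ in place, any extra character before or after the 5-dash-4 shape rules the value out.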
Alternation and Grouping
  [char]   Character list
  |        Alternation (boolean OR operator)
  ()       Group subexpression
Character Expression
[abc] defines a list of characters that can match a single a, b, or c

Our zip code match can be expressed as

[0123456789]{5}-[0123456789]{4}
  [0123456789]   Any number
  {5}            5 of the prior pattern
  -              A dash
  [0123456789]   Any number
  {4}            4 of the prior pattern
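More on alternation
[9]|[1] 9 or 1

What is this string matching?

([8]{1})|([9]{1})[1234567890]{2}-[1234567890]{3}-[1234567890]{4}
  ([8]{1})   Group 1
  ([9]{1})   Group 2
  |          Or the groups; order makes no difference

Intended matches: 900-555-5125 or 823-123-4567
Note: alternation has the lowest precedence, so as written the pattern means "a single 8" or "a 9 followed by the rest of the phone number". For 823-123-4567 it matches only the leading 8. Wrapping both alternatives in an outer group, (([8]{1})|([9]{1})), applies the rest of the pattern to either digit.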
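How far the | operator reaches can be seen by comparing the matched text with and without an outer group. A sketch, using [0-9] as shorthand for the slide's [1234567890] lists; the aliases are illustrative:

```sql
SELECT REGEXP_SUBSTR('823-123-4567',
                     '([8]{1})|([9]{1})[0-9]{2}-[0-9]{3}-[0-9]{4}')   AS as_written,   -- '8'
       REGEXP_SUBSTR('823-123-4567',
                     '(([8]{1})|([9]{1}))[0-9]{2}-[0-9]{3}-[0-9]{4}') AS outer_group   -- '823-123-4567'
  FROM dual;
```

Without the outer parentheses the left alternative is just the single digit 8, so the rest of the number is never checked for that branch.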
Ranges
[a-z] matches any letter from a to z
[0-9] matches any digit from 0 to 9
Order is important: a range must start with the lower value and go to the higher
[a-zA-Z] matches any lowercase or uppercase letter

Does [a0-3a-zA-Z] match?
  1A      Yes
  45a     Yes
  ?*34    Yes
  9       No
Predefined Character Classes
(restricted to character lists)
  [:alnum:]   Alphanumeric characters
  [:alpha:]   Alphabetic characters
  [:blank:]   Blank space characters
  [:cntrl:]   Control characters (non-printing)
  [:digit:]   Numeric digits
  [:lower:]   Lower case alphabetic characters
  [:print:]   Printable characters
  [:punct:]   Punctuation characters
  [:space:]   Whitespace characters (space, tab, newline...)
  [:upper:]   Upper case alphabetic characters
New Zip Code
Does [[:digit:]]{5}-[[:digit:]]{4} match?
  44313-1234     Yes
  441313-1234    Yes
  4431--abcd     No
  44313-1234c    Yes
  a44313-1234    Yes
Groups
What is this?

(^[[:digit:]]{5})(-[[:digit:]]{4})?$

Note:
[^[:digit:]]
This is a special case of ^: when ^ is the first character inside a character list, it negates the list instead of anchoring to the start of the string
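The grouped pattern reads as a five-digit zip code with an optional four-digit extension, and the negated list [^[:digit:]] matches anything that is not a digit. A sketch of both, with sample values and aliases made up for illustration:

```sql
SELECT CASE WHEN REGEXP_LIKE('44313',      '(^[[:digit:]]{5})(-[[:digit:]]{4})?$') THEN 'Yes' ELSE 'No' END AS zip5,       -- Yes
       CASE WHEN REGEXP_LIKE('44313-1234', '(^[[:digit:]]{5})(-[[:digit:]]{4})?$') THEN 'Yes' ELSE 'No' END AS zip9,       -- Yes
       CASE WHEN REGEXP_LIKE('44313-12',   '(^[[:digit:]]{5})(-[[:digit:]]{4})?$') THEN 'Yes' ELSE 'No' END AS short_ext,  -- No
       REGEXP_REPLACE('44313-1234', '[^[:digit:]]') AS digits_only   -- '443131234': non-digits removed
  FROM dual;
```

REGEXP_REPLACE is called here without a replacement string, which removes every match.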
“Greediness”
Regular expression operators are greedy: they match the maximum set
  [a-z]+ will match the entire string abcedef

[a-z]+? : add ? after the quantifier and you get the lazy, or minimum, match
  [a-z]+? will match just the letter a in 123abce
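Greedy and lazy matching can be compared side by side. A sketch, assuming a release that supports lazy quantifiers (they are a Perl-style extension that arrived with Oracle 10g Release 2); the aliases are illustrative:

```sql
SELECT REGEXP_SUBSTR('123abce', '[a-z]+')  AS greedy_match,  -- 'abce': as many letters as possible
       REGEXP_SUBSTR('123abce', '[a-z]+?') AS lazy_match     -- 'a': as few letters as possible
  FROM dual;
```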


Greedy and Eager
“Greediness” and “Eagerness” are not the same

Oracle's regular expression parser is eager.

What that means is that the parser will stop looking for a match once one is found in an alternation group.

  e.g. given pattern 'regex|regex not'
  and text 'regex not'
  eager will return 'regex'

What are these?
^[[:digit:]]{3}-[[:digit:]]{2}-[[:digit:]]{4}$

Matches an SSN such as 302-77-1234

[-+]?([0-9]+\.?[0-9]*|\.[0-9]+)([eE][-+]?[0-9]+)?

Matches a floating-point number such as +23.344E+12 (the decimal point is escaped as \. so that it is not read as "any character")

SELECT REGEXP_REPLACE(
         '23.2323e+12',
         '[-+]?([0-9]+\.?[0-9]*|\.[0-9]+)([eE][-+]?[0-9]+)?',
         'match')
  FROM dual;
Amtrust Example
What's the functional difference between

[ORA|SP2|EXP|IMP|KUP|DBV|LCD|QSM|RMAN|LRM|
LFI|PLS|AMD|TNS|NNC|NNO|NNL|NPL|NNF|NMP|
NCR|NZE|MOD|O2F|O2I|O2U|PCB|PCF|PCC|SQL|
EPC]-[0-9][0-9][0-9][0-9]|Sql\*Loader-[0-9][0-9]|
SQL\*Loader-[0-9][0-9]
  (matches 10-MAR-2008; \* escapes the *)
and
(ORA|SP2|EXP|IMP|KUP|DBV|LCD|QSM|RMAN|LRM|
LFI|PLS|AMD|TNS|NNC|NNO|NNL|NPL|NNF|NMP|
NCR|NZE|MOD|O2F|O2I|O2U|PCB|PCF|PCC|SQL|
EPC)-[0-9][0-9][0-9][0-9]|Sql\*Loader-[0-9][0-9]|
SQL\*Loader-[0-9][0-9]
  (does not match 10-MAR-2008)
Amtrust Example
The two patterns differ only in their delimiters: the second uses a group ( ) instead of a character list [ ].
A character list matches any single listed character, so [ORA|...|EPC]-[0-9][0-9][0-9][0-9] finds R-2008 inside 10-MAR-2008. The group requires one of the complete prefixes, such as ORA or RMAN, in front of the dash.
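The difference shows up with a much shorter prefix list. A sketch that keeps only three of the prefixes from the slide; the test values and aliases are illustrative:

```sql
SELECT CASE WHEN REGEXP_LIKE('10-MAR-2008', '[ORA|SP2|EXP]-[0-9][0-9][0-9][0-9]')
            THEN 'match' ELSE 'no match' END AS list_on_date,    -- match: the list accepts the single R in R-2008
       CASE WHEN REGEXP_LIKE('10-MAR-2008', '(ORA|SP2|EXP)-[0-9][0-9][0-9][0-9]')
            THEN 'match' ELSE 'no match' END AS group_on_date,   -- no match: MAR is not one of the prefixes
       CASE WHEN REGEXP_LIKE('ORA-01555',   '(ORA|SP2|EXP)-[0-9][0-9][0-9][0-9]')
            THEN 'match' ELSE 'no match' END AS group_on_error   -- match: ORA followed by four digits
  FROM dual;
```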
Groups
• Specified with ( )
• Oracle supports up to 9
  - \1 is defined by the first open (
  - \2 is defined by the second open (
  - \3 is defined by the third open (
  - And so on through to \9
• Once defined they can be used later in a pattern (back reference)
Back References
Round brackets create groups which can be referenced later in the same pattern or in a replacement pattern.
HTML tag:
<([A-Z][A-Z0-9]*)\b*([^>])*>.*</\1>
  ([A-Z][A-Z0-9]*)   Must start with a capital letter, possibly followed by additional letters
  \b*([^>])*         A blank followed by zero or more characters that are not >; this optional group can occur 0 or more times, these are attributes
  </\1>              Opening tag again, must match exactly the text in the first parens
Back References
Given pattern (([0-9]){3}-([0-9]){3}-([0-9]{4}))

And string “My number, 216-588-5023, is working”

What are:
  \1 = 216-588-5023
  \2 = 6 (a repeated group keeps only its last repetition)
  \3 = 8
  \4 = 5023
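Back references are used both in a replacement string and inside the pattern itself. A sketch of each; the phone format, the doubled-word sentence, and the aliases are illustrative choices:

```sql
SELECT REGEXP_REPLACE('216-588-5023',
                      '([0-9]{3})-([0-9]{3})-([0-9]{4})',
                      '(\1) \2-\3')                          AS reformatted,   -- '(216) 588-5023'
       REGEXP_SUBSTR('the the quick red fox', '([a-z]+) \1') AS doubled_word   -- 'the the'
  FROM dual;
```

In the second column \1 must repeat exactly the text captured by the first group, which is how a doubled word is detected.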
Oracle Regular Expression Functions
10g provides these regular expression analogs for existing string functions

REGEXP_LIKE
REGEXP_INSTR
REGEXP_SUBSTR
REGEXP_REPLACE

Source text can be CHAR, VARCHAR2, NCHAR, NVARCHAR2, CLOB or NCLOB
Oracle 10g Regular Expressions
REGEXP_LIKE returns TRUE if a match is found, otherwise FALSE

REGEXP_INSTR returns the character position of the first match, otherwise 0

REGEXP_SUBSTR returns the first matched string, or null if no matches are found

REGEXP_REPLACE replaces all matched strings, and returns the original string if no matches are found
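The four functions can be compared by applying the same pattern to the same source string. A sketch; the source text 'Akron, OH 44313' and the aliases are made up for illustration:

```sql
SELECT CASE WHEN REGEXP_LIKE(s, '[0-9]{5}') THEN 'TRUE' ELSE 'FALSE' END AS like_result,     -- TRUE
       REGEXP_INSTR(s, '[0-9]{5}')                                      AS instr_result,    -- 11
       REGEXP_SUBSTR(s, '[0-9]{5}')                                     AS substr_result,   -- '44313'
       REGEXP_REPLACE(s, '[0-9]{5}', '#####')                           AS replace_result   -- 'Akron, OH #####'
  FROM (SELECT 'Akron, OH 44313' AS s FROM dual);
```

REGEXP_LIKE is a condition rather than a value-returning function in SQL, which is why it is wrapped in a CASE expression here.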
Function Match Parameters
  i   Case-insensitive matching
  c   Case-sensitive matching, the default
  n   Allows the . (period) to match the newline character
  m   Treats the source as multiple lines so that the ^ and $ anchors work on each line
  x   Ignores white space characters in the pattern
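The effect of a match parameter is easiest to see by running the same search with and without it. A sketch using the i and m parameters; the sample strings and aliases are illustrative:

```sql
SELECT REGEXP_SUBSTR('Error: ora-00942', 'ORA-[0-9]+', 1, 1, 'c') AS case_sensitive,    -- NULL
       REGEXP_SUBSTR('Error: ora-00942', 'ORA-[0-9]+', 1, 1, 'i') AS case_insensitive,  -- 'ora-00942'
       REGEXP_SUBSTR('line one' || CHR(10) || 'line two', '^line two$')            AS single_line,  -- NULL
       REGEXP_SUBSTR('line one' || CHR(10) || 'line two', '^line two$', 1, 1, 'm') AS multi_line    -- 'line two'
  FROM dual;
```

The position (1) and occurrence (1) arguments are spelled out only because the match parameter comes after them in the argument list.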
REGEXP_LIKE
REGEXP_LIKE(
source,
pattern, -- Match pattern
match_parameters -- Function parameters
) RETURN BOOLEAN
REGEXP_LIKE
SELECT 'Yes this is a match'
  FROM dual
 WHERE REGEXP_LIKE('44313-2345',
                   '[[:digit:]]{5}-[[:digit:]]{4}'
                  );

'YESTHISISAMATCH'
-------------------
Yes this is a match
REGEXP_INSTR
REGEXP_INSTR(
  source,
  pattern,
  start_position,   -- start searching from here
  occurrence,       -- which occurrence should be returned
  return_position,  -- 0 = position where the occurrence starts
                    -- 1 = position of the character after the occurrence
  match_parameters
) RETURN NUMBER
REGEXP_INSTR
SELECT
  REGEXP_INSTR(
    'The quick red fox jumped over the lazy brown dog.',
    'quick',
    1,     -- start searching from here
    1,     -- which occurrence should be returned
    0,     -- 0 = start of occurrence, 1 = end of occurrence
    NULL   -- match_parameters
  ) match_pos
FROM dual;

MATCH_POS
----------
5

1 row selected.
REGEXP_INSTR
SELECT
  REGEXP_INSTR(
    'The quick red fox jumped over the lazy brown dog.',
    'quick',
    1,     -- start searching from here
    1,     -- which occurrence should be returned
    1,     -- 0 = start of occurrence, 1 = end of occurrence
    NULL   -- match_parameters
  ) match_pos
FROM dual;

MATCH_POS
----------
10

1 row selected.
REGEXP_REPLACE
REGEXP_REPLACE(
source,
pattern,
rep_string, -- replaces matched text
position, -- search start position
occurrence, -- match occurrence
-- 0 replaces all matches
match_parameters
) RETURN VARCHAR2
REGEXP_REPLACE
• Compress two or more spaces
SELECT
  REGEXP_REPLACE(
    '500   Oracle     Parkway,    Redwood  Shores, CA',
    '( ){2,}',
    ' '
  ) "REGEXP_REPLACE"
FROM DUAL;

REGEXP_REPLACE
------------------------------------------
500 Oracle Parkway, Redwood Shores, CA
REGEXP_REPLACE
Provide better error message

SELECT REGEXP_REPLACE (
         'Error on employee <s1> whose name is <s2>',
         '(.*)<s1>(.*)<s2>',
         '\1A31124\2RGravenstein'
       ) AS "REGEXP_REPLACE"
  FROM DUAL;

REGEXP_REPLACE
--------------------------------------------------
Error on employee A31124 whose name is RGravenstein

1 row selected.
REGEXP_SUBSTR
REGEXP_SUBSTR(
source,
pattern,
position, -- search start position
occurrence, -- which occurrence should
-- be found and sub-stringed
match_parameter
) RETURN VARCHAR2
Parsing Example
set serveroutput on
DECLARE
  x VARCHAR2(20);
  y VARCHAR2(20);
  c VARCHAR2(40) := '1:3,4:6,8:10,3:4,7:6,11:12';
BEGIN
  x := REGEXP_SUBSTR(c,'[^:]+', 1, 1);  -- first run of non-colon characters found
  y := REGEXP_SUBSTR(c,'[^,]+', 1, 2);  -- second run of non-comma characters: after the first comma, through to the second comma
  dbms_output.put_line('<'||x||'-'||y||'>');
END;
/
<1-4:6>
PL/SQL procedure successfully completed.
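The same two patterns can walk the whole list rather than pick out single pieces. A sketch that reuses the string from the example; the variable names pair and i and the ' -> ' output format are illustrative:

```sql
SET SERVEROUTPUT ON
DECLARE
  c    VARCHAR2(40) := '1:3,4:6,8:10,3:4,7:6,11:12';
  pair VARCHAR2(20);
  i    PLS_INTEGER := 1;
BEGIN
  LOOP
    -- i-th run of non-comma characters, e.g. '4:6' when i = 2
    pair := REGEXP_SUBSTR(c, '[^,]+', 1, i);
    EXIT WHEN pair IS NULL;            -- no more pairs in the list
    -- split the pair on the colon into its left and right parts
    DBMS_OUTPUT.PUT_LINE(
      REGEXP_SUBSTR(pair, '[^:]+', 1, 1) || ' -> ' ||
      REGEXP_SUBSTR(pair, '[^:]+', 1, 2));
    i := i + 1;
  END LOOP;
END;
/
```

The loop prints one line per pair, starting with 1 -> 3 and ending with 11 -> 12. It stops when REGEXP_SUBSTR returns NULL for an occurrence that does not exist.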
Where to use Regular Expressions
• Validation of (lots of examples can be found on the internet)
  - e-mail addresses
  - Credit card numbers
  - SSN
  - …
• Complicated string parsing
• Where the standard Oracle functions LIKE, INSTR, REPLACE and SUBSTR can't do the job without a lot of work
Performance Considerations
• Regular expressions are more compute-intensive than their non-regular-expression equivalents.
• Most database processes are I/O bound, so some additional CPU load is normally not an issue.
References
http://www.regular-expressions.info/reference.html (the best reference)
http://www.oracle.com/technology/oramag/webcolumns/2003/techarticles/rischert_regexp_pt1.html
http://www.psoug.org/reference/regexp.html
http://www.dba-oracle.com/t_regular_expressions.htm
http://rootshell.be/~yong321/computer/OracleRegExp.html
http://www.databasejournal.com/features/oracle/article.php/3501826
John Garmany, “Being Regular with Regular Expressions”, Collaborate08
