
1.regular Expressions


Session id: 40105

Introducing
Oracle Regular Expressions
Jonathan Gennick, O'Reilly & Associates
Peter Linsley, Oracle Corporation
What are Regular Expressions?
• A language, or syntax, you can use to describe patterns in text
• Example: [0-9]{3}-[0-9]{4}
• That which you can describe, you can find and manipulate
• Unix ed, grep, perl, and now everywhere!
What are Regular Expressions?
• Follow this script to build the database and table:
– CREATE DATABASE RE;
– CREATE TABLE RE (DESCRIPTION VARCHAR2(6));
– INSERT INTO RE VALUES ('652');
– INSERT INTO RE VALUES ('217');
– INSERT INTO RE VALUES ('113');
Why Describe Patterns?

Humans have long worked with patterns:


– Postal and email addresses
– URLs
– Phone numbers
Often it’s not the data that’s important, but the
pattern:
– Bioinformatics
– Validate format of URLs and email addresses
– Correct formatting of phone numbers
Pre-Oracle Database 10g

Find parks with acreage in their descriptions:

SELECT *
FROM park
WHERE description LIKE '%acre%';

Finds '217-acre' and '27 acres', but also 'few acres', 'more acres than all other parks', 'the location of a massacre', etc.
Pre-Oracle Database 10g cont.
Pattern matching with LIKE
– Limited to only two operators: % and _
OWA_PATTERN
– No support for alternation, ASCII only, relatively
poor performance
Non-native solutions
– External Procedures
– Difficult to deploy, maintain, and support
Client based solutions
– Pull all that data down across the network
Oracle Database 10g

Four regular expression functions:
– REGEXP_LIKE: does the pattern match?
– REGEXP_INSTR: where does it match?
– REGEXP_SUBSTR: what does it match?
– REGEXP_REPLACE: replace what matched.
POSIX Extended Regular Expressions
– UNIX Regular Expressions
– Backreference support added
– Longest match not supported
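
As a quick illustration, all four functions can be run side by side against one literal. This is a minimal sketch: the sample sentence is the one used in the REGEXP_REPLACE slides, and the column aliases and the replacement word 'large' are arbitrary choices.

```sql
-- One literal, one pattern, all four functions.
SELECT REGEXP_INSTR  ('This 217-acre park is wonderful.', '[0-9]+(-| )acre') AS match_pos,
       REGEXP_SUBSTR ('This 217-acre park is wonderful.', '[0-9]+(-| )acre') AS match_text,
       REGEXP_REPLACE('This 217-acre park is wonderful.', '[0-9]+(-| )acre', 'large') AS replaced
FROM dual
WHERE REGEXP_LIKE('This 217-acre park is wonderful.', '[0-9]+(-| )acre');
-- match_pos = 6, match_text = '217-acre', replaced = 'This large park is wonderful.'
```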
REGEXP_LIKE

Determine whether a pattern exists in a string


Revisiting the acreage problem:
SELECT *
FROM park
WHERE REGEXP_LIKE(description,
'[0-9]+(-| )acre');
Finds '217-acre' and '27 acres'
Rejects 'few acres', 'more acres than all other parks', 'the location of a massacre', etc.
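
The slides never show how the park table is defined. To try the queries, something like the following should do. The column sizes and the sample rows are assumptions, written to echo the phrases quoted above; park_name and description are the column names the deck itself uses.

```sql
-- Hypothetical definition: the deck only ever refers to park_name and description.
CREATE TABLE park (
  park_name   VARCHAR2(40),
  description VARCHAR2(200)
);

INSERT INTO park VALUES ('Park 1', 'This 217-acre park is wonderful.');
INSERT INTO park VALUES ('Park 2', 'A quiet lake with 27 acres of woodland.');
INSERT INTO park VALUES ('Park 3', 'The location of a massacre.');

-- Returns Park 1 and Park 2; Park 3 has no digits in front of "acre".
SELECT park_name
FROM park
WHERE REGEXP_LIKE(description, '[0-9]+(-| )acre');
```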
Useful for Constraints

Filter allowable data with check constraint


Only allow alphabetical characters:
CREATE TABLE t1 (c1 VARCHAR2(20),
CHECK (REGEXP_LIKE(c1,
'^[[:alpha:]]+$')));

INSERT INTO t1 VALUES ('newuser');
→ 1 row created.

INSERT INTO t1 VALUES ('newuser1');
→ ORA-02290: check constraint violated
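
The same technique works for any fixed format. As a sketch, the phone-number pattern from the opening slide can be anchored with ^ and $ and used as a constraint. The table and constraint names are made up for illustration.

```sql
-- Hypothetical table; the pattern [0-9]{3}-[0-9]{4} comes from the first slide.
CREATE TABLE contact (
  phone VARCHAR2(8),
  CONSTRAINT contact_phone_chk
    CHECK (REGEXP_LIKE(phone, '^[0-9]{3}-[0-9]{4}$'))
);

INSERT INTO contact VALUES ('555-1234');  -- accepted
INSERT INTO contact VALUES ('5551234');   -- rejected with ORA-02290
```

Without the anchors, any value that merely contains the pattern somewhere would pass the check.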
Metacharacters
Operator    Description
.           match any character
a?          match 'a' zero or one time
a*          match 'a' zero or more times
a+          match 'a' one or more times
a|b         match either 'a' or 'b'
a{m,n}      match 'a' between m and n times
[abc]       match either 'a' or 'b' or 'c'
(abc)       match group 'abc'
\n          match nth group
[:cc:]      match character class
[.ce.]      match collation element
[=ec=]      match equivalence class
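
A few of these operators in action, as a small sketch against literals. The input strings and column aliases are invented for the example; the comments show what each expression returns.

```sql
SELECT REGEXP_SUBSTR('colour',   'colou?r')      AS optional_u,   -- colour
       REGEXP_SUBSTR('aaa',      'a{1,2}')       AS interval,     -- aa
       REGEXP_SUBSTR('cat, dog', 'dog|cat')      AS alternation,  -- cat (leftmost match)
       REGEXP_SUBSTR('abcabc',   '(abc)\1')      AS backref,      -- abcabc
       REGEXP_SUBSTR('Lot 42',   '[[:digit:]]+') AS char_class    -- 42
FROM dual;
```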
REGEXP_INSTR
Find out where a match occurs:

SELECT REGEXP_INSTR(description,
'[0-9]+(-| )acre')
FROM park;

REGEXP_INSTR(DESCRIPTION,'[0-9]+…
---------------------------------
6
20
0

REGEXP_SUBSTR
Determine what text matched:

SELECT REGEXP_SUBSTR(description,
'[0-9]+(-| )acre')
FROM park;

REGEXP_SUBSTR(DESCRIPT
----------------------
217-acre
27 acre

REGEXP_SUBSTR Cont
To extract just the acreage value:

SELECT REGEXP_SUBSTR(
REGEXP_SUBSTR(description,
'[0-9]+(-| )acre'),'[0-9]+')
FROM park;

REGEXP_SUBSTR(REGEXP
--------------------
217
27
REGEXP_REPLACE

Convert acres to hectares:


UPDATE park
SET description = REGEXP_REPLACE(
description,'([0-9]+)(-| )acre',
TO_CHAR(0.4047 * TO_NUMBER(
REGEXP_SUBSTR(
REGEXP_SUBSTR(description,
'[0-9]+(-| )acre'),'[0-9]+')))
|| '\2' || 'hectare');
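
Before running an UPDATE like this, it may be worth previewing the result with a SELECT. The sketch below reuses the same expression unchanged and assumes only the park table and description column from the slides; the aliases are arbitrary.

```sql
-- Dry run: show old and new text side by side, for matching rows only.
SELECT description AS before_text,
       REGEXP_REPLACE(
         description, '([0-9]+)(-| )acre',
         TO_CHAR(0.4047 * TO_NUMBER(
           REGEXP_SUBSTR(
             REGEXP_SUBSTR(description, '[0-9]+(-| )acre'), '[0-9]+')))
         || '\2' || 'hectare') AS after_text
FROM park
WHERE REGEXP_LIKE(description, '[0-9]+(-| )acre');
```

One thing a preview can reveal: REGEXP_REPLACE replaces every occurrence by default, while the nested REGEXP_SUBSTR calls read only the first acreage, so a description that mentions two acreages would get the first value written into both places.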
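REGEXP_REPLACE Cont.
Step by step, for one row:
– Original description: This 217-acre park is wonderful.
– Text matched by the pattern: 217-acre (group 1 is 217, group 2 is the hyphen)
– Digits extracted by the nested REGEXP_SUBSTR calls: 217
– Conversion: 217 * 0.4047 = 87.8199
– Replacement string before the backreference is expanded: 87.8199\2hectare
– Replacement string after \2 is expanded: 87.8199-hectare
– Final description: This 87.8199-hectare park is wonderful.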

DEMONSTRATION
Oracle Regular Expressions
Performance

Pattern matching can be complex


– Need to compile to state machine
– Lex and parse
– Examine all possible branches until match found
Compiled once per statement
– Can be faster than LIKE for complex scenarios
– Usually faster than PL/SQL equivalent
ZIP code checking 5 times faster
Performance Cont.

Some poorly-performing expressions:


– 'a{2}' will be slower than 'aa'
– '.*b' on input that doesn't contain a 'b' can
also be quite time-consuming

Further reading: Mastering Regular Expressions, by Jeffrey Friedl, Chapter 6, "Crafting an Efficient Expression"


Using with Indexes

Use function-based indexes:


CREATE INDEX acre_ind
ON park (REGEXP_SUBSTR(
REGEXP_SUBSTR(description,
'[0-9]+(-| )acre'),'[0-9]+'));
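To support regular expression queries:
SELECT * FROM park
WHERE REGEXP_SUBSTR(REGEXP_SUBSTR(description,
'[0-9]+(-| )acre'),'[0-9]+') = '217';
Compare against the string '217': a numeric literal would force an implicit TO_NUMBER on the indexed expression, and the index would not be used.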
Using with Views

Hide the complexity from users:


CREATE VIEW park_acreage as
SELECT park_name,
REGEXP_SUBSTR(
REGEXP_SUBSTR(
description,
'[0-9]+(-| )acre'),
'[0-9]+') acreage
FROM park;
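
With the view in place, users can query acreage without seeing any regular expression. A short sketch follows; the threshold of 100 is arbitrary.

```sql
-- acreage is a string (NULL when no acreage is mentioned), so convert before comparing.
SELECT park_name, acreage
FROM park_acreage
WHERE TO_NUMBER(acreage) > 100
ORDER BY TO_NUMBER(acreage) DESC;
```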
Using with PL/SQL

REGEXP_LIKE acts as a Boolean function in PL/SQL:
IF REGEXP_LIKE(description,
'[0-9]+(-| )acre') THEN
acres := REGEXP_SUBSTR(
REGEXP_SUBSTR(description,
'[0-9]+(-| )acre'),'[0-9]+');
...
All other functions act identically in PL/SQL
and SQL.
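
For reference, here is one way the fragment above could be completed into a runnable anonymous block. The variable declarations, the sample text, and the DBMS_OUTPUT call are assumptions added for the sketch; run it with SET SERVEROUTPUT ON in SQL*Plus to see the output.

```sql
DECLARE
  -- Hypothetical sample value; in practice this would come from the park table.
  description VARCHAR2(200) := 'This 217-acre park is wonderful.';
  acres       NUMBER;
BEGIN
  IF REGEXP_LIKE(description, '[0-9]+(-| )acre') THEN
    acres := TO_NUMBER(
               REGEXP_SUBSTR(
                 REGEXP_SUBSTR(description, '[0-9]+(-| )acre'), '[0-9]+'));
    DBMS_OUTPUT.PUT_LINE('Acres: ' || acres);  -- prints "Acres: 217"
  END IF;
END;
/
```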
Longest Match vs Greediness

Greediness = each element matches as much as possible. For example:

SELECT REGEXP_SUBSTR(
'In the beginning','.+[[:space:]]')
FROM dual;
→ 'In the ' (up to and including the last space, not just 'In ')
Longest Match vs Greediness

Longest match = find the variation resulting in the greatest number of matching characters. Oracle does not do this; the first alternative that matches wins:
SELECT REGEXP_SUBSTR('bbb','b|bb') FROM dual;
→ b
SELECT REGEXP_SUBSTR('bbb','bb|b') FROM dual;
→ bb
Optional Parameters

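All but REGEXP_LIKE take optional parameters for starting position and occurrence:
REGEXP_INSTR (source, pattern, start, occurrence, return_option, match)
REGEXP_SUBSTR (source, pattern, start, occurrence, match)
REGEXP_REPLACE(source, pattern, replace, start, occurrence, match)
REGEXP_INSTR also takes return_option: 0 (the default) returns the position where the match starts, 1 the position just after it.

For example, to return the tenth word of a description:
REGEXP_SUBSTR(description,'[^[:space:]]+',1,10)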
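
The effect of the start and occurrence arguments is easiest to see on a short literal. This is a sketch with an invented input string; start is 1 throughout and only the occurrence varies.

```sql
SELECT REGEXP_SUBSTR ('In the beginning', '[^[:space:]]+', 1, 2)      AS second_word,  -- the
       REGEXP_INSTR  ('In the beginning', '[^[:space:]]+', 1, 3)      AS third_start,  -- 8
       REGEXP_REPLACE('In the beginning', '[^[:space:]]+', 'X', 1, 2) AS second_to_x   -- In X beginning
FROM dual;
```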
Match Parameter

All functions take an optional match parameter:
– Is matching case sensitive? ('c' or 'i')
– Does period (.) match newlines? ('n')
– Is the source string one line or many? ('m')
The match parameter comes last
Case-sensitivity

Case-insensitive search:
SELECT *
FROM park
WHERE REGEXP_LIKE(
description,
'[0-9]+(-| )acre',
'i');
Newline matching

INSERT INTO park VALUES ('Park 6',


'640' || CHR(10) || 'ACRE');

SELECT *
FROM park
WHERE REGEXP_LIKE(
description,
'[0-9]+.acre',
'in');
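
To see what the 'n' flag contributes, the same test can be run with and without it on a literal. This is a small sketch; the CASE wrapper and column aliases are only there to make the result visible.

```sql
SELECT CASE WHEN REGEXP_LIKE('640' || CHR(10) || 'ACRE', '[0-9]+.acre', 'i')
            THEN 'yes' ELSE 'no' END AS without_n,  -- no: period does not match the newline
       CASE WHEN REGEXP_LIKE('640' || CHR(10) || 'ACRE', '[0-9]+.acre', 'in')
            THEN 'yes' ELSE 'no' END AS with_n      -- yes
FROM dual;
```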
String anchors

INSERT INTO employee (surname)
VALUES ('Ellison' || CHR(10) ||
'Gennick');

SELECT * FROM employee
WHERE REGEXP_LIKE(
surname,'^Ellison');
Match? Yes! By default, ^ anchors to the start of the whole string.
String anchors

INSERT INTO employee (surname)
VALUES ('Ellison' || CHR(10) ||
'Gennick');

SELECT * FROM employee
WHERE REGEXP_LIKE(
surname,'^Gennick');
Match? No! 'Gennick' starts the second line, not the string.
String anchors

INSERT INTO employee (surname)
VALUES ('Ellison' || CHR(10) ||
'Gennick');

SELECT * FROM employee
WHERE REGEXP_LIKE(
surname,'^Gennick','m');
Match? Yes! With 'm', ^ and $ also anchor at the start and end of each line.
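
The three cases can also be checked in one query without an employee table. A minimal sketch, with the two-line surname built inline; the aliases are arbitrary.

```sql
SELECT CASE WHEN REGEXP_LIKE('Ellison' || CHR(10) || 'Gennick', '^Ellison')
            THEN 'yes' ELSE 'no' END AS first_line,         -- yes
       CASE WHEN REGEXP_LIKE('Ellison' || CHR(10) || 'Gennick', '^Gennick')
            THEN 'yes' ELSE 'no' END AS second_line,        -- no
       CASE WHEN REGEXP_LIKE('Ellison' || CHR(10) || 'Gennick', '^Gennick', 'm')
            THEN 'yes' ELSE 'no' END AS second_line_with_m  -- yes
FROM dual;
```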
Locale Support

Full Locale Support


– All character sets
– All languages
Case and accent insensitive searching
Linguistic range
Character classes
Collation elements
Equivalence classes
Character Sets and Languages

For example, you can search for Ukrainian


names beginning with Ґ and ending with к:
SELECT *
FROM employee
WHERE REGEXP_LIKE(
surname,
'^Ґ[[:alpha:]]*к$','n');
Case- and Accent-Insensitive
Searching
Respect for NLS settings:
ALTER SESSION
SET NLS_SORT = GENERIC_BASELETTER;
With this sort, case won't matter and an
expression such as:
REGEXP_INSTR(x,'resume')
will find "resume", "résumé", "Résume", etc.
Linguistic Range

Ranges respect NLS_SORT settings. For the range [a-z]:
– NLS_SORT=GERMAN: a,b,c…z
– NLS_SORT=GERMAN_CI: a,A,b,B,c,C…z,Z
Character Classes

Character classes such as [:alpha:] and [:digit:] encompass more than just Latin characters.
For example, [:digit:] matches:
– Latin 0 through 9
– Arabic-Indic ٠ through ٩
– And more
Collation Elements

ALTER SESSION SET NLS_SORT=XSPANISH;


SELECT REGEXP_SUBSTR(
'El caballo, Chico come la tortilla.',
'[[:alpha:]]*[ch][[:alpha:]]*',
1,1,'i')
FROM dual;

caballo
Collation Elements

ALTER SESSION SET NLS_SORT=XSPANISH;


SELECT REGEXP_SUBSTR(
'El caballo, Chico come la tortilla.',
'[[:alpha:]]*[[.ch.]][[:alpha:]]*',
1,1,'i')
FROM dual;

Chico
Equivalence Classes

Ignore case and accents without changing NLS_SORT:
REGEXP_INSTR(x,'r[[=e=]]sum[[=e=]]')
Finds 'resume', 'résumé', and 'rEsumE'
Conclusion

String searching and manipulation are at the heart of a great many applications
Oracle Regular Expressions provide versatile string manipulation in the database instead of externalized in middle-tier logic
They are locale-sensitive and support character large objects
Available in both SQL and PL/SQL
Next Steps
• Recommended sessions
– Session #40088 New SQL Capabilities
– Session #40202 Oracle HTML DB
• Recommended demos and/or hands-on labs
– Database Globalization Pod R
• See Your Business in Our Software
– Visit the DEMOgrounds for a customized architectural review, see a customized demo with Solutions Factory, or receive a personalized proposal. Visit the DEMOgrounds for more information.
• Relevant web sites to visit for more information
– http://www.opengroup.org/onlinepubs/007904975/basedefs/xbd_chap09.html
Shameless Plug

Oracle Regular Expressions Pocket Reference
Jonathan Gennick & Peter Linsley
Free! At the O'Reilly & Associates Booth
