Session id: 40105

Introducing
Oracle Regular Expressions
Jonathan Gennick, O'Reilly & Associates
Peter Linsley, Oracle Corporation
What are Regular Expressions?
• A language, or syntax, you can use to describe patterns in text
• Example: [0-9]{3}-[0-9]{4}
• That which you can describe, you can find and manipulate
• Unix ed, grep, perl, and now everywhere!
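To make the example pattern concrete, here is a minimal sketch using Python's re module. Python is an assumption made for illustration only (the slides use Oracle SQL), and the sample text is invented. It finds every substring shaped like three digits, a hyphen, and four digits.

```python
import re

# Pattern from the slide: three digits, a hyphen, four digits.
PHONE = re.compile(r"[0-9]{3}-[0-9]{4}")

# Hypothetical sample text, not taken from the slides.
text = "Call 555-1234 before noon, or 555-9876 after. Room 12-34 is closed."

# findall returns every non-overlapping match, left to right.
matches = PHONE.findall(text)
print(matches)
```

"12-34" is skipped because it has too few digits on either side of the hyphen.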
What are Regular Expressions? Cont.
• Run this script to build the sample database and table
– CREATE DATABASE re;
– CREATE TABLE re (description VARCHAR2(6));
– INSERT INTO re VALUES ('652');
– INSERT INTO re VALUES ('217');
– INSERT INTO re VALUES ('113');
Why Describe Patterns?

Humans have long worked with patterns:

– Postal and email addresses
– URLs
– Phone numbers
Often it’s not the data that’s important, but the
pattern:
– Bioinformatics
– Validate format of URLs and email addresses
– Correct formatting of phone numbers
Pre-Oracle Database 10g

Find parks with acreage in their descriptions:

SELECT *
FROM park
WHERE description LIKE '%acre%';

Finds '217-acre' and '27 acres', but also 'few acres', 'more acres than all other parks', 'the location of a massacre', etc.
Pre-Oracle Database 10g cont.
Pattern matching with LIKE
– Limited to only two operators: % and _
OWA_PATTERN
– No support for alternation, ASCII only, relatively
poor performance
Non-native solutions
– External Procedures
– Difficult to deploy, maintain, and support
Client based solutions
– Pull all that data down across the network
Oracle Database 10g

Four regular expression functions

– REGEXP_LIKE: does the pattern match?
– REGEXP_INSTR: where does it match?
– REGEXP_SUBSTR: what does it match?
– REGEXP_REPLACE: replace what matched
POSIX Extended Regular Expressions
– UNIX Regular Expressions
– Backreference support added
– Longest match not supported
REGEXP_LIKE

Determine whether a pattern exists in a string

Revisiting the acreage problem:
SELECT *
FROM park
WHERE REGEXP_LIKE(description,
'[0-9]+(-| )acre');
Finds '217-acre' and '27 acres'
Rejects 'few acres', 'more acres than all other parks', 'the location of a massacre', etc.
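The difference between the LIKE and REGEXP_LIKE filters can be sketched outside the database. The following Python snippet is an illustration under stated assumptions: the description strings are hypothetical, modeled on the slide's examples, and Python's re module stands in for Oracle's engine.

```python
import re

# Hypothetical park descriptions modeled on the slide's examples.
descriptions = [
    "This 217-acre park is wonderful.",
    "A small park of 27 acres.",
    "Only a few acres of woodland.",
    "The location of a massacre.",
]

# LIKE '%acre%' is a plain substring test.
like_hits = [d for d in descriptions if "acre" in d]

# REGEXP_LIKE looks for the pattern anywhere in the string, as re.search does.
acreage = re.compile(r"[0-9]+(-| )acre")
regex_hits = [d for d in descriptions if acreage.search(d)]

print(len(like_hits), len(regex_hits))
```

The substring test keeps all four rows; the pattern keeps only the two that state a number of acres.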
Useful for Constraints

Filter allowable data with check constraint

Only allow alphabetical characters:
CREATE TABLE t1 (c1 VARCHAR2(20),
CHECK (REGEXP_LIKE(c1,
'^[[:alpha:]]+$')));

INSERT INTO t1 VALUES ('newuser');

1 row created.

INSERT INTO t1 VALUES ('newuser1');

ORA-02290: check constraint violated
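The same validation rule can be tried in application code. This is a sketch, not the constraint itself: Python's re module has no [[:alpha:]] class, so the pattern below uses a common stand-in, and the function name passes_check is hypothetical.

```python
import re

# [^\W\d_] means "a word character that is neither a digit nor an underscore",
# a common Python stand-in for the POSIX class [[:alpha:]].
ALPHA_ONLY = re.compile(r"[^\W\d_]+")

def passes_check(c1):
    """Mimic CHECK (REGEXP_LIKE(c1, '^[[:alpha:]]+$')) for a non-NULL value."""
    return ALPHA_ONLY.fullmatch(c1) is not None

print(passes_check("newuser"))   # accepted by the rule
print(passes_check("newuser1"))  # rejected: contains a digit
```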
Metacharacters
Operator   Description
.          match any character
a?         match 'a' zero or one time
a*         match 'a' zero or more times
a+         match 'a' one or more times
a|b        match either 'a' or 'b'
a{m,n}     match 'a' between m and n times
[abc]      match either 'a' or 'b' or 'c'
(abc)      match group 'abc'
\n         match nth group (backreference)
[:cc:]     match character class
[.ce.]     match collation element
[=ec=]     match equivalence class
REGEXP_INSTR
Find out where a match occurs:

SELECT REGEXP_INSTR(description,
'[0-9]+(-| )acre')
FROM park;

REGEXP_INSTR(DESCRIPTION,'[0-9]+…
---------------------------------
6
20
0
…
REGEXP_SUBSTR
Determine what text matched:

SELECT REGEXP_SUBSTR(description,
'[0-9]+(-| )acre')
FROM park;

REGEXP_SUBSTR(DESCRIPT
----------------------
217-acre
27 acre
…
REGEXP_SUBSTR Cont.
To extract just the acreage value:

SELECT REGEXP_SUBSTR(
REGEXP_SUBSTR(description,
'[0-9]+(-| )acre'),'[0-9]+')
FROM park;

REGEXP_SUBSTR(REGEXP
--------------------
217
27
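The three results shown so far (position, matched text, and the digits alone) can be reproduced in a short sketch. The helper names below are hypothetical and only imitate the Oracle functions with Python's re module; the sample description is the one used later in the deck.

```python
import re

ACREAGE = re.compile(r"[0-9]+(-| )acre")

def regexp_instr(source):
    """1-based position of the first match, or 0 if there is none."""
    m = ACREAGE.search(source)
    return m.start() + 1 if m else 0

def regexp_substr(source):
    """Matched text, or None where Oracle would return NULL."""
    m = ACREAGE.search(source)
    return m.group(0) if m else None

def acreage_value(source):
    """Digits inside the matched text, mirroring the nested REGEXP_SUBSTR."""
    matched = regexp_substr(source)
    if matched is None:
        return None
    return re.search(r"[0-9]+", matched).group(0)

desc = "This 217-acre park is wonderful."
print(regexp_instr(desc), regexp_substr(desc), acreage_value(desc))
```

The nested call is needed because the outer pattern also matches the separator and the word "acre"; the inner pattern then keeps only the digits.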
REGEXP_REPLACE

Convert acres to hectares:

UPDATE park
SET description = REGEXP_REPLACE(
description,'([0-9]+)(-| )acre',
TO_CHAR(0.4047 * TO_NUMBER(
REGEXP_SUBSTR(
REGEXP_SUBSTR(description,
'[0-9]+(-| )acre'),'[0-9]+')))
|| '\2' || 'hectare');
REGEXP_REPLACE Cont.
This 217-acre park is wonderful.

(Subexpression 1 is the number, 217; subexpression 2 is the separator, -)
UPDATE park
SET description = REGEXP_REPLACE(
description,'([0-9]+)(-| )acre',
TO_CHAR(0.4047 * TO_NUMBER(
REGEXP_SUBSTR(
REGEXP_SUBSTR(description,
'[0-9]+(-| )acre'),'[0-9]+')))
|| '\2' || 'hectare');
REGEXP_REPLACE Cont.
This 217-acre park is wonderful.
217-acre
217
217 * 0.4047 = 87.8199
87.8199\2hectare
87.8199-hectare
This 87.8199-hectare park is wonderful.
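The walk-through above can be sketched in Python, again as an illustration rather than the Oracle implementation. The function name acres_to_hectares is hypothetical, and Decimal is used so the arithmetic reproduces the slide's 87.8199 exactly.

```python
import re
from decimal import Decimal

HECTARES_PER_ACRE = Decimal("0.4047")  # conversion factor used on the slide

def acres_to_hectares(description):
    """Rewrite '<n>-acre' or '<n> acre' as the equivalent in hectares."""
    def convert(m):
        # Group 1 is the number, group 2 the separator ('-' or ' ').
        hectares = HECTARES_PER_ACRE * int(m.group(1))
        return f"{hectares}{m.group(2)}hectare"
    return re.sub(r"([0-9]+)(-| )acre", convert, description)

print(acres_to_hectares("This 217-acre park is wonderful."))
```

One difference worth noting: the callback here converts every match separately, whereas the SQL statement builds a single replacement string from the first acreage found in the description.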

Oracle Regular Expressions
Performance

Pattern matching can be complex

– Need to compile to state machine
– Lex and parse
– Examine all possible branches until match found
Compiled once per statement
– Can be faster than LIKE for complex scenarios
– Usually faster than PL/SQL equivalent
ZIP code checking 5 times faster
Performance Cont.

Some poorly-performing expressions:

– 'a{2}' will be slower than 'aa'
– '.*b' on input that doesn't contain a 'b' can
also be quite time-consuming

See: Mastering Regular Expressions by Jeffrey Friedl, Chapter 6, "Crafting an Efficient Expression"

Using with Indexes

Use function-based indexes:

CREATE INDEX acre_ind
ON park (REGEXP_SUBSTR(
REGEXP_SUBSTR(description,
'[0-9]+(-| )acre'),'[0-9]+'));
To support regular expression queries:
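SELECT * FROM park
WHERE REGEXP_SUBSTR(REGEXP_SUBSTR(description,
'[0-9]+(-| )acre'),'[0-9]+') = '217';
Compare against the string '217': comparing with the number 217 forces an implicit TO_NUMBER on the indexed expression, so the index cannot be used.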
Using with Views

Hide the complexity from users:

CREATE VIEW park_acreage as
SELECT park_name,
REGEXP_SUBSTR(
REGEXP_SUBSTR(
description,
'[0-9]+(-| )acre'),
'[0-9]+') acreage
FROM park;
Using with PL/SQL

REGEXP_LIKE acts as a Boolean function in PL/SQL:
IF REGEXP_LIKE(description,
'[0-9]+(-| )acre') THEN
acres := REGEXP_SUBSTR(
REGEXP_SUBSTR(description,
'[0-9]+(-| )acre'),'[0-9]+');
...
All other functions act identically in PL/SQL
and SQL.
Longest Match vs Greediness

Greediness = each element matches as much as possible. For example:

SELECT REGEXP_SUBSTR(
'In the beginning','.+[[:space:]]')
FROM dual;
Result: 'In the ' (including the trailing space)
Longest Match vs Greediness

Longest match = find the variation resulting in the greatest number of matching characters. Oracle does not do this; the first alternative that matches wins:
SELECT REGEXP_SUBSTR('bbb','b|bb') FROM dual;
Result: b
SELECT REGEXP_SUBSTR('bbb','bb|b') FROM dual;
Result: bb
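Python's re module happens to behave the same way on these inputs (greedy quantifiers, first matching alternative wins), so the two slides can be checked with a short sketch. This is an analogy only; it says nothing about how Oracle implements matching.

```python
import re

# Greediness: .+ takes as much as it can, then gives back just enough
# for the whitespace to match.
greedy = re.search(r".+\s", "In the beginning").group(0)

# Alternation order: the first alternative that matches is used,
# even if a later one would match more characters.
first_short = re.search(r"b|bb", "bbb").group(0)
first_long = re.search(r"bb|b", "bbb").group(0)

print(repr(greedy), first_short, first_long)
```

A POSIX leftmost-longest engine would return 'bb' for both alternations, which is the behavior the slides describe as not supported.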
Optional Parameters

All but REGEXP_LIKE take optional parameters for starting position and occurrence (REGEXP_INSTR also takes a return option before the match parameter):
REGEXP_INSTR (source, pattern, start, occurrence, return_option, match)
REGEXP_SUBSTR (source, pattern, start, occurrence, match)
REGEXP_REPLACE(source, pattern, replace, start, occurrence, match)

For example, to return the tenth word of the description column:
REGEXP_SUBSTR(description,'[^[:space:]]+',1,10)
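The start and occurrence parameters can be imitated with a small helper. This is a sketch under assumptions: the function below is hypothetical, uses Python's re module, and writes the non-space class as \S because Python does not support [^[:space:]].

```python
import re

def regexp_substr(source, pattern, start=1, occurrence=1):
    """Return the occurrence-th match at or after the 1-based start, else None."""
    compiled = re.compile(pattern)
    # finditer yields non-overlapping matches from the given 0-based offset.
    for n, m in enumerate(compiled.finditer(source, start - 1), start=1):
        if n == occurrence:
            return m.group(0)
    return None

desc = "This 217-acre park is wonderful."
print(regexp_substr(desc, r"\S+", 1, 3))   # third word
print(regexp_substr(desc, r"\S+", 6, 1))   # first word at or after position 6
print(regexp_substr(desc, r"\S+", 1, 10))  # fewer than ten words
```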
Match Parameter

All functions take an optional match parameter:
– Is matching case sensitive?
– Does period (.) match newlines?
– Is the source string one line or many?
The match parameter comes last
Case-sensitivity

Case-insensitive search:
SELECT *
FROM park
WHERE REGEXP_LIKE(
description,
'[0-9]+(-| )acre',
'i');
Newline matching

INSERT INTO park VALUES ('Park 6',
'640' || CHR(10) || 'ACRE');

SELECT *
FROM park
WHERE REGEXP_LIKE(
description,
'[0-9]+.acre',
'in');
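The effect of combining the 'i' and 'n' flags can be seen in a short sketch. Python's re.IGNORECASE and re.DOTALL are used here as rough counterparts; that mapping is an assumption made for illustration.

```python
import re

description = "640\nACRE"  # same value as the inserted row: 640, newline, ACRE
pattern = r"[0-9]+.acre"

# No flags: fails on case, and the period does not match the newline.
plain = re.search(pattern, description)

# Case-insensitive only: the period still refuses to match the newline.
case_only = re.search(pattern, description, re.IGNORECASE)

# Both flags: the row is found.
both = re.search(pattern, description, re.IGNORECASE | re.DOTALL)

print(plain, case_only, both is not None)
```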
String anchors

INSERT INTO employee (surname)
VALUES ('Ellison' || CHR(10) ||
'Gennick');

SELECT * FROM employee
WHERE REGEXP_LIKE(
surname,'^Ellison');
Match? Yes!
String anchors

INSERT INTO employee (surname)
VALUES ('Ellison' || CHR(10) ||
'Gennick');

SELECT * FROM employee
WHERE REGEXP_LIKE(
surname,'^Gennick');
Match? No! By default ^ anchors only to the start of the whole string.
String anchors

INSERT INTO employee (surname)
VALUES ('Ellison' || CHR(10) ||
'Gennick');

SELECT * FROM employee
WHERE REGEXP_LIKE(
surname,'^Gennick','m');
Match? Yes! With 'm', ^ also anchors to the start of each line.
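The three anchor cases can be sketched with Python's re module, where re.MULTILINE plays a role similar to the 'm' match parameter. This is an analogy for illustration, not a statement about Oracle internals.

```python
import re

surname = "Ellison\nGennick"  # two lines stored in one value

# Default: ^ anchors only at the start of the whole string.
default_first = re.search(r"^Ellison", surname)
default_second = re.search(r"^Gennick", surname)

# Multiline: ^ also anchors immediately after each newline.
multiline_second = re.search(r"^Gennick", surname, re.MULTILINE)

print(default_first is not None, default_second is not None,
      multiline_second is not None)
```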
Locale Support

Full Locale Support

– All character sets
– All languages
Case and accent insensitive searching
Linguistic range
Character classes
Collation elements
Equivalence classes
Character Sets and Languages

For example, you can search for Ukrainian names beginning with Ґ and ending with к:
SELECT *
FROM employee
WHERE REGEXP_LIKE(
surname,
'^Ґ[[:alpha:]]*к$','n');
Case- and Accent-Insensitive Searching
Respect for NLS settings:
ALTER SESSION
SET NLS_SORT = GENERIC_BASELETTER;
With this sort, case won't matter and an
expression such as:
REGEXP_INSTR(x,'resume')
will find "resume", "résumé", "Résume", etc.
Linguistic Range

Ranges respect NLS_SORT settings. For the range [a-z]:
NLS_SORT=GERMAN      matches a,b,c…z
NLS_SORT=GERMAN_CI   matches a,A,b,B,c,C…z,Z
Character Classes

Character classes such as [:alpha:] and [:digit:] encompass more than just Latin characters.
For example, [:digit:] matches:
– Latin 0 through 9
– Arabic-Indic ٠ through ٩
– And more
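A comparable effect can be observed in Python, where \d in a text pattern matches any Unicode decimal digit while an explicit [0-9] range does not. The snippet is a sketch for comparison; Oracle's behavior depends on the database character set and is not reproduced here.

```python
import re

latin = "2004"
arabic_indic = "\u0662\u0660\u0660\u0664"  # the same number in Arabic-Indic digits

unicode_digits = re.compile(r"\d+")    # class-based: any Unicode decimal digit
ascii_digits = re.compile(r"[0-9]+")   # range-based: Latin digits only

print(unicode_digits.fullmatch(arabic_indic) is not None)
print(ascii_digits.fullmatch(arabic_indic) is not None)
```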
Collation Elements

ALTER SESSION SET NLS_SORT=XSPANISH;

SELECT REGEXP_SUBSTR(
'El caballo, Chico come la tortilla.',
'[[:alpha:]]*[ch][[:alpha:]]*',
1,1,'i')
FROM dual;

caballo
Collation Elements

ALTER SESSION SET NLS_SORT=XSPANISH;

SELECT REGEXP_SUBSTR(
'El caballo, Chico come la tortilla.',
'[[:alpha:]]*[[.ch.]][[:alpha:]]*',
1,1,'i')
FROM dual;

Chico
Equivalence Classes

Ignore case and accents without changing NLS_SORT:
REGEXP_INSTR(x,'r[[=e=]]sum[[=e=]]')
Finds 'resume', 'résumé', and 'rEsumE'
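Python's re module has no equivalence classes, but a rough approximation is to reduce the text to base letters before searching. The helpers below are hypothetical and only sketch the idea of matching on base letters; they are not how Oracle evaluates [[=e=]] or NLS_SORT.

```python
import re
import unicodedata

def base_letters(text):
    """Strip combining accents and fold case, leaving base letters."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()

def contains_resume(text):
    """True if 'resume' occurs in the text, ignoring case and accents."""
    return re.search(r"resume", base_letters(text)) is not None

for word in ["resume", "résumé", "Résume", "rEsumE", "resum"]:
    print(word, contains_resume(word))
```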
Conclusion

String searching and manipulation is at the heart of a great many applications
Oracle Regular Expressions provide versatile string manipulation in the database instead of externalized in middle-tier logic
They are locale-sensitive and support character large objects (CLOBs)
Available in both SQL and PL/SQL
Next Steps…
• Recommended sessions
– Session #40088 New SQL Capabilities
– Session #40202 Oracle HTML DB
• Recommended demos and/or hands-on labs
– Database Globalization Pod R
• See Your Business in Our Software
– Visit the DEMOgrounds for a customized architectural review, see a customized demo with Solutions Factory, or receive a personalized proposal. Visit the DEMOgrounds for more information.
• Relevant web sites to visit for more information
– http://www.opengroup.org/onlinepubs/007904975/basedefs/xbd_chap09.html
Shameless Plug

Oracle Regular Expressions Pocket Reference
Jonathan Gennick & Peter Linsley
Free! At the O'Reilly & Associates booth
