02 Regular Expressions in Practical NLP 6-04

These days there's always a lot of talk about probabilistic models and machine learning. But if you actually look at large systems under the hood, what you'll almost always find is that they also make quite a bit of use of regular expressions in various places. For many tasks, it turns out that regular expressions are just a very practical and capable way of specifying various kinds of natural language patterns. I'm going to show you one example of this now: how we use regular expressions in the English tokenizer inside the Stanford NLP tools, such as the parser and the part-of-speech tagger, or the CoreNLP suite overall.

Okay, here we are with the code for the Stanford English tokenizer. What it is, is a large deterministic regular expression. It's written with a tool called JFlex. JFlex belongs to a family of what are commonly called, in computer science, lexers, which is just another word for tokenizers: they take a sequence of characters and cut pieces off the front of it, one token at a time. The original lexer was lex, part of Unix, and then came flex. JFlex is a Java-compatible version.

Let's scroll down to where some of the regular expressions are used to define character classes. Often what you find is that many of the regular expressions aren't actually very complicated: they're really nothing more than lists that are turned into regular expressions by putting vertical bars in between for alternation. We see that in several places here. Here we have one for abbreviated months, and here we have one for abbreviated days of the week, and that continues on for some of these other ones, like American states and various other kinds of person and name titles and acronyms.
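The tokenizer itself is a JFlex specification, not Python, so what follows is only a minimal sketch, under our own assumptions, of the two ideas just described: character classes built by joining a word list with vertical bars, and a lexer that cuts one token at a time off the front of the input. The word lists, the rule names (`ABBREV`, `WORD`, `NUMBER`, `PUNCT`), and the `tokenize` helper are hypothetical. Note also that this loop takes the first rule that matches, whereas JFlex, like lex and flex, prefers the longest match.

```python
import re

# Hypothetical word lists, for illustration only; the real tokenizer's lists differ.
MONTHS = ["Jan", "Feb", "Mar", "Apr", "Jun", "Jul", "Aug", "Sep", "Sept", "Oct", "Nov", "Dec"]
DAYS = ["Mon", "Tue", "Tues", "Wed", "Thu", "Thur", "Thurs", "Fri", "Sat", "Sun"]


def alternation(words):
    """Join a word list into a regex alternation such as "Sept|Jan|Feb|..."."""
    # Escape each entry, and put longer entries first so "Sept" is tried before "Sep".
    ordered = sorted(words, key=len, reverse=True)
    return "|".join(re.escape(w) for w in ordered)


ABMONTH = alternation(MONTHS)
ABDAY = alternation(DAYS)

# Token rules, tried in order at the current position. Unlike JFlex, which
# prefers the longest match, Python's alternation takes the first rule that matches.
RULES = [
    ("ABBREV", rf"(?:{ABMONTH}|{ABDAY})\."),  # abbreviated month or day plus its period
    ("WORD", r"[A-Za-z]+"),
    ("NUMBER", r"[0-9]+"),
    ("PUNCT", r"[^\sA-Za-z0-9]"),  # any other single non-space character
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in RULES))
SPACE = re.compile(r"\s+")


def tokenize(text):
    """Cut tokens off the front of `text` one at a time, as a lexer does."""
    tokens = []
    pos = 0
    while pos < len(text):
        ws = SPACE.match(text, pos)
        if ws:  # skip whitespace between tokens
            pos = ws.end()
            continue
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError(f"no rule matches at position {pos}")
        tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


print(tokenize("Mon. Sept. 3 Sept"))
```

Here "Sept." comes out as one abbreviation token, while a bare "Sept" with no period falls through to the ordinary word rule.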
[Inaudible] down here. But let's go on a little further to one that's a bit more interesting than that. Okay, here's one for phone numbers. This is the kind of ill-documented regular expression that's a little hard to get your head around, but that's much used in practice. At the very top level of this regular expression, things are divided up by this alternation right here. On the right-hand side of the alternation, there's a [inaudible] where dots are used as the separator. That case is separated out to require consistent use of dots, because otherwise it's easy for the regular expression to go wrong and also recognize various kinds of [inaudible] numbers and other patterns.
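One way to see why the dotted format gets its own branch is to compare a dots-only pattern with a single permissive separator class. This is a minimal sketch under our own assumptions: both patterns are simplified, hypothetical stand-ins rather than the tokenizer's code, and the phone numbers are made up. The permissive version happily accepts a mixture of dots, dashes, and spaces.

```python
import re

# Dots-only pattern (a simplified, hypothetical version of the dotted branch):
# every separator must be a period.
DOTTED = re.compile(r"[0-9]{2,4}\.[0-9]{3,4}\.[0-9]{3,5}")

# For contrast: one permissive separator class shared by every position,
# which accepts any mixture of dots, dashes, and spaces.
PERMISSIVE = re.compile(r"[0-9]{2,4}[.\- ][0-9]{3,4}[.\- ][0-9]{3,5}")

for s in ["650.555.0123", "650.555-0123", "650 555.0123"]:
    print(s, bool(DOTTED.fullmatch(s)), bool(PERMISSIVE.fullmatch(s)))
```

Only the consistently dotted string passes the first pattern, while all three pass the second.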
That part of it is actually the easier part. At the beginning we can optionally have plus signs, which are used in Europe and most of the rest of the world as an international prefix before country codes. Then we have the country code here, which is just digits, between two and four of them, and all of that is optional. After that, we've got a first set of digits, which can be the area code, then a dot, then the second set of digits, which I guess historically is the exchange, and then finally the third set of digits. Each of these sets of digits is given a length range: this one has to be between three and four digits, this one between three and five digits, and the area code between two and four digits. Those ranges are chosen so that they'll work with the phone numbers of a bunch of countries around the world. But if you know your international phone numbers well, you'll realize there are still some cases that won't be recognized.

Now if we go to the left-hand side of the regular expression, it's effectively doing the same thing, just in a more complex way. The first part of it again recognizes things like optional country codes; you can see the same piece over here. [Inaudible] country code. But it's allowing some other possibilities. Here we've escaped parentheses, so you can have some digits put inside parentheses, and further on we've got this character class, which allows a variety of separators apart from a period: a dash, which again needs to be escaped; just a space; or a non-breaking space [inaudible].
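To make the structure just described concrete, here is a rough Python approximation of the two branches. It is reconstructed from the spoken description, not copied from the tokenizer's source; in particular, the digit range inside the parentheses and the exact grouping on the left-hand side are our assumptions. The digit ranges on the dotted side (two to four, three to four, three to five) follow the description above, and the test numbers are made up.

```python
import re

# Rough approximation reconstructed from the spoken description; NOT the
# tokenizer's actual source. The parenthesized digit range and the exact
# grouping are assumptions.

# Left branch: either a parenthesized area code, or an optional "+"/"++" prefix
# and country code followed by an area code; then two more digit groups.
# Separators may be a dash, a space, or a non-breaking space (U+00A0).
LEFT = (
    r"(?:\([0-9]{2,3}\)[ \u00A0]?"
    r"|(?:\+\+?)?(?:[0-9]{2,4}[\- \u00A0])?[0-9]{2,4}[\- \u00A0])"
    r"[0-9]{3,4}[\- \u00A0]?[0-9]{3,5}"
)

# Right branch: optional prefix and country code, then area code (2-4 digits),
# exchange (3-4 digits), and final group (3-5 digits), separated only by dots.
RIGHT = r"(?:(?:\+\+?)?[0-9]{2,4}\.)?[0-9]{2,4}\.[0-9]{3,4}\.[0-9]{3,5}"

# Top-level alternation between the two branches.
PHONE = re.compile(f"(?:{LEFT})|(?:{RIGHT})")

for s in ["(650) 555-0123", "650-555-0123", "+44 20 7946 0958", "650.555.0123"]:
    print(s, bool(PHONE.fullmatch(s)))
```

Keeping the two branches as separate strings mirrors the top-level alternation and makes each side easier to read and test on its own.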
So overall, this allows it to recognize a bunch of formats for phone numbers. It'll recognize almost all American phone numbers, and it generally does pretty well with things like UK and Australian phone numbers. If you want an example of where it doesn't work: the normal phone number format in France is just pairs of digits with spaces in between them, and that's not included here. The difficulty isn't writing a regular expression that matches that format. It's managing to write one that doesn't also wrongly match various other things, such as numbers that just appear as a sequence of numbers for some other reason.
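A minimal sketch of that precision problem, using our own hypothetical pattern and a made-up sentence: the obvious pattern for the French layout (five two-digit pairs separated by spaces) is easy to write, but it fires just as readily on an unrelated list of two-digit numbers.

```python
import re

# Hypothetical pattern for the French layout: five two-digit pairs separated by
# single spaces. The lookarounds stop it from starting or ending inside a
# longer run of digits.
FRENCH = re.compile(r"(?<![0-9])[0-9]{2}(?: [0-9]{2}){4}(?![0-9])")

# A made-up sentence in which only the second match is actually a phone number.
TEXT = "Scores by round: 10 12 15 11 14. Call 01 23 45 67 89 for details."

print(FRENCH.findall(TEXT))
```

Both the list of scores and the phone number are returned, which is exactly the kind of false positive that makes adding such a format to a general-purpose tokenizer risky.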
Well, I hope that's given you some idea of the use of regular expressions in NLP systems. If you poke around any other NLP system, I'm sure you'll find lots of other examples. Commonly, when people want to match particular patterns, whether at the level of words or at the level of parts of speech, regular expressions can be a very convenient and practical way to solve many tasks.
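For patterns at the level of parts of speech, one simple approach is to join the tags into a string and run an ordinary regular expression over it. The following is a minimal sketch under our own assumptions: the tagged sentence, its Penn Treebank-style tags, the `NP` pattern, and the `chunks` helper are all hypothetical, and tags containing non-word characters would need extra care with the `\b` boundaries.

```python
import re

# Hypothetical tagged sentence (Penn Treebank-style tags, assigned by hand).
TAGGED = [("the", "DT"), ("big", "JJ"), ("red", "JJ"), ("dog", "NN"),
          ("chased", "VBD"), ("a", "DT"), ("cat", "NN")]

# Pattern over tags: optional determiner, any adjectives, then one or more
# nouns (NN or NNS). \b keeps matches aligned to whole tags.
NP = re.compile(r"\b(?:DT )?(?:JJ )*(?:NNS? )*NNS?\b")


def chunks(tagged, pattern):
    """Return the word sequences whose tag sequences match `pattern`."""
    tags = [tag for _, tag in tagged]
    text = " ".join(tags)
    # Character offset at which each tag starts in the joined string.
    starts = []
    offset = 0
    for tag in tags:
        starts.append(offset)
        offset += len(tag) + 1
    spans = []
    for m in pattern.finditer(text):
        first = starts.index(m.start())                   # token where the match begins
        last = sum(1 for s in starts if s < m.end())     # one past the last matched token
        spans.append([word for word, _ in tagged[first:last]])
    return spans


print(chunks(TAGGED, NP))
```

Mapping match offsets back to token positions is what lets a plain string regex act as a pattern over word sequences.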
