Lesson 3: Matching Sets of Characters

This document discusses character sets in regular expressions. Character sets defined with [ and ] match any single character within the set. Ranges using - can concisely define sets of sequential characters. Negated sets using ^ match any character not in the set. Together, character sets provide flexible matching of specific characters or ranges while avoiding unintended matches.

Uploaded by

Me Its

Lesson 3

Matching Sets of Characters

In this lesson you’ll learn how to work with sets of characters. Unlike the ., which matches any single character (as
you learned in the previous lesson), sets enable you to match specific characters and character ranges.

MATCHING ONE OF SEVERAL CHARACTERS

As you learned in the previous lesson, . matches any one character (a literal character, by contrast, matches only
itself). In the final example in that lesson, .a was used to match both na and sa; the . matched both the n and the s.
But what if there were also a file (containing Canadian sales data) named ca1.xls, and you still wanted to match
only na and sa? The . would also match c, and so that filename would also be matched.

Text

sales1.xls
orders3.xls
sales2.xls
sales3.xls
apac1.xls
europe2.xls
na1.xls
na2.xls
sa1.xls
ca1.xls

RegEx

.a.\.xls

Result

Matched: na1.xls, na2.xls, sa1.xls, ca1.xls

sales1.xls
orders3.xls
sales2.xls
sales3.xls
apac1.xls
europe2.xls
na1.xls
na2.xls
sa1.xls
ca1.xls

To find n or s, you would not want to match any character; you would want to match just those two characters. In
regular expressions, a set of characters is defined using the metacharacters [ and ]. Everything between them is part
of the set, and any one of the set members must match (but not all).

Here is a revised version of that example from the previous lesson:

Text

sales1.xls
orders3.xls
sales2.xls
sales3.xls
apac1.xls
europe2.xls
na1.xls
na2.xls
sa1.xls
ca1.xls

RegEx

[ns]a.\.xls

Result

Matched: na1.xls, na2.xls, sa1.xls

sales1.xls
orders3.xls
sales2.xls
sales3.xls
apac1.xls
europe2.xls
na1.xls
na2.xls
sa1.xls
ca1.xls

Analysis
The regular expression used here starts with [ns]; this matches either n or s (but not c or any other character). The
opening [ and closing ] do not match any characters—they define the set. The literal a matches a, . matches any
character, \. matches the ., and the literal xls matches xls. When you use this pattern, only the three desired
filenames are matched.
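
The difference between the two patterns can be checked with a short script. This is a minimal sketch that assumes Python and its standard re module (the lesson itself is not tied to any language); the variable names are illustrative only.

```python
import re

# The file list from the example above.
filenames = [
    "sales1.xls", "orders3.xls", "sales2.xls", "sales3.xls", "apac1.xls",
    "europe2.xls", "na1.xls", "na2.xls", "sa1.xls", "ca1.xls",
]

any_char = re.compile(r".a.\.xls")     # . accepts any first character
char_set = re.compile(r"[ns]a.\.xls")  # [ns] accepts only n or s

# re.search looks for the pattern anywhere in the string.
matched_any = [name for name in filenames if any_char.search(name)]
matched_set = [name for name in filenames if char_set.search(name)]

print(matched_any)  # ['na1.xls', 'na2.xls', 'sa1.xls', 'ca1.xls']
print(matched_set)  # ['na1.xls', 'na2.xls', 'sa1.xls']
```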
Note
Actually, [ns]a.\.xls is not quite right either. If a file named usa1.xls existed, it would match, too (the
opening u would be ignored and sa1.xls would match). The solution to this problem involves position matching,
which will be covered in Lesson 6, “Position Matching.”
Tip
As you can see, testing regular expressions can be tricky. Verifying that a pattern matches what you want is pretty
easy. The real challenge is in verifying that you are not also getting matches that you don’t want.
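
One way to act on this tip is to keep two lists, one of strings that should match and one of strings that should not, and test a pattern against both. The following is a sketch in Python (an assumed language choice); the lists and names are made up for illustration, and usa1.xls is the hypothetical file from the preceding note.

```python
import re

pattern = re.compile(r"[ns]a.\.xls")

wanted = ["na1.xls", "na2.xls", "sa1.xls"]       # should match
unwanted = ["ca1.xls", "sales1.xls", "usa1.xls"]  # should not match

# Wanted names the pattern fails to find.
missed = [name for name in wanted if not pattern.search(name)]
# Unwanted names the pattern finds anyway.
false_hits = [name for name in unwanted if pattern.search(name)]

print(missed)      # []
print(false_hits)  # ['usa1.xls']
```

The second list is what exposes the problem: the pattern finds sa1.xls inside usa1.xls.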

Character sets are frequently used to make searches (or specific parts thereof) not case sensitive. For example:

Text

The phrase "regular expression" is often
abbreviated as RegEx or regex.

RegEx

[Rr]eg[Ee]x

Result

Matched: RegEx, regex

The phrase "regular expression" is often
abbreviated as RegEx or regex.

Analysis
The pattern used here contains two character sets: [Rr] matches R and r, and [Ee] matches E and e. This
way, RegEx and regex are both matched. REGEX, however, would not match.
Tip
If you are using matching that is not case sensitive, this technique would be unnecessary. This type of matching is
used only when performing case-sensitive searches that are partially not case sensitive.
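
The contrast between the two approaches can be sketched as follows, assuming Python's re module, where re.IGNORECASE is the flag that turns off case sensitivity for a whole pattern. Other regex implementations expose this option differently.

```python
import re

text = ('The phrase "regular expression" is often\n'
        'abbreviated as RegEx or regex.')

# Case is relaxed only at the two positions covered by a set.
partial = re.findall(r"[Rr]eg[Ee]x", text)
print(partial)  # ['RegEx', 'regex']

# REGEX is not matched, because g and x must still be lowercase.
upper_with_sets = re.findall(r"[Rr]eg[Ee]x", "REGEX")
print(upper_with_sets)  # []

# With case-insensitive matching, the sets are unnecessary.
upper_with_flag = re.findall(r"regex", "REGEX", flags=re.IGNORECASE)
print(upper_with_flag)  # ['REGEX']
```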

USING CHARACTER SET RANGES

Let’s take a look at the file list example again. The last used pattern, [ns]a.\.xls, has another problem. What if a
file was named sam.xls? It, too, would be matched because the . matches all characters, not just digits.

Character sets can solve this problem as follows:

Text

sales1.xls
orders3.xls
sales2.xls
sales3.xls
apac1.xls
europe2.xls
sam.xls
na1.xls
na2.xls
sa1.xls
ca1.xls

RegEx

[ns]a[0123456789]\.xls

Result

Matched: na1.xls, na2.xls, sa1.xls

sales1.xls
orders3.xls
sales2.xls
sales3.xls
apac1.xls
europe2.xls
sam.xls
na1.xls
na2.xls
sa1.xls
ca1.xls

Analysis
In this example, the pattern has been modified so that the first character would have to be either n or s, the second
character would have to be a, and the third could be any digit (specified as [0123456789]). Notice that
file sam.xls was not matched, because m did not match the list of allowed characters (the 10 digits).

When working with regular expressions, you will find that you frequently specify ranges of characters
(0 through 9, A through Z, and so on). To simplify working with character ranges, regex provides a special
metacharacter: - (hyphen) is used to specify a range.

Following is the same example, this time using a range:

Text

sales1.xls
orders3.xls
sales2.xls
sales3.xls
apac1.xls
europe2.xls
sam.xls
na1.xls
na2.xls
sa1.xls
ca1.xls

RegEx

[ns]a[0-9]\.xls

Result

Matched: na1.xls, na2.xls, sa1.xls

sales1.xls
orders3.xls
sales2.xls
sales3.xls
apac1.xls
europe2.xls
sam.xls
na1.xls
na2.xls
sa1.xls
ca1.xls

Analysis
Pattern [0-9] is functionally equivalent to [0123456789], and so the results are identical to those in the
previous example.
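
The equivalence is easy to confirm by running both patterns over the same list. A minimal sketch, assuming Python's re module:

```python
import re

filenames = [
    "sales1.xls", "orders3.xls", "sales2.xls", "sales3.xls", "apac1.xls",
    "europe2.xls", "sam.xls", "na1.xls", "na2.xls", "sa1.xls", "ca1.xls",
]

enumerated = re.compile(r"[ns]a[0123456789]\.xls")  # digits listed one by one
ranged = re.compile(r"[ns]a[0-9]\.xls")             # the same digits as a range

hits_enumerated = [name for name in filenames if enumerated.search(name)]
hits_ranged = [name for name in filenames if ranged.search(name)]

print(hits_enumerated == hits_ranged)  # True
print(hits_ranged)                     # ['na1.xls', 'na2.xls', 'sa1.xls']
```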

Ranges are not limited to digits. The following are all valid ranges:

A-Z matches all uppercase characters from A to Z.

a-z matches all lowercase characters from a to z.

A-F matches only uppercase characters A to F.

A-z matches all characters from ASCII A to ASCII z (you should probably never use this pattern, because it also
includes characters such as [ and ^, which fall between Z and a in the ASCII table).
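
To see exactly which extra characters the A-z range lets in, you can test every ASCII character against it. This sketch assumes Python's re module; re.fullmatch requires the whole string (here a single character) to match.

```python
import re

# All 128 ASCII characters.
ascii_chars = [chr(code) for code in range(128)]

wide = [c for c in ascii_chars if re.fullmatch(r"[A-z]", c)]
letters = [c for c in ascii_chars if re.fullmatch(r"[A-Za-z]", c)]

# Characters accepted by [A-z] that are not letters.
extras = [c for c in wide if c not in letters]

print(len(letters))  # 52
print(extras)        # ['[', '\\', ']', '^', '_', '`']
```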

Any two ASCII characters may be specified as the range start and end. In practice, however, ranges are usually made
up of some or all digits and some or all alphabetic characters.

Tip
When you use ranges, be careful not to provide an end range that is less than the start range (like [3-1]). This will not
work, and it will often prevent the entire pattern from working.
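
How a reversed range fails depends on the regex implementation. As one illustration, Python's re module refuses to compile the pattern at all and raises re.error; the sketch below only assumes that behavior and does not rely on the wording of the error message.

```python
import re

try:
    re.compile(r"[ns]a[3-1]\.xls")  # end of range (1) is less than start (3)
    compiled = True
except re.error as exc:
    # The whole pattern is rejected, not just the faulty set.
    compiled = False
    print(f"pattern rejected: {exc}")

print(compiled)  # False
```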
Note
- (hyphen) is a special metacharacter because it is only a metacharacter when used between [ and ]. Outside of a set,
- is a literal and will match only -. As such, - does not need to be escaped.
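
The two roles of the hyphen can be compared side by side. This is a sketch assuming Python's re module. The last line goes slightly beyond this lesson: writing \- inside a set is one common way to ask for a literal hyphen there, and it is shown only for contrast.

```python
import re

sample = "a-z abc"

# Outside a set, - is a literal: the pattern matches the text a-z.
outside = re.findall(r"a-z", sample)
print(outside)  # ['a-z']

# Inside a set, between two characters, - defines a range (a, b, or c).
inside = re.findall(r"[a-c]", sample)
print(inside)   # ['a', 'a', 'b', 'c']

# Escaped inside a set, - is a literal again (a, hyphen, or c).
escaped = re.findall(r"[a\-c]", sample)
print(escaped)  # ['a', '-', 'a', 'c']
```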

Multiple ranges may be combined in a single set. For example, the following pattern matches any alphanumeric
character, uppercase or lowercase, and nothing else (no punctuation, whitespace, or other symbols):

[A-Za-z0-9]

This pattern is shorthand for

[ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789]
As you can see, ranges make regex syntax much cleaner.
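
The following sketch builds the long form from Python's string constants and checks that both sets pick out the same characters. Python and the sample string are assumptions made for illustration.

```python
import re
import string

ranged = "[A-Za-z0-9]"
# The same set written out character by character.
enumerated = "[" + string.ascii_uppercase + string.ascii_lowercase + string.digits + "]"

sample = "Order #42: 3 x Widget-B @ $9.99"

same = re.findall(ranged, sample) == re.findall(enumerated, sample)
kept = "".join(re.findall(ranged, sample))

print(same)  # True
print(kept)  # Order423xWidgetB999
```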

Following is one more example, this time finding RGB values (colors specified in a hexadecimal notation representing
the amount of red, green, and blue used to create the color). In Web pages, RGB values are specified
as #000000 (black), #ffffff (white), #ff0000 (red), and so on. RGB values may be specified in uppercase or
lowercase, and so #FF00ff (magenta) is legal, too. Here is an example taken from a CSS file:

Text

body {
background-color: #fefbd8;
}
h1 {
background-color: #0000ff;
}
div {
background-color: #d0f4e6;
}
span {
background-color: #f08970;
}

RegEx

#[0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f]

Result

Matched: #fefbd8, #0000ff, #d0f4e6, #f08970

body {
background-color: #fefbd8;
}
h1 {
background-color: #0000ff;
}
div {
background-color: #d0f4e6;
}
span {
background-color: #f08970;
}

Analysis
The pattern used here contains # as literal text and then the character set [0-9A-Fa-f] repeated six times. This
matches # followed by six characters, each of which must be a digit or A through F (in either uppercase or lowercase).
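
Typing the same set six times is error prone, so in code you might build the pattern by repeating a string instead. This is a sketch assuming Python's re module; the pattern it produces is identical to the one shown above.

```python
import re

css = """\
body {
background-color: #fefbd8;
}
h1 {
background-color: #0000ff;
}
div {
background-color: #d0f4e6;
}
span {
background-color: #f08970;
}
"""

hex_digit = "[0-9A-Fa-f]"
# A literal # followed by the set repeated six times.
pattern = re.compile("#" + hex_digit * 6)

colors = pattern.findall(css)
print(colors)  # ['#fefbd8', '#0000ff', '#d0f4e6', '#f08970']
```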

“ANYTHING BUT” MATCHING

Character sets are usually used to specify a list of characters, any one of which must match. But occasionally, you’ll want
the reverse: a list of characters that you don’t want to match. In other words, anything but the list specified here.
Rather than having to enumerate every character you want (which could get rather lengthy if you want all but a few),
character sets can be negated using the ^ metacharacter. Here’s an example:

Text

sales1.xls
orders3.xls
sales2.xls
sales3.xls
apac1.xls
europe2.xls
sam.xls
na1.xls
na2.xls
sa1.xls
ca1.xls

RegEx

[ns]a[^0-9]\.xls

Result

Matched: sam.xls

sales1.xls
orders3.xls
sales2.xls
sales3.xls
apac1.xls
europe2.xls
sam.xls
na1.xls
na2.xls
sa1.xls
ca1.xls

Analysis
The pattern used in this example is the exact opposite of the one used previously. [0-9] matches all digits (and only
digits). [^0-9] matches anything but the specified range of digits. As such, [ns]a[^0-9]\.xls matches sam.xls
but not na1.xls, na2.xls, or sa1.xls.
Note
^ negates all characters or ranges in a set, not just the character or range that it precedes.
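
Both points can be checked with a short script: the negated pattern from the example, and a set with two ranges to show that ^ applies to all of them. This is a sketch assuming Python's re module; the string c0ffee-xyz is made up for illustration.

```python
import re

filenames = [
    "sales1.xls", "orders3.xls", "sales2.xls", "sales3.xls", "apac1.xls",
    "europe2.xls", "sam.xls", "na1.xls", "na2.xls", "sa1.xls", "ca1.xls",
]

# Third character must be anything except a digit.
negated = re.compile(r"[ns]a[^0-9]\.xls")
hits = [name for name in filenames if negated.search(name)]
print(hits)  # ['sam.xls']

# ^ negates the whole set: neither digits nor a through f are matched.
leftover = re.findall(r"[^0-9a-f]", "c0ffee-xyz")
print(leftover)  # ['-', 'x', 'y', 'z']
```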

SUMMARY
Metacharacters [ and ] are used to define sets of characters, any one of which must match (OR in contrast to AND).
Character sets may be enumerated explicitly or specified as ranges using the - metacharacter. Character sets may be
negated using ^; this forces a match of anything but the specified characters.
