Re Expression


C# - Regular Expressions

A regular expression is a pattern that can be matched against an input text.
The .NET Framework provides a regular expression engine that allows such
matching. A pattern consists of one or more character literals, operators, or
constructs.

Constructs for Defining Regular Expressions


There are various categories of characters, operators, and constructs that let
you define regular expressions. The categories are:

Character escapes
Character classes
Anchors
Grouping constructs
Quantifiers
Backreference constructs
Alternation constructs
Substitutions
Miscellaneous constructs
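To make the categories above concrete, here is a minimal sketch that exercises several of them with the .NET Regex class. The class name ConstructDemo and the sample strings are made up for illustration; the comment on each line gives the output that line should print.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class: one small pattern per construct category.
public static class ConstructDemo
{
    public static void Main()
    {
        // Character escape: \. matches a literal dot instead of "any character".
        Console.WriteLine(Regex.IsMatch("file.txt", @"file\.txt"));   // True
        Console.WriteLine(Regex.IsMatch("fileAtxt", @"file\.txt"));   // False

        // Character class: [0-9] matches any single digit (the first one here).
        Console.WriteLine(Regex.Match("room 42", @"[0-9]").Value);    // 4

        // Anchors and quantifier: ^ and $ pin the match to the whole string,
        // {3} requires exactly three repetitions.
        Console.WriteLine(Regex.IsMatch("abc", @"^[a-z]{3}$"));       // True

        // Grouping construct plus backreference: (\w)\1 finds a doubled character.
        Console.WriteLine(Regex.Match("balloon", @"(\w)\1").Value);   // ll

        // Alternation: cat|dog matches either word.
        Console.WriteLine(Regex.Match("hot dog", @"cat|dog").Value);  // dog

        // Substitution: $2 and $1 refer to captured groups in the replacement text.
        Console.WriteLine(Regex.Replace("John Smith", @"(\w+) (\w+)", "$2, $1")); // Smith, John

        // Miscellaneous construct: (?i) turns on case-insensitive matching inline.
        Console.WriteLine(Regex.IsMatch("HELLO", @"(?i)hello"));      // True
    }
}
```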

The Regex Class


The Regex class is used for representing a regular expression.
The Regex class has the following commonly used methods:
S.N  Methods & Description
1    public bool IsMatch(string input)
     Indicates whether the regular expression specified in the Regex
     constructor finds a match in a specified input string.
2    public bool IsMatch(string input, int startat)
     Indicates whether the regular expression specified in the Regex
     constructor finds a match in the specified input string, beginning at
     the specified starting position in the string.
3    public static bool IsMatch(string input, string pattern)
     Indicates whether the specified regular expression finds a match
     in the specified input string.
4    public MatchCollection Matches(string input)
     Searches the specified input string for all occurrences of a regular
     expression.
5    public string Replace(string input, string replacement)
     In a specified input string, replaces all strings that match a regular
     expression pattern with a specified replacement string.
6    public string[] Split(string input)
     Splits an input string into an array of substrings at the positions
     defined by a regular expression pattern specified in the Regex
     constructor.

For the complete list of methods and properties, see the Microsoft
documentation for the System.Text.RegularExpressions.Regex class.
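A short sketch that calls each of the six methods above on one input, assuming a hypothetical class RegexMethodsDemo and a made-up input string. It also shows the difference between the instance methods (pattern given once to the constructor) and the static IsMatch (pattern passed on each call).

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexMethodsDemo
{
    public static void Main()
    {
        // One instance, reused for every instance call below; the pattern matches runs of digits.
        var digits = new Regex(@"\d+");
        string input = "order 12, item 345";

        Console.WriteLine(digits.IsMatch(input));            // True
        Console.WriteLine(digits.IsMatch(input, 10));        // True: searching from index 10 still finds 345
        Console.WriteLine(Regex.IsMatch(input, @"^\d"));     // False: the input starts with a letter

        foreach (Match m in digits.Matches(input))
        {
            Console.WriteLine(m.Value);                      // 12, then 345
        }

        Console.WriteLine(digits.Replace(input, "#"));       // order #, item #

        // Split at every run of digits; the text between the runs is returned.
        string[] parts = digits.Split(input);
        Console.WriteLine(string.Join("|", parts));          // order |, item |
    }
}
```

Split keeps a trailing empty string when the input ends with a match, which is why the last line ends with a separator.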

Example 1
The following example matches words that start with 'S':
using System;
using System.Text.RegularExpressions;

namespace RegExApplication
{
    class Program
    {
        private static void showMatch(string text, string expr)
        {
            Console.WriteLine("The Expression: " + expr);
            MatchCollection mc = Regex.Matches(text, expr);
            foreach (Match m in mc)
            {
                Console.WriteLine(m);
            }
        }

        static void Main(string[] args)
        {
            string str = "A Thousand Splendid Suns";
            Console.WriteLine("Matching words that start with 'S': ");
            showMatch(str, @"\bS\S*");
            Console.ReadKey();
        }
    }
}

When the above code is compiled and executed, it produces the following result:

Matching words that start with 'S':
The Expression: \bS\S*
Splendid
Suns

Example 2
The following example matches words that start with 'm' and end with 'e':
using System;
using System.Text.RegularExpressions;

namespace RegExApplication
{
    class Program
    {
        private static void showMatch(string text, string expr)
        {
            Console.WriteLine("The Expression: " + expr);
            MatchCollection mc = Regex.Matches(text, expr);
            foreach (Match m in mc)
            {
                Console.WriteLine(m);
            }
        }

        static void Main(string[] args)
        {
            string str = "make maze and manage to measure it";
            Console.WriteLine("Matching words start with 'm' and ends with 'e':");
            showMatch(str, @"\bm\S*e\b");
            Console.ReadKey();
        }
    }
}

When the above code is compiled and executed, it produces the following result:
Matching words start with 'm' and ends with 'e':
The Expression: \bm\S*e\b
make
maze
manage
measure

Example 3
This example replaces extra white space:
using System;
using System.Text.RegularExpressions;

namespace RegExApplication
{
    class Program
    {
        static void Main(string[] args)
        {
            // Input with runs of extra spaces between and after the words.
            string input = "Hello   World   ";
            string pattern = "\\s+";
            string replacement = " ";
            Regex rgx = new Regex(pattern);
            string result = rgx.Replace(input, replacement);

            Console.WriteLine("Original String: {0}", input);
            Console.WriteLine("Replacement String: {0}", result);
            Console.ReadKey();
        }
    }
}

When the above code is compiled and executed, it produces the following result:
Original String: Hello   World   
Replacement String: Hello World

C# Regex: Checking for a-z and A-Z

I want to check whether an input string consists of characters between a-z or A-Z.

Somehow my regular expression doesn't seem to pick it up. It always
returns true. I am not sure why; I gather it has to do with how I am
writing my regular expression. Any help would be appreciated.

private static bool isValid(String str)
{
    bool valid = false;
    Regex reg = new Regex((@"a-zA-Z+"));
    if (reg.Match(str).Success)
        valid = false;
    else
        valid = true;

    return valid;
}
asked May 16 '11 at 12:58 (Sophie Ker), edited May 16 '11 at 13:04 (jlafay)
You're setting it to false after it matches. - jlafay May 16 '11 at 13:03
A TIP: Rather than writing a-zA-Z, you can use the inline option (?i) to make your regex pattern case
insensitive and then just write a-z wherever required. - NeverHopeless Nov 12 '12 at 13:02
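The tip can be sketched like this, assuming the question's goal of accepting only ASCII letters. The inline (?i) switch and RegexOptions.IgnoreCase are two ways to request the same case-insensitive matching; the class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CaseInsensitiveDemo
{
    // (?i) switches on case-insensitive matching for the rest of the pattern.
    public static bool IsLettersInline(string s) =>
        Regex.IsMatch(s, @"(?i)^[a-z]+$");

    // Same effect, requested from code instead of inside the pattern.
    public static bool IsLettersOption(string s) =>
        Regex.IsMatch(s, @"^[a-z]+$", RegexOptions.IgnoreCase);

    public static void Main()
    {
        foreach (string s in new[] { "Hello", "hello", "Hello1", "" })
        {
            Console.WriteLine($"\"{s}\": inline={IsLettersInline(s)} option={IsLettersOption(s)}");
        }
    }
}
```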
3 Answers

Accepted answer:
The right way would be like so:

private static bool isValid(String str)
{
    return Regex.IsMatch(str, @"^[a-zA-Z]+$");
}
This code has the following benefits:
- Using the static method instead of creating a new instance every
  time: the static method caches the regular expression.
- Fixed the regex. It now matches any string that consists of one or
  more of the characters a-z or A-Z. No other characters are allowed.
- Much shorter and more readable.
answered May 16 '11 at 13:03 (Daniel Hilgarth), edited May 16 '11 at 13:08
Because of the anchors ^ and $, ^[a-zA-Z]+$ will match a string if it is entirely composed
of letters (probably what the OT intends, but you should update the explanation).
- Ekkehard.Horner May 16 '11 at 13:10
@Ekkehard: IMHO, my explanation states exactly that... - Daniel Hilgarth May 16 '11 at 13:11
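The exchange above comes down to the brackets and the anchors. This sketch, with hypothetical method names, contrasts the question's pattern (no brackets, so a-zA-Z+ is literal text), a bracketed but unanchored pattern, and the accepted answer's anchored pattern.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AnchorDemo
{
    // Question's pattern: without brackets it means the literal text "a-zA-" followed by one or more 'Z'.
    public static bool LiteralPattern(string s) => Regex.IsMatch(s, @"a-zA-Z+");

    // Character class without anchors: succeeds if ANY letter appears somewhere in s.
    public static bool ContainsLetter(string s) => Regex.IsMatch(s, @"[a-zA-Z]+");

    // Accepted answer: ^ and $ force the whole string to consist of letters.
    public static bool IsValid(string s) => Regex.IsMatch(s, @"^[a-zA-Z]+$");

    public static void Main()
    {
        foreach (string s in new[] { "abc", "abc123", "123" })
        {
            Console.WriteLine($"{s}: literal={LiteralPattern(s)} contains={ContainsLetter(s)} valid={IsValid(s)}");
        }
    }
}
```

The literal column explains why the original isValid always returned true: that pattern almost never matches, and the method inverted the result. One .NET detail worth knowing: $ also matches just before a final newline, so \z is the stricter choice if a trailing "\n" must be rejected.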

Use

Regex.IsMatch(str, @"^[a-zA-Z]+$");

answered May 16 '11 at 12:59 (mathieu), edited May 16 '11 at 13:45
Regex reg = new Regex("^[a-zA-Z]+$");

^ start of the string
[] character set
+ one or more times
$ end of the string

^ and $ are needed because you want to validate the whole string, not just part of it.

answered May 16 '11 at 13:05

Creating Regular Expressions

Regular expressions are an efficient way to process text. The following regular
expression looks complicated to a beginner:

^\w+$

A Perl developer would smile. All this regular expression does is match a string that
consists entirely of word characters, such as a single word. The symbols look very difficult
to understand, and they are. The ^ symbol refers to the start of the string. The $ refers to
the end of the string. The \w refers to a single word character (in ASCII text: A-Z, a-z, 0-9
and underscore). The + means one or more repetitions. The regular expression would
match:

test
testtest
test1
1test
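A quick sketch to check the description above against the sample strings, plus a few extra cases: an underscore (a word character), a string with a space (not a word character), and the empty string (+ needs at least one character). The helper name IsSingleWord is hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WordPatternDemo
{
    // True only when every character of s is a word character and s is not empty.
    public static bool IsSingleWord(string s) => Regex.IsMatch(s, @"^\w+$");

    public static void Main()
    {
        string[] samples = { "test", "testtest", "test1", "1test", "test_1", "test test", "" };
        foreach (string s in samples)
        {
            Console.WriteLine($"\"{s}\" -> {IsSingleWord(s)}");
        }
    }
}
```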

Using Regular Expressions in C# .NET


The System.Text.RegularExpressions namespace contains the Regex class used to
form and evaluate regular expressions. The Regex class contains static methods used to
compare regular expressions against strings. The static IsMatch() method compares a
string with a regular expression:

bool match = Regex.IsMatch(string input, string pattern);

In C# code, the example above would be written as:
if (Regex.IsMatch("testtest", @"^\w+$"))
{
    // Do something here
}

Another useful static method is Match(), which returns a Match object describing the first
match in the input string, including the groups captured by parentheses. This is useful
when one match contains several parts of interest. The following code captures more than
one group:

string text = "first second";
string reg = @"^([\w]+) ([\w]+)$";
Match m = Regex.Match(text, reg, RegexOptions.CultureInvariant);
foreach (Group g in m.Groups)
{
    Console.WriteLine(g.Value);
}

Expression groups are written in parentheses. The example above returns three
groups: the entire text as the first group (index 0), the first word, and the second
word. Expression groups are useful when text needs to be broken down into
several pieces of related text for storage or further manipulation.
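Because Match returns only the first match, collecting every match takes Regex.Matches. This sketch, with hypothetical helper names, contrasts the two on one input and confirms that the group example above yields three groups.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class MatchVsMatchesDemo
{
    // Returns only the first run of word characters.
    public static string FirstWord(string text) => Regex.Match(text, @"\w+").Value;

    // Returns every run of word characters, in order.
    public static string[] AllWords(string text) =>
        Regex.Matches(text, @"\w+").Cast<Match>().Select(m => m.Value).ToArray();

    public static void Main()
    {
        string text = "first second third";
        Console.WriteLine(FirstWord(text));                   // first
        Console.WriteLine(string.Join(",", AllWords(text)));  // first,second,third

        // Groups[0] is the whole match; Groups[1] and Groups[2] are the parenthesized parts.
        Match m = Regex.Match("first second", @"^([\w]+) ([\w]+)$");
        Console.WriteLine(m.Groups.Count);                    // 3
        Console.WriteLine(m.Groups[2].Value);                 // second
    }
}
```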

A Quick Example
In this example, we validate an email address using a regular expression. My regular
expression is:

^((([\w]+\.[\w]+)+)|([\w]+))@(([\w]+\.)+)([A-Za-z]{1,3})$

However, this isn't the only expression used to validate email addresses. There are at
least two other ways that I have come across, and there are many more.
We write a small C# console application that takes some text as input and
determines whether the text is an email address.

using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        // Read one line of input; treat a closed input stream as an empty string.
        string text = Console.ReadLine() ?? "";
        string reg = @"^((([\w]+\.[\w]+)+)|([\w]+))@(([\w]+\.)+)([A-Za-z]{1,3})$";
        if (Regex.IsMatch(text, reg))
        {
            Console.WriteLine("Email.");
        }
        else
        {
            Console.WriteLine("Not email.");
        }
    }
}

Try this with a few real and fake email addresses and see if it works. Let me know if
you find an error.
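Following the invitation above, this sketch runs the article's pattern, unchanged, over a few made-up addresses. The helper name LooksLikeEmail is hypothetical. It surfaces two limitations of the pattern: \w does not include '-', so hyphenated names are rejected, and {1,3} rejects top-level domains longer than three letters, such as .info.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EmailCheck
{
    // The article's pattern, unchanged.
    const string Pattern = @"^((([\w]+\.[\w]+)+)|([\w]+))@(([\w]+\.)+)([A-Za-z]{1,3})$";

    public static bool LooksLikeEmail(string text) => Regex.IsMatch(text, Pattern);

    public static void Main()
    {
        string[] samples =
        {
            "john@example.com",       // accepted
            "john.smith@example.com", // accepted
            "user@example.co.uk",     // accepted
            "john@example",           // rejected: no dot before the final label
            "john-smith@example.com", // rejected: '-' is not matched by \w
            "user@example.info",      // rejected: {1,3} allows at most 3 letters
        };
        foreach (string s in samples)
        {
            Console.WriteLine($"{s,-24} {(LooksLikeEmail(s) ? "Email." : "Not email.")}");
        }
    }
}
```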

Documentation
Every developer writes regular expressions differently. The same task can be accomplished
using many different expressions, and an expression created by one developer may be
undecipherable to another.

This is why documenting regular expressions is a very important part of the
development process. The comments for an expression often span several lines, and are
worth the effort in case your expression has unintended effects, or if another developer
takes over your code. Enforcing good documentation standards for regular expressions
helps keep maintenance issues to a minimum.
For example, if we document the regular expression for validating email addresses
above, we would write comments like these:
// Validating email addresses
//
// @"^((([\w]+\.[\w]+)+)|([\w]+))@(([\w]+\.)+)([A-Za-z]{1,3})$"
//
// The expression has three expression groups.
//
// 1. ((([\w]+\.[\w]+)+)|([\w]+))
//    The LHS of the or clause states that there may be more than one
//    sequence of two words with a . between them.
//    The RHS of the or clause states that there may be a single word.
//
// 2. (([\w]+\.)+)
//    This expression states that there may be as many words
//    separated by a . between them as necessary.
//
// 3. ([A-Za-z]{1,3})
//    This expression states that the last set of characters may be
//    upper or lowercase letters. There must be a minimum of 1 and a
//    maximum of 3.

By many development standards this may be considered a long set of comments, but
the expression has been broken down into expression groups, so a new developer has
very little difficulty in understanding the function and motivation behind the
expression. This practice should be consistently enforced to avoid headaches when
upgrading or debugging software.
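The comments can also live inside the pattern itself. With RegexOptions.IgnorePatternWhitespace, .NET ignores unescaped whitespace in the pattern and treats # as the start of a comment that runs to the end of the line. This sketch rewrites the article's email pattern that way; the class name is hypothetical, and once whitespace and comments are stripped the pattern is character-for-character the same as the one-line version.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DocumentedEmailPattern
{
    // The article's email pattern, spread over several lines with inline comments.
    static readonly Regex Email = new Regex(@"
        ^
        (                               # Part 1: the text before @
            (([\w]+\.[\w]+)+)           #   LHS: one or more word.word sequences
          | ([\w]+)                     #   RHS: or a single word
        )
        @
        (([\w]+\.)+)                    # Part 2: words, each followed by a dot
        ([A-Za-z]{1,3})                 # Part 3: 1 to 3 letters at the end
        $",
        RegexOptions.IgnorePatternWhitespace);

    public static bool LooksLikeEmail(string text) => Email.IsMatch(text);

    public static void Main()
    {
        Console.WriteLine(LooksLikeEmail("john.smith@example.com"));
        Console.WriteLine(LooksLikeEmail("user@example.info"));
    }
}
```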

Useful Regex Software


If you've used a shell script in *NIX, you have probably used grep. Windows has the
PowerGREP tool, which is similar to grep. PowerShell is another tool; it is built on
the .NET regular expression engine and has command-line scripting utilities. Expresso
by Ultrapico (www.ultrapico.com) is a free regular expression editor which you can use
to build and test your regular expressions.

Conclusion
Regular expressions are an efficient way to search, identify, and validate large quantities
of text without writing explicit comparison code. Although they may be complicated,
writing and documenting regular expressions allows the developer to concentrate on
more important parts of the implementation process. With several free and open
source regular expression tools available, learning to understand and build regular
expressions is a worthwhile task.
To download this technical article in PDF format, go to the Coactum Solutions website
at http://www.coactumsolutions.com/Articles.aspx.

C# Regex.Match
Regex.Match searches strings based on a pattern. It isolates part of a string based on
the pattern specified. The pattern is written in the regular expression language, a small
text-processing language of its own. It proves to be useful and effective in many C# programs.
Input and output required for examples

Input string                 Required match
/content/some-page.aspx      some-page
/content/alternate-1.aspx    alternate-1
/images/something.png        - (none)

Example
We first see how you can match the filename in a directory path with Regex. A pattern
lets you constrain the acceptable characters more precisely than many string methods do.
You can see the character ranges in the second argument to Regex.Match.
Program that uses Regex.Match: C#

using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        // First we see the input string.
        string input = "/content/alternate-1.aspx";

        // Here we call Regex.Match. The \- inside the brackets allows a literal hyphen.
        Match match = Regex.Match(input, @"content/([A-Za-z0-9\-]+)\.aspx$",
            RegexOptions.IgnoreCase);

        // Here we check the Match instance.
        if (match.Success)
        {
            // Finally, we get the Group value and display it.
            string key = match.Groups[1].Value;
            Console.WriteLine(key);
        }
    }
}

Output

alternate-1

In this example, we use the @ verbatim string syntax, so the backslashes in the pattern
reach the regular expression engine unchanged instead of being treated as C# escape
sequences. The pattern starts with "content/". We require that our group, which is in
parentheses, comes after the "content/" string.
Also: The symbols between "[" and "]" are ranges of characters, or single characters.
These are the allowed characters in our group.
What it captures: The content in the parentheses is collected as a Group. We require
that the match succeeds, and then we access the value with Groups[1].
Tip: In the Groups collection on Match objects, the first parenthesized group is at
index 1, because index 0 always holds the entire match.
And: C# collections are usually indexed from 0, and Groups is too, but its element 0
is the whole match rather than a captured group. We must remember this.
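To tie the indexing tip to the table of required matches, this sketch runs the corrected pattern (with \- so that keys such as alternate-1 match) over the three inputs from the table and prints Groups[0] and Groups[1]. The class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupIndexDemo
{
    static readonly Regex KeyPattern =
        new Regex(@"content/([A-Za-z0-9\-]+)\.aspx$", RegexOptions.IgnoreCase);

    // Returns Groups[1] (the key), or null when the path does not match.
    public static string Key(string path)
    {
        Match match = KeyPattern.Match(path);
        return match.Success ? match.Groups[1].Value : null;
    }

    public static void Main()
    {
        foreach (string path in new[] { "/content/some-page.aspx", "/content/alternate-1.aspx", "/images/something.png" })
        {
            Match match = KeyPattern.Match(path);
            if (match.Success)
            {
                // Groups[0] is the whole matched text; Groups[1] is the parenthesized key.
                Console.WriteLine($"Groups[0] = {match.Groups[0].Value}, Groups[1] = {match.Groups[1].Value}");
            }
            else
            {
                Console.WriteLine($"{path}: no match");
            }
        }
    }
}
```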

ToLower

In my tests, using ToLower instead of RegexOptions.IgnoreCase on the Regex yielded an
improvement of 10% or more. Since I needed a lowercase result anyway, calling the C#
string ToLower method first was simpler.
Program that also uses Regex.Match: C#

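using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        // This is the input string.
        string input = "/content/alternate-1.aspx";

        // Here we lowercase our input first, so RegexOptions.IgnoreCase is not needed.
        input = input.ToLower();
        Match match = Regex.Match(input, @"content/([A-Za-z0-9\-]+)\.aspx$");

        // Display the captured key when the match succeeds.
        if (match.Success)
        {
            Console.WriteLine(match.Groups[1].Value);
        }
    }
}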

Static Regex

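Here we see that using a Regex instance object can be faster than using the static
Regex.Match. The static methods do cache recently used patterns (Regex.CacheSize,
15 by default), but a shared instance avoids the cache lookup entirely. For performance,
it is usually better to use an instance object; it can be shared throughout the entire
project.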
Program that uses static Regex: C#

using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        // The input string again.
        string input = "/content/alternate-1.aspx";

        // This calls the static method specified.
        Console.WriteLine(RegexUtil.MatchKey(input));
    }
}

static class RegexUtil
{
    // One shared instance, created once and reused by every call.
    static readonly Regex _regex = new Regex(@"/content/([a-z0-9\-]+)\.aspx$");

    /// <summary>
    /// This returns the key that is matched within the input.
    /// </summary>
    public static string MatchKey(string input)
    {
        Match match = _regex.Match(input.ToLower());
        if (match.Success)
        {
            return match.Groups[1].Value;
        }
        else
        {
            return null;
        }
    }
}

Output

alternate-1

This static class stores a Regex instance that can be used project-wide. We initialize it
inline. The class exposes a MatchKey method, a useful method I developed to return the
string that we want from the input value.
Pattern description: It uses a letter range. In this code I show the Regex with the "A-Z"
range removed, because the string is already lowercased. I found that removing as many
options from the Regex as possible boosted performance.
Tip: With this code, I found that using RegexOptions.RightToLeft made the pattern
slightly faster as well.
Note: The expression engine has to evaluate fewer characters in this case. But this
option could slow down or speed up your Regex, depending on the pattern and input.

Numbers

One common requirement is extracting a number from a string. We can do this with
Regex.Match. Match handles only one number; if a string has more than one, use
Regex.Matches instead.
Next: We extract a run of digit characters and access the Value string representation
of that number.
Also: To parse the number, use int.Parse or int.TryParse on the Value here. This will
convert it to an int.
Program that uses Match on numbers: C#

using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        // ... Input string.
        string input = "Dot Net 100 Perls";

        // ... One or more digits.
        Match m = Regex.Match(input, @"\d+");

        // ... Write value.
        Console.WriteLine(m.Value);
    }
}

Output

100
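Building on the program above, this sketch shows the two follow-ups from the text: parsing the matched digits with int.TryParse, and using Regex.Matches when a string contains several numbers. The helper name AllNumbers and the sample strings are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class NumberExtraction
{
    // Collects every run of digits that fits in an int.
    public static List<int> AllNumbers(string input)
    {
        var numbers = new List<int>();
        foreach (Match m in Regex.Matches(input, @"\d+"))
        {
            // TryParse skips a digit run that is too large for int instead of throwing.
            if (int.TryParse(m.Value, out int value))
            {
                numbers.Add(value);
            }
        }
        return numbers;
    }

    public static void Main()
    {
        Match first = Regex.Match("Dot Net 100 Perls", @"\d+");
        if (first.Success && int.TryParse(first.Value, out int n))
        {
            Console.WriteLine(n + 1); // 101
        }

        Console.WriteLine(string.Join(",", AllNumbers("3 apples, 12 pears, 7 plums"))); // 3,12,7
    }
}
```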

Performance

You can add the RegexOptions.Compiled flag for a substantial performance gain at
runtime. This will, however, make your program start up more slowly. With
RegexOptions.Compiled we often see 30% better performance.
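The percentages above are the author's own measurements. To check them on your own machine, a rough sketch is a Stopwatch loop over the three variants discussed in this section: the static Regex.Match, a shared instance, and a shared instance built with RegexOptions.Compiled. The class name is hypothetical, the loop count is arbitrary, and a single run with no separate warm-up is only indicative; timings vary by runtime and machine, so no expected output is given.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

public static class RegexTiming
{
    const string Pattern = @"/content/([a-z0-9\-]+)\.aspx$";
    const string Input = "/content/alternate-1.aspx";

    // Shared instances, created once and reused.
    static readonly Regex SharedRegex = new Regex(Pattern);
    static readonly Regex CompiledRegex = new Regex(Pattern, RegexOptions.Compiled);

    // Runs one variant n times and prints the elapsed time.
    // The key check keeps the work observable and catches a broken pattern.
    static void Time(string label, Func<string, Match> match, int n)
    {
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < n; i++)
        {
            if (match(Input).Groups[1].Value != "alternate-1")
            {
                throw new InvalidOperationException(label + " returned the wrong key");
            }
        }
        sw.Stop();
        Console.WriteLine($"{label,-10} {sw.ElapsedMilliseconds} ms");
    }

    public static void Run(int n)
    {
        Time("static", s => Regex.Match(s, Pattern), n);
        Time("instance", s => SharedRegex.Match(s), n);
        Time("compiled", s => CompiledRegex.Match(s), n);
    }

    public static void Main() => Run(1_000_000);
}
```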

Summary

We used Regex.Match. This method extracts a single match from the input string. We can
access the matched data with the Value property. Similar methods, such as IsMatch and
Matches, are often helpful.

How to: Search Strings Using Regular Expressions (C# Programming Guide)
Visual Studio 2008

The System.Text.RegularExpressions.Regex class can be used to search strings. These searches can range
in complexity from very simple to making full use of regular expressions. The following are two examples
of string searching by using the Regex class. For more information, see .NET Framework Regular
Expressions.

Example
The following code is a console application that performs a simple case-insensitive search of the strings in
an array. The static method Regex.IsMatch performs the search given the string to search and a string that
contains the search pattern. In this case, a third argument is used to indicate that case should be ignored.
For more information, see System.Text.RegularExpressions.RegexOptions.
C#
class TestRegularExpressions
{
    static void Main()
    {
        string[] sentences =
        {
            "C# code",
            "Chapter 2: Writing Code",
            "Unicode",
            "no match here"
        };

        string sPattern = "code";

        foreach (string s in sentences)
        {
            System.Console.Write("{0,24}", s);

            if (System.Text.RegularExpressions.Regex.IsMatch(s, sPattern,
                System.Text.RegularExpressions.RegexOptions.IgnoreCase))
            {
                System.Console.WriteLine(" (match for '{0}' found)", sPattern);
            }
            else
            {
                System.Console.WriteLine();
            }
        }

        // Keep the console window open in debug mode.
        System.Console.WriteLine("Press any key to exit.");
        System.Console.ReadKey();
    }
}
/* Output:
C# code (match for 'code' found)
Chapter 2: Writing Code (match for 'code' found)
Unicode (match for 'code' found)
no match here
*/
The following code is a console application that uses regular expressions to validate the format of each
string in an array. The validation requires that each string take the form of a telephone number in which
three groups of digits are separated by dashes, the first two groups contain three digits, and the third
group contains four digits. This is done by using the regular expression ^\d{3}-\d{3}-\d{4}$, written as
"^\\d{3}-\\d{3}-\\d{4}$" in a regular C# string literal. For more information, see Regular Expression
Language - Quick Reference.
C#
class TestRegularExpressionValidation
{
    static void Main()
    {
        string[] numbers =
        {
            "123-555-0190",
            "444-234-22450",
            "690-555-0178",
            "146-893-232",
            "146-555-0122",
            "4007-555-0111",
            "407-555-0111",
            "407-2-5555",
        };

        string sPattern = "^\\d{3}-\\d{3}-\\d{4}$";

        foreach (string s in numbers)
        {
            System.Console.Write("{0,14}", s);

            if (System.Text.RegularExpressions.Regex.IsMatch(s, sPattern))
            {
                System.Console.WriteLine(" - valid");
            }
            else
            {
                System.Console.WriteLine(" - invalid");
            }
        }

        // Keep the console window open in debug mode.
        System.Console.WriteLine("Press any key to exit.");
        System.Console.ReadKey();
    }
}
/* Output:
  123-555-0190 - valid
 444-234-22450 - invalid
  690-555-0178 - valid
   146-893-232 - invalid
  146-555-0122 - valid
 4007-555-0111 - invalid
  407-555-0111 - valid
    407-2-5555 - invalid
*/
