Chapter 2 StringBuilder Regex
Regular Expressions
Overview
• The techniques in this section can be employed
to develop:
– text editors, word processors, page-layout software,
computerized typesetting systems and other kinds of
text processing software.
(count) to be copied.
• Line 23 takes two int arguments. The first argument specifies the
starting index from which the method copies characters from the
original string. The second argument specifies the length of the
substring to be copied. The substring returned contains a copy of
the specified characters from the original string.
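The two-argument call described above can be sketched as follows. This is a minimal illustration, assuming the method in question is String.Substring(startIndex, length); the variable names and sample text are hypothetical and are not taken from the original listing.

```csharp
using System;

// Top-level statements (C# 9 or later). The sample text is an assumption.
string letters = "abcdefghijklm";

// Start at index 3 ('d') and copy 4 characters.
string middle = letters.Substring(3, 4);

Console.WriteLine(middle);   // defg
Console.WriteLine(letters);  // abcdefghijklm (the original string is unchanged)
```

Substring returns a new string, so the original can still be used afterwards.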
Concatenating strings
• .NET provides several ways to concatenate strings.
• The + operator:
– E.g. string name = "muna"; name += "abay";
• The static method Concat of class String concatenates
two strings and returns a new string containing the
combined characters from both original strings.
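Both approaches can be compared side by side. The sketch below reuses the sample values from the slide; the variable name combined is an assumption added for illustration.

```csharp
using System;

// Top-level statements (C# 9 or later).
string name = "muna";
name += "abay";  // + operator: name now refers to a new string

// Static method Concat builds a new string from its two arguments.
string combined = String.Concat("muna", "abay");

Console.WriteLine(name);              // munaabay
Console.WriteLine(combined);          // munaabay
Console.WriteLine(name == combined);  // True
```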
Miscellaneous String Methods
• Class String provides several methods that return modified copies
of strings.
• The following demonstrates the use of String methods:
– Replace(), ToLower(), ToUpper(), Trim() and ToString().
• Line 27 uses String method Replace() to return a new string,
replacing every occurrence in string1 of character 'e' with character
'E'.
• Method Replace takes two arguments: a string for which to search
and another string with which to replace all matching occurrences
of the first argument. The original string remains unchanged. If
there are no occurrences of the first argument in the string, the
method returns the original string.
• String method ToUpper generates a new string (line 31) that replaces any
lowercase letters in string1 with their uppercase equivalent.
• The method returns a new string containing the converted string; the original
string remains unchanged. If there are no characters to convert to
uppercase, the method returns the original string.
• Line 32 uses String method ToLower to return a new string in which any
uppercase letters in string1 are replaced by their lowercase equivalents. The
original string is unchanged. As with ToUpper, if there are no characters to
convert to lowercase, method ToLower returns the original string.
• Line 36 uses String method Trim to remove all whitespace
characters that appear at the beginning and end of a string. Without
otherwise altering the original string, the method returns a new
string that contains the string, but omits leading and trailing
whitespace characters. Another version of method Trim takes a
character array and returns a string with all leading and trailing
occurrences of the characters in the array argument removed;
characters in the middle of the string are kept.
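The methods described above can be tried together in a short program. This is a sketch only: the original listing is not reproduced here, so the variable names other than string1 and all sample values are assumptions.

```csharp
using System;

// Top-level statements (C# 9 or later). Sample values are assumptions.
string string1 = "cheers!";
string string2 = "GOOD BYE";
string padded = "   spaces   ";

string replaced = string1.Replace('e', 'E');  // every 'e' becomes 'E'
string upper = string1.ToUpper();
string lower = string2.ToLower();
string trimmed = padded.Trim();               // strips leading/trailing whitespace

// The char[] overload trims only at the ends; the inner '-' is kept.
string dashesTrimmed = "--a-b--".Trim(new char[] { '-' });

Console.WriteLine(replaced);       // chEErs!
Console.WriteLine(upper);          // CHEERS!
Console.WriteLine(lower);          // good bye
Console.WriteLine(trimmed);        // spaces
Console.WriteLine(dashesTrimmed);  // a-b
Console.WriteLine(string1);        // cheers! (original unchanged)
```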
ARRAYLIST, STRINGBUILDER,
REGULAR EXPRESSION(REGEX)
ArrayList (System.Collections namespace)
• An ArrayList is a dynamically-sized array
of elements stored in contiguous memory.
• Following is an example of using an
ArrayList:
• ArrayList al = new ArrayList();
– al.Add("Hello World");
– al.Add(100);
– al.Add(3.14159265);
• Count is a property of the ArrayList
class that gets the number of elements
actually contained in the ArrayList.
• Be aware of the following facts about using an ArrayList:
• As with normal arrays, items in an ArrayList can be
accessed by using the index number.
• When you try to access an index position that does not yet
hold data, an ArgumentOutOfRangeException is thrown.
• Use the Add method to add items to the ArrayList.
• An ArrayList is an ordered collection. That is, elements placed
in an ArrayList are stored in the order in which they are added
to the ArrayList.
• Whereas a normal array can only hold elements of the same
type, an ArrayList can hold data of multiple types. The example
shown here adds a string, an int, and a double to the same
ArrayList.
• The ArrayList resizes automatically to hold all items added to it.
The default initial capacity is small: 16 in early .NET Framework
versions, while current implementations start empty and allocate
room for 4 elements on the first Add. If an ArrayList is full and one
more element is added to it, the ArrayList must be resized. To
do this, space is allocated for double its capacity, and all of its
elements are copied into the new space. Be aware that
this reallocation can take some time. It is wise to set the
capacity to a reasonable size when creating the ArrayList to
reduce resizing.
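The points above can be checked with a short program. This is a minimal sketch: the initial capacity of 32 is an arbitrary choice made for illustration, and the out-of-range access relies on .NET reporting it with ArgumentOutOfRangeException.

```csharp
using System;
using System.Collections;

// Top-level statements (C# 9 or later).
// Passing a capacity up front avoids early resizing; 32 is an arbitrary choice.
ArrayList al = new ArrayList(32);
al.Add("Hello World");  // string
al.Add(100);            // int
al.Add(3.14159265);     // double

Console.WriteLine(al.Count);     // 3  (elements actually stored)
Console.WriteLine(al.Capacity);  // 32 (slots reserved)
Console.WriteLine(al[0]);        // Hello World

bool threw = false;
try
{
    _ = al[5];  // index 5 does not hold data yet
}
catch (ArgumentOutOfRangeException)
{
    threw = true;
}
Console.WriteLine(threw);  // True
```

Count and Capacity differ: Count is the number of stored elements, while Capacity is how many the ArrayList can hold before it has to resize.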
The following table lists common methods for working with ArrayLists.
Method | Description
ArrayList.Add(object obj) | Appends an element to the end of the ArrayList.
ArrayList.AddRange(ICollection c) | Appends a collection of elements to the end of the ArrayList.
ArrayList.Insert(int index, object obj) | Inserts an element in the position indicated. Insert must move (copy) all subsequent elements.
ArrayList.InsertRange(int index, ICollection c) | Inserts a collection of elements in the position indicated. All subsequent elements must be moved.
ArrayList.Remove(object obj) | Removes the first occurrence of obj.
ArrayList.RemoveAt(int index) | Removes the element at the index indicated.
ArrayList.RemoveRange(int index, int count) | Removes count elements beginning at position index.
ArrayList.Clear( ) | Removes all the elements from the ArrayList.
ArrayList.GetRange(int index, int count) | Returns a new ArrayList containing count elements starting at index.
ArrayList.Contains(object item) | Indicates whether the ArrayList contains item. Performs a linear search.
ArrayList.IndexOf(object item) | Returns the index of the first occurrence of item in the ArrayList. Returns -1 if the item is not found.
ArrayList.IndexOf(object item, int start) | Returns the index of the first occurrence of item in the ArrayList at or after position start. Use this to search for all occurrences of an element sequentially.
ArrayList.Reverse( ) | Reverses the elements in the ArrayList.
ArrayList.Sort( ) | Sorts the ArrayList. Requires that all members implement the IComparable interface.
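Several of the methods in the table can be traced step by step. The sketch below uses hypothetical integer values; the comments show the list contents after each call.

```csharp
using System;
using System.Collections;

// Top-level statements (C# 9 or later). Values are illustrative only.
ArrayList list = new ArrayList();
list.AddRange(new int[] { 30, 10, 20 });  // 30, 10, 20
list.Insert(1, 40);                       // 30, 40, 10, 20
list.Remove(30);                          // 40, 10, 20
list.RemoveAt(0);                         // 10, 20
list.Add(10);                             // 10, 20, 10

int firstIndex = list.IndexOf(10);                 // 0
int nextIndex = list.IndexOf(10, firstIndex + 1);  // 2 (search resumes after the first hit)
bool has20 = list.Contains(20);                    // true
int missing = list.IndexOf(99);                    // -1 (not found)

list.Sort();      // 10, 10, 20 (int implements IComparable)
list.Reverse();   // 20, 10, 10

ArrayList firstTwo = list.GetRange(0, 2);  // 20, 10
Console.WriteLine(firstTwo[0]);            // 20

list.RemoveRange(0, 2);                    // 10
int remaining = list.Count;                // 1
list.Clear();                              // empty
Console.WriteLine(list.Count);             // 0
```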
Class StringBuilder (System.Text namespace)
Define Regular Expressions
• Regular expressions provide a powerful,
flexible, and efficient method for processing
text.
• The extensive pattern-matching notation of
regular expressions enables you to quickly
parse large amounts of text to:
– Find specific character patterns.
– Validate text to ensure that it matches a
predefined pattern (such as an email address).
– Extract, edit, replace, or delete text substrings.
– Add extracted strings to a collection in order to
generate a report.
Real life applications of Regular
Expressions
• Extracting emails from a document: A lot of times, the
sales and marketing teams might require
finding/extracting emails and other contact information
from large text documents.
• Regular Expressions for Web Scraping (Data Collection):
One can simply scrape websites like Wikipedia etc. to
collect/generate data. But web scraping has its own
issues – the downloaded data is usually messy and full
of noise. This is where Regex can be used effectively.
• Working with Date-Time features: regex enables you to
work with different date formats
• Using Regex for Text Pre-processing (NLP): cleaning
inconsistent data, whether it was collected manually
or scraped from the web.
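As an illustration of the first application, the sketch below pulls email addresses out of a block of text. The pattern is a deliberately simplified assumption that covers common addresses only (it is not a complete email grammar), and the sample text is hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

// Top-level statements (C# 9 or later).
// Simplified pattern (an assumption): local part, '@', then a dotted domain.
string emailPattern = @"[\w.+-]+@[\w-]+(\.[\w-]+)+";

// Hypothetical document text.
string document = "Contact sales@example.com or support@example.org for details.";

MatchCollection emails = Regex.Matches(document, emailPattern);
foreach (Match email in emails)
{
    Console.WriteLine(email.Value);
}
// sales@example.com
// support@example.org
```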
Other applications of Regex
• Regular expressions are specially formatted
strings:
– Used to find patterns in text, and
– Used to validate information (checking that data is in a
particular format).
– E.g. the first three characters of a student ID must be
letters.
– E.g. a last name must start with a capital letter.
• Another application of regular expressions is to facilitate
the construction of a compiler.
– Large and complex regular expressions are used to validate the
syntax of a program.
• In .NET, the classes that recognize and manipulate regular
expressions are found in the
System.Text.RegularExpressions namespace.
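The two validation rules mentioned above could be written as follows. This is a sketch under stated assumptions: the sample ID and names are hypothetical, and the last-name pattern additionally assumes the rest of the name consists of letters only.

```csharp
using System;
using System.Text.RegularExpressions;

// Top-level statements (C# 9 or later).
// ^ anchors the pattern to the start of the input.
string idPattern = @"^[A-Za-z]{3}";             // first three characters are letters
string lastNamePattern = @"^[A-Z][a-zA-Z]*$";   // capital letter, then letters only (assumption)

bool goodId = Regex.IsMatch("ETS0451", idPattern);       // hypothetical ID
bool badId = Regex.IsMatch("12S0451", idPattern);
bool goodName = Regex.IsMatch("Abay", lastNamePattern);
bool badName = Regex.IsMatch("abay", lastNamePattern);

Console.WriteLine(goodId);    // True
Console.WriteLine(badId);     // False
Console.WriteLine(goodName);  // True
Console.WriteLine(badName);   // False
```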
How does Regular Expression work?
• The centerpiece of text processing with regular
expressions is the regular expression engine,
which is represented by the
System.Text.RegularExpressions.Regex object
in .NET.
• Processing text using regular expressions
requires that the regular expression engine be
provided with the following two items of
information:
– The regular expression pattern to identify in the text.
– The text to parse for the regular expression pattern.
Examples: Replace substrings
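using System;
using System.Text.RegularExpressions;

// Matches an optional title (Mr, Mrs, Miss, Ms) and the space after it.
string pattern = @"(Mr\.? |Mrs\.? |Miss |Ms\.? )";
string[] names = { "Mr. Henry Hunt", "Ms. Sara Samuels",
                   "Abraham Adams", "Ms. Nicole Norris" };
// Replace each matched title with an empty string.
foreach (string name in names)
{
    Console.WriteLine(Regex.Replace(name, pattern, String.Empty));
}
Output
Henry Hunt
Sara Samuels
Abraham Adams
Nicole Norris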
Methods in Regex
• Class Regex represents an immutable
regular expression.
– Contains static methods such as:
• Match() returns an object of class Match
(represents a single regular expression match).
• Matches() finds all matches of a regular expression
in an arbitrary string and returns a MatchCollection
object (a set of Match objects).
• IsMatch() indicates whether the regular expression
finds a match in the specified input string (the instance
version uses the pattern passed to the Regex constructor).
• Other methods include Replace() and Split().
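The three methods listed above can be compared on one input. This is a minimal sketch; the sample sentence and the digit pattern are assumptions chosen for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Top-level statements (C# 9 or later). Sample input is an assumption.
string input = "Room 12 has 3 windows and 24 chairs";
string pattern = @"\d+";  // one or more digits

// Match: only the first match.
Match first = Regex.Match(input, pattern);
Console.WriteLine(first.Value);  // 12

// Matches: every match, as a MatchCollection.
MatchCollection all = Regex.Matches(input, pattern);
Console.WriteLine(all.Count);    // 3

// IsMatch (instance version): uses the pattern given to the constructor.
Regex digits = new Regex(pattern);
bool found = digits.IsMatch(input);
bool notFound = digits.IsMatch("no numbers here");
Console.WriteLine(found);     // True
Console.WriteLine(notFound);  // False
```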
• Regular expressions can use character classes such as
the following.
• A character class is an escape sequence that represents
a group of characters.
– A word character (\w) is any alphanumeric character or
underscore.
– A whitespace character (\s) is a space, a tab, a carriage
return, a newline or a form feed.
– A digit (\d) is any numeric character.
• Regular expressions are not limited to these character
classes, however.
• The expressions employ various operators and other
forms of notation to search for complex patterns.
• We discuss several of these techniques in the context of
• Elements of Regular Expressions
• [abc] matches a, b or c
• [a-z] matches a to z
• [A-Z] matches A to Z
• [a-zA-Z] matches a to z and A to Z
• [0-9] matches digits 0 to 9
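The character classes and bracketed sets above can be tested individually. The sketch below uses hypothetical inputs; the last line shows why the range should be written [a-zA-Z] and not [a-zA-z], since the range A-z also covers a few punctuation characters that sit between 'Z' and 'a'.

```csharp
using System;
using System.Text.RegularExpressions;

// Top-level statements (C# 9 or later). Inputs are illustrative only.
int digitCount = Regex.Matches("Call 555 now", @"\d").Count;  // \d: digits
bool allWord = Regex.IsMatch("user_1", @"^\w+$");             // \w: letters, digits, underscore
string joined = Regex.Replace("a b\tc", @"\s", "_");          // \s: space, tab, ...

bool inSet = Regex.IsMatch("d", "^[abc]$");        // d is not in the set
bool lower = Regex.IsMatch("q", "^[a-z]$");
bool letter = Regex.IsMatch("Q", "^[a-zA-Z]$");
bool underscoreSlipsIn = Regex.IsMatch("_", "^[A-z]$");  // '_' lies between 'Z' and 'a'

Console.WriteLine(digitCount);         // 3
Console.WriteLine(allWord);            // True
Console.WriteLine(joined);             // a_b_c
Console.WriteLine(inSet);              // False
Console.WriteLine(lower);              // True
Console.WriteLine(letter);             // True
Console.WriteLine(underscoreSlipsIn);  // True
```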
Elements of Regex text
• Quantifiers
• []? occurs zero or one time
• []+ occurs one or more times
• []* occurs zero or more times
• []{n} occurs exactly n times
• []{n,m} occurs at least n and at most m times
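Each quantifier can be checked against a matching and a non-matching input. This is a small sketch with assumed sample strings; the ^ and $ anchors force the pattern to cover the whole input so that the counts are enforced.

```csharp
using System;
using System.Text.RegularExpressions;

// Top-level statements (C# 9 or later). Inputs are illustrative only.
bool optionalAbsent = Regex.IsMatch("color", "^colou?r$");    // ? : zero or one 'u'
bool optionalPresent = Regex.IsMatch("colour", "^colou?r$");

bool plusEmpty = Regex.IsMatch("", "^[a-z]+$");   // + needs at least one character
bool starEmpty = Regex.IsMatch("", "^[a-z]*$");   // * accepts zero characters

bool exactly3 = Regex.IsMatch("123", "^[0-9]{3}$");
bool notExactly3 = Regex.IsMatch("1234", "^[0-9]{3}$");

bool tooShort = Regex.IsMatch("1", "^[0-9]{2,4}$");
bool inRange = Regex.IsMatch("12", "^[0-9]{2,4}$");
bool tooLong = Regex.IsMatch("12345", "^[0-9]{2,4}$");

Console.WriteLine(optionalAbsent);   // True
Console.WriteLine(optionalPresent);  // True
Console.WriteLine(plusEmpty);        // False
Console.WriteLine(starEmpty);        // True
Console.WriteLine(exactly3);         // True
Console.WriteLine(notExactly3);      // False
Console.WriteLine(tooShort);         // False
Console.WriteLine(inRange);          // True
Console.WriteLine(tooLong);          // False
```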
• The regular expression in line 19
searches for a string that starts with the letter "J",
followed by any number of characters, followed by a
two-digit number (of which the second digit cannot be 4),
followed by a dash, another two-digit number, a dash
and another two-digit number.
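One pattern that fits this description is sketched below. The original line 19 is not reproduced in these notes, so the pattern is a reconstruction from the description, and the sample text and variable names are assumptions.

```csharp
using System;
using System.Text.RegularExpressions;

// Top-level statements (C# 9 or later).
// Reconstructed from the description, not copied from the original line 19:
// J, any characters, a digit, a digit other than 4, then -dd-dd.
Regex expression = new Regex(@"J.*\d[0-35-9]-\d\d-\d\d");

// Hypothetical sample text, one entry per line.
string testString =
    "Jane's Birthday is 05-12-75\n" +
    "Dave's Birthday is 11-04-68\n" +
    "John's Birthday is 04-28-73\n" +
    "Joe's Birthday is 12-17-77";

MatchCollection matches = expression.Matches(testString);
foreach (Match m in matches)
{
    Console.WriteLine(m.Value);
}
// Jane's Birthday is 05-12-75
// Joe's Birthday is 12-17-77
```

Dave's line is skipped because it does not start with "J", and John's line is skipped because the second digit of 04 is 4.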
Method Replace() and Split() of
Regex
• The Regex class provides static and instance versions of
methods Replace and Split.
– Replace() replaces parts of a string with another string,
and
– Split() splits a string according to a regular
expression.
Replace() method
• Method Replace replaces text in a string with new text wherever the
original string matches a regular expression. It has two versions:
a static method and an instance method.
• Static version of Replace()
• Takes three parameters: the string to modify, the string containing
the regular expression to match and the replacement string.
• Replace replaces every instance of "*" in testString1 with "^".
– Notice that the regular expression (@"\*") precedes character * with a backslash, \.
• Normally, * is a quantifier indicating that a regular expression should
match any number of occurrences of a preceding pattern. The backslash
makes the engine treat * as a literal character.
• The instance version of Replace() uses the regular
expression passed to the constructor for testRegex1 to
perform the replacement operation. In this case, every
match for the regular expression "stars" in testString1 is
replaced with "carets".
• Use of instance method Replace() to
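The static and instance versions of Replace can be sketched as follows. The names testString1 and testRegex1 come from the text above; the content of testString1 and the result variable names are assumptions, since the original listing is not reproduced here.

```csharp
using System;
using System.Text.RegularExpressions;

// Top-level statements (C# 9 or later).
// The content of testString1 is an assumption.
string testString1 = "This sentence ends in 5 stars *****";

// Static version: input, pattern, replacement. \* is a literal asterisk.
string withCarets = Regex.Replace(testString1, @"\*", "^");

// Instance version: the pattern is fixed when the Regex is constructed.
Regex testRegex1 = new Regex("stars");
string withWord = testRegex1.Replace(testString1, "carets");

Console.WriteLine(withCarets);  // This sentence ends in 5 stars ^^^^^
Console.WriteLine(withWord);    // This sentence ends in 5 carets *****
```

Both calls return new strings; testString1 itself is not modified.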
Split() method of Regex
• Method Split divides a string into several substrings. The original
string is broken in any location that matches a specified regular
expression.
• Method Split returns an array containing the substrings between
matches for the regular expression.
• We use the static version of method Split to separate a string
whose items are separated by commas.
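A comma-separated string can be split as sketched below. The sample input is an assumption, and the pattern also consumes any whitespace after each comma so that the pieces come back without leading spaces.

```csharp
using System;
using System.Text.RegularExpressions;

// Top-level statements (C# 9 or later). Sample input is an assumption.
string csv = "abc, DEF, 123";

// Split at each comma plus any whitespace that follows it.
string[] parts = Regex.Split(csv, @",\s*");

foreach (string part in parts)
{
    Console.WriteLine(part);
}
// abc
// DEF
// 123
```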
Exercise
• Write a regular expression for each of the following:
• Match a single character between a and z.
• Match characters between a and z with a
length of exactly 2.
• Match characters between a and z with a
length from 1 up to 3.
• Validate data in a fixed format of 8 digits.
• Validate an invoice number in the
following format: the first three characters are
alphabetic, followed by 8 digits.