Chapter 2 StringBuilder Regex

This document discusses strings and string manipulation in C# including the String, StringBuilder, and Regex classes. It covers string constructors, properties like Length, and methods like IndexOf, Substring, Concat, and more for searching, extracting, comparing and concatenating strings.

Uploaded by

Mikiyas Abate
Char, String, StringBuilder, and Regular Expressions
Overview
• The techniques in this section can be employed
to develop:
– text editors, word processors, page-layout software,
computerized typesetting systems and other kinds of
text processing software.

• Focus on the capabilities of:
– class String and type char - in the System namespace,
– class StringBuilder - in the System.Text namespace, and
– classes Regex and Match - in the System.Text.RegularExpressions namespace.
• Characters are the fundamental building blocks
of C# source code. They include:
– Normal characters and character constants (a character
represented by its integer code, e.g. 122 corresponds to 'z' and 10
corresponds to '\n').
– Character codes are established according to the Unicode character set.
• A string is a series of characters treated as a
single unit.
– It may include uppercase and lowercase letters, digits, and special characters (+,
-, *, /, $ and others).
– A string is an object of class String in the System
namespace.
– String literals (string constants) are sequences of
characters in double quotation marks, e.g. "Hello
world".
• A string may also contain multiple backslash
characters (e.g. in the name of a file).
• The @ character can be used to turn off escape
sequences and interpret all the characters in a
string literally (a verbatim string literal).
• E.g.
– string file = "C:\\MyFolder\\MySubFolder\\MyFile.txt";
• It can be rewritten as
– string file = @"C:\MyFolder\MySubFolder\MyFile.txt";
• C# provides the string keyword as an alias for
class String.
String constructors
• Class String provides eight constructors
for initializing strings in various ways.
• Line 25 - assigns to string3 a new string,
using the String constructor that takes a char
array and two int arguments.
– 2nd argument - specifies the starting index position.
– 3rd argument - specifies the number of characters
(count) to be copied.
• Line 26 - assigns to string4 a new string,
using the String constructor that takes as
arguments a character and an int specifying
the number of times to repeat that character
in the string.
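The two constructor forms described above can be sketched as follows. The variable names string3 and string4 follow the slide text; the sample character array, the class name and the helper methods are assumptions made for illustration, since the original listing is not reproduced in these notes.

```csharp
using System;

public static class StringConstructorDemo
{
    // Builds a string from part of a char array: starting index, then count.
    public static string FromRange(char[] source, int startIndex, int count)
    {
        return new string(source, startIndex, count);
    }

    // Builds a string by repeating one character a given number of times.
    public static string Repeat(char c, int count)
    {
        return new string(c, count);
    }

    public static void Main()
    {
        // Hypothetical sample data, chosen only for illustration.
        char[] characterArray = { 'b', 'i', 'r', 't', 'h', ' ', 'd', 'a', 'y' };

        string string2 = new string(characterArray);       // the whole array
        string string3 = FromRange(characterArray, 6, 3);  // 3 characters from index 6
        string string4 = Repeat('C', 5);                    // 'C' repeated five times

        Console.WriteLine(string2); // birth day
        Console.WriteLine(string3); // day
        Console.WriteLine(string4); // CCCCC
    }
}
```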
String Indexer, Length Property and CopyTo Method
• String indexer - facilitates the retrieval of
any character in the string, and
• String property Length - returns the length
of the string.
• String method CopyTo() - copies a
specified number of characters from a
string into a char array.
• The program determines the length of a string, reverses the
order of characters in the string, and copies a series of
characters from the string into a character array.
• Line 27 uses the Length property to determine the number of characters
in string string1. Strings always know their own size.
• Lines 33–34 append to output the characters of the string string1 in
reverse order. The string indexer returns the character at a specific
position in the string. The string indexer treats a string as an array
of chars. The indexer receives an integer argument as the position
number and returns the character at that position. As with arrays,
the first element of a string is considered to be at position 0.
• Line 37 uses CopyTo method to copy the characters of a
string (string1) into a character array (characterArray).
– 1st argument is the index from which the method begins copying
characters in the string, and
– 2nd argument is the character array into which the characters are
copied.
– 3rd argument is the index specifying the location at which the
method places the copied characters in the character array.
– The last argument is the number of characters that the method
will copy from the string.
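A minimal sketch of the three features just described (Length, the indexer and CopyTo). The sample value of string1, the class name and the helper methods are assumptions; the original program is not included in these notes.

```csharp
using System;
using System.Text;

public static class StringBasicsDemo
{
    // Walks the string backward with the indexer to build the reversed text.
    public static string Reverse(string text)
    {
        var output = new StringBuilder();
        for (int i = text.Length - 1; i >= 0; --i)
        {
            output.Append(text[i]); // indexer: character at position i
        }
        return output.ToString();
    }

    // Copies 'count' characters starting at 'sourceIndex' into a new char array.
    public static char[] CopyPart(string text, int sourceIndex, int count)
    {
        char[] destination = new char[count];
        // Argument order: source index, destination array, destination index, count.
        text.CopyTo(sourceIndex, destination, 0, count);
        return destination;
    }

    public static void Main()
    {
        string string1 = "hello there"; // sample value, assumed for illustration

        Console.WriteLine(string1.Length);                       // 11
        Console.WriteLine(Reverse(string1));                     // ereht olleh
        Console.WriteLine(new string(CopyPart(string1, 0, 5)));  // hello
    }
}
```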
Comparing strings
• Computers can order characters
alphabetically
– because the characters are represented
internally as Unicode numeric codes.
• String comparison - simply compares the
numeric codes of the characters in the
strings.
• .NET provides several ways to compare
strings.
– These include the methods Equals() and CompareTo(),
and the equality operator (==).
• Method Equals() - (inherited by String from class Object) - tests any
two objects for equality (i.e., checks whether the objects contain identical
contents).
– The method returns either true or false.
• Method Equals compares the strings character by character, using the
numeric Unicode values that
represent the characters in each string.
• Line 27 uses method Equals() to compare string1 and the literal
string "hello".
• Comparisons are case sensitive. Look at the following
test for string equality between string3 and string4
(line 39).
• Here, static method Equals (as opposed to the instance
method used in the previous slide) is used to compare the
values of two strings.
• Line 33 uses equality operator (==) to compare string string1 with
the literal string "hello" for equality. This also uses a lexicographical
comparison to compare two strings.
– Thus, the condition in the if structure evaluates to true, because the values of
string1 and "hello" are equal.
• To compare the references of two strings, we must explicitly cast
the strings to type object and use the equality operator (==).
• Lines 46–54 use the String method CompareTo to compare strings.
– Method CompareTo returns 0 if the strings are equal, a negative value (typically -1) if the string that invokes
CompareTo is less than the string that is passed as an argument, and
– a positive value (typically 1) if the string that invokes CompareTo is greater than the string that is passed
as an argument.
• Method CompareTo uses a lexicographical comparison that follows the sort rules of the current culture.
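The comparison techniques above can be tried side by side. This is only a sketch: the class name, the SameReference helper and the sample values are assumptions. string2 is built at run time so that it is a separate object with the same contents as string1.

```csharp
using System;

public static class StringComparisonDemo
{
    // True only when both variables refer to the same string object.
    public static bool SameReference(string a, string b)
    {
        return (object)a == (object)b;
    }

    public static void Main()
    {
        string string1 = "hello";
        string string2 = new string(new[] { 'h', 'e', 'l', 'l', 'o' });

        Console.WriteLine(string1.Equals(string2));          // True  (instance Equals)
        Console.WriteLine(string.Equals(string1, string2));  // True  (static Equals)
        Console.WriteLine(string1 == string2);               // True  (compares values)
        Console.WriteLine(SameReference(string1, string2));  // False (different objects)
        Console.WriteLine(string1.Equals("Hello"));          // False (case sensitive)

        // Rely only on the sign of CompareTo's result, not on the exact value.
        Console.WriteLine(Math.Sign("apple".CompareTo("banana"))); // -1
        Console.WriteLine("apple".CompareTo("apple"));             // 0
    }
}
```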
Method StartsWith() and EndsWith()
• C# also provides ways to test whether a
string instance begins or ends with a
given string.
• Method StartsWith determines whether a
string instance starts with the string text
passed to it as an argument.
• Method EndsWith determines whether a
string instance ends with the string text
passed to it as an argument.
• See the demonstration on the next slide.
• Line 21 uses method StartsWith, which takes a string argument.
The condition in the if structure determines whether the string at
index i of the array starts with the characters "st". If so, the method
returns true and appends strings[i] to string output for display
purposes.
• Line 30 uses method EndsWith, which also takes a string argument.
The condition in the if structure determines whether the string at index i
of the array ends with the characters "ed". If so, the method returns true,
and strings[i] is appended to string output for display purposes.
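A short sketch of the loop described above. The array contents, the class name and the helper methods are assumptions for illustration; only the calls to StartsWith("st") and EndsWith("ed") come from the slide text.

```csharp
using System;
using System.Collections.Generic;

public static class PrefixSuffixDemo
{
    // Collects the items that begin with the given prefix.
    public static List<string> WithPrefix(string[] items, string prefix)
    {
        var result = new List<string>();
        foreach (string item in items)
        {
            if (item.StartsWith(prefix))
            {
                result.Add(item);
            }
        }
        return result;
    }

    // Collects the items that end with the given suffix.
    public static List<string> WithSuffix(string[] items, string suffix)
    {
        var result = new List<string>();
        foreach (string item in items)
        {
            if (item.EndsWith(suffix))
            {
                result.Add(item);
            }
        }
        return result;
    }

    public static void Main()
    {
        string[] strings = { "started", "starting", "ended", "ending" };

        Console.WriteLine(string.Join(", ", WithPrefix(strings, "st"))); // started, starting
        Console.WriteLine(string.Join(", ", WithSuffix(strings, "ed"))); // started, ended
    }
}
```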

• Reading assignment on String Method GetHashCode (pp 11)


Locating Characters and Substrings in Strings
• In many applications, it is necessary to
search for a character or set of characters
in a string.
• The application in the next slide
demonstrates some of the many versions
of String methods:
– IndexOf, IndexOfAny, LastIndexOf and
LastIndexOfAny, which search for a specified
character or substring in a string.
• Lines 20, 23 and 26 use method IndexOf to locate the first occurrence of a
character or substring in a string.
– If IndexOf finds a character, IndexOf returns the index of the specified character in
the string;
– otherwise, IndexOf returns -1.
• The expression on line 23 uses a version of method IndexOf that takes two
arguments—the character to search for and the starting index at which the
search of the string should begin.
– The method does not examine any characters that occur prior to the starting
index (in this case 1).
• The expression in line 26 uses another version of method IndexOf that takes
three arguments—the character to search for, the index at which to start
searching and the number of characters to search.
• Lines 30, 33 and 36 use method LastIndexOf to locate the last
occurrence of a character in a string.
• Method LastIndexOf performs the search from the end of the string
toward the beginning of the string.
– If method LastIndexOf finds the character, LastIndexOf returns the index
of the specified character in the string; otherwise, LastIndexOf returns
-1.
• There are three versions of LastIndexOf .
• Line 30 uses LastIndexOf that takes as an argument the character
for which to search.
• Line 33 uses LastIndexOf that takes two arguments—the character
for which to search and the highest index from which to begin
searching backward for the character.
• Line 36 uses a third version of method LastIndexOf that takes three
arguments— the character for which to search, the starting index
from which to start searching backward and the number of
characters (the portion of the string) to search.
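The overloads discussed above can be compared in a small sketch. The search string, the class name and the characters searched for are assumptions; the overload shapes (one, two and three arguments) are the ones named in the text.

```csharp
using System;

public static class SearchDemo
{
    // Two copies of a..m, so every letter appears at index i and at i + 13.
    public const string Letters = "abcdefghijklmabcdefghijklm";

    public static void Main()
    {
        // IndexOf searches forward from the start (or from a starting index).
        Console.WriteLine(Letters.IndexOf('c'));        // 2
        Console.WriteLine(Letters.IndexOf('a', 1));     // 13 (index 0 is skipped)
        Console.WriteLine(Letters.IndexOf('$', 3, 5));  // -1 (not found in indices 3 to 7)
        Console.WriteLine(Letters.IndexOf("def"));      // 3  (substring search)

        // LastIndexOf searches backward from the end (or from a starting index).
        Console.WriteLine(Letters.LastIndexOf('c'));        // 15
        Console.WriteLine(Letters.LastIndexOf('a', 5));     // 0
        Console.WriteLine(Letters.LastIndexOf('$', 15, 5)); // -1

        // The ...Any versions look for any character in the given array.
        Console.WriteLine(Letters.IndexOfAny(new[] { 'x', 'c' }));     // 2
        Console.WriteLine(Letters.LastIndexOfAny(new[] { 'x', 'c' })); // 15
    }
}
```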
Extracting Substrings from Strings
• Line 19 uses the Substring method that takes one int argument.
The argument specifies the starting index from which the method
copies characters in the original string.
– The substring returned contains a copy of the characters from the starting index
to the end of the string.
• Line 23 takes two int arguments. The first argument specifies the
starting index from which the method copies characters from the
original string. The second argument specifies the length of the
substring to be copied. The substring returned contains a copy of
the specified characters from the original string.
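Both Substring versions in a brief sketch; the sample string and the class name are assumed for illustration.

```csharp
using System;

public static class SubstringDemo
{
    public static void Main()
    {
        string letters = "abcdefghijklmabcdefghijklm"; // assumed sample value

        // One argument: from the starting index to the end of the string.
        Console.WriteLine(letters.Substring(20));   // hijklm

        // Two arguments: starting index, then length of the substring.
        Console.WriteLine(letters.Substring(0, 6)); // abcdef

        // The original string is unchanged.
        Console.WriteLine(letters.Length);          // 26
    }
}
```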
Concatenating strings
• .NET provides many ways to concatenate strings.
• The + operator:
– E.g. string name = "muna"; name += "abay";
• The static method Concat of class String concatenates
two strings and returns a new string containing the
combined characters from both original strings.
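A small sketch of both approaches, reusing the sample values from the slide. The class and variable names are assumptions.

```csharp
using System;

public static class ConcatDemo
{
    public static void Main()
    {
        string first = "muna";
        string second = "abay";

        // Concat returns a new string; neither argument is modified.
        string joined = string.Concat(first, second);

        // += also creates a new string and reassigns the variable.
        string name = first;
        name += second;

        Console.WriteLine(joined); // munaabay
        Console.WriteLine(name);   // munaabay
        Console.WriteLine(first);  // muna (unchanged)
    }
}
```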
Miscellaneous String Methods
• Class String provides several methods that return modified copies
of strings.
• The following demonstrates the use of String methods:
– Replace(), ToLower(), ToUpper(), Trim() and ToString().
• Line 27 uses String method Replace() to return a new string,
replacing every occurrence in string1 of character 'e' with character
'E'.
• Method Replace takes two arguments: a character or string for which to search
and another character or string with which to replace all matching occurrences
of the first argument. The original string remains unchanged. If
there are no occurrences of the first argument in the string, the
method returns the original string.
• String method ToUpper generates a new string (line 31) that replaces any
lowercase letters in string1 with their uppercase equivalent.
• The method returns a new string containing the converted string; the original
string remains unchanged. If there are no characters to convert to
uppercase, the method returns the original string.
• Line 32 uses String method ToLower to return a new string in which any
uppercase letters in string1 are replaced by their lowercase equivalents. The
original string is unchanged. As with ToUpper, if there are no characters to
convert to lowercase, method ToLower returns the original string.
• Line 36 uses String method Trim to remove all whitespace
characters that appear at the beginning and end of a string. Without
otherwise altering the original string, the method returns a new
string that contains the string, but omits leading and trailing
whitespace characters. Another version of method Trim takes a
character array and returns a string with the characters in the array
argument removed from its beginning and end.
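The methods above can be sketched as follows. The sample strings and the class name are assumptions; only the Replace('e', 'E') call comes from the slide text.

```csharp
using System;

public static class MiscStringDemo
{
    public static void Main()
    {
        string string1 = "cheers!";     // assumed sample value
        string string3 = "   help   ";  // assumed sample value with padding

        Console.WriteLine(string1.Replace('e', 'E'));    // chEErs!
        Console.WriteLine(string1.ToUpper());            // CHEERS!
        Console.WriteLine("GOOD BYE".ToLower());         // good bye
        Console.WriteLine("[" + string3.Trim() + "]");   // [help]
        Console.WriteLine("**help**".Trim('*'));         // help

        // Every call returned a new string; the original is untouched.
        Console.WriteLine(string1);                      // cheers!
    }
}
```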
ARRAYLIST, STRINGBUILDER, REGULAR EXPRESSION (REGEX)
ArrayList (System.Collections namespace)
• An ArrayList is a dynamically-sized array
of elements stored in contiguous memory.
• Following is an example of using an
ArrayList:
• ArrayList al = new ArrayList();
– al.Add("Hello World");
– al.Add(100);
– al.Add(3.14159265);
• Count – is a property of the ArrayList
class that gets the number of elements
actually contained in the arraylist.
• Be aware of the following facts about using an ArrayList:
• As with normal arrays, items in an ArrayList can be
accessed by using the index number.
• When you try to access an index position that does not yet
hold data, an ArgumentOutOfRangeException is thrown.
• Use the Add method to add items to the ArrayList.
• An ArrayList is an ordered collection. That is, elements placed
in an ArrayList are stored in the order in which they are added
to the ArrayList.
• Whereas a normal array can only hold elements of the same
type, an ArrayList can hold data of multiple types. The example
shown here adds a string, an int, and a double to the same
ArrayList.
• The ArrayList resizes automatically to hold all items added to it.
The default initial capacity depends on the framework version (16 in
early versions of the .NET Framework, smaller in later ones). If an ArrayList is full and one
more element is added to it, the ArrayList must be resized. To
do this, enough space is allocated for double its capacity, and
all of its elements are copied into the new space. Be aware that
this reallocation can take some time. It is wise to set the
capacity to a reasonable size when creating the ArrayList to
prevent resizing.

The following table lists common methods for working with ArrayLists.

Method                                           Description
ArrayList.Add(object obj)                        Appends an element to the end of the ArrayList.
ArrayList.AddRange(ICollection c)                Appends a collection of elements to the end of the ArrayList.
ArrayList.Insert(int index, object obj)          Inserts an element in the position indicated. Insert must move (copy) all subsequent elements.
ArrayList.InsertRange(int index, ICollection c)  Inserts a collection of elements in the position indicated. All subsequent elements must be moved.
ArrayList.Remove(object obj)                     Removes the first occurrence of obj.
ArrayList.RemoveAt(int index)                    Removes the element at the index indicated.
ArrayList.RemoveRange(int index, int count)      Removes count elements beginning at the index position.
ArrayList.Clear( )                               Removes all the elements from the ArrayList.
ArrayList.GetRange(int index, int count)         Returns a new ArrayList containing the elements from index for count.
ArrayList.ToArray( )                             Builds a System.Array out of the ArrayList.
ArrayList.Clone( )                               Makes an exact (shallow) copy of the ArrayList.
ArrayList.Contains(object item)                  Indicates whether the array list contains item. Performs a linear search.
ArrayList.IndexOf(object item)                   Returns the index of the first occurrence of item in the ArrayList. Returns -1 if the item is not found.
ArrayList.IndexOf(object item, int start)        Returns the index of the first occurrence of item in the ArrayList at or after start. Use this to search for all occurrences of an element sequentially.
ArrayList.Reverse( )                             Reverses the elements in the ArrayList.
ArrayList.Sort( )                                Sorts the ArrayList. Requires that all members implement the IComparable interface.
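A few of the methods listed above in a runnable sketch. The three elements added are the ones from the earlier slide; the class name, the BuildSample helper and the extra values are assumptions for illustration.

```csharp
using System;
using System.Collections;

public static class ArrayListDemo
{
    // Builds the mixed-type list used on the earlier slide.
    public static ArrayList BuildSample()
    {
        var al = new ArrayList();
        al.Add("Hello World");
        al.Add(100);
        al.Add(3.14159265);
        return al;
    }

    public static void Main()
    {
        ArrayList al = BuildSample();
        Console.WriteLine(al.Count);  // 3
        Console.WriteLine(al[0]);     // Hello World

        al.Insert(1, "inserted");     // later elements shift one position right
        al.Remove(100);               // removes the first element equal to 100
        Console.WriteLine(al.IndexOf("inserted")); // 1
        Console.WriteLine(al.Contains(100));       // False

        try
        {
            object missing = al[10];  // there is no element at index 10
            Console.WriteLine(missing);
        }
        catch (ArgumentOutOfRangeException)
        {
            Console.WriteLine("Index 10 is out of range.");
        }
    }
}
```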
Class StringBuilder – namespace System.Text
• Class String has many capabilities for processing strings.
• However, a string's contents can never change: strings are immutable.
– E.g. concatenation of strings (+=) creates a new string and assigns its reference
to the variable.
• Class StringBuilder - used to create and manipulate dynamic string
information, i.e., mutable (changeable) strings.
• Every StringBuilder can store a certain number of characters that's
specified by its capacity. Exceeding the capacity of a StringBuilder
causes the capacity to expand to accommodate the additional
characters.
– E.g. methods such as Append and AppendFormat modify the same StringBuilder
without creating any new string objects.
• StringBuilder is particularly useful for manipulating in place a large
number of strings, as it's much more efficient than creating
individual immutable strings.
StringBuilder Constructors
• Class StringBuilder provides six overloaded
constructors.
– E.g.
– var buffer1 = new StringBuilder(); // with default initial capacity
– var buffer2 = new StringBuilder(10); // initial capacity specified as an int
– var buffer3 = new StringBuilder("hello"); // initialized with string content
• Output of:
– Console.WriteLine($"buffer1 = \"{buffer1}\""); // prints: buffer1 = ""
Length and Capacity Properties, EnsureCapacity Method and Indexer of Class StringBuilder

• Properties - Length and Capacity
– Length - returns the number of characters currently in a
StringBuilder, and
– Capacity - returns the number of characters that a
StringBuilder can store without allocating more memory.
• Setting these properties increases or decreases the length or the
capacity of the StringBuilder.
• Method - EnsureCapacity
– allows you to reduce the number of times that a
StringBuilder's capacity must be increased.
• The method ensures that the StringBuilder's
capacity is at least the specified value.
var buffer = new StringBuilder("Hello, how are you?");
// use Length and Capacity properties
Console.WriteLine($"buffer = {buffer}" + $"\nLength = {buffer.Length}" + $"\nCapacity = {buffer.Capacity}");
buffer.EnsureCapacity(75); // ensure a capacity of at least 75
Console.WriteLine($"\nNew capacity = {buffer.Capacity}");
// truncate StringBuilder by setting Length property
buffer.Length = 10;
Console.Write($"New length = {buffer.Length}\n\nbuffer = ");
// use StringBuilder indexer
for (int i = 0; i < buffer.Length; ++i)
{
    Console.Write(buffer[i]);
}
Console.WriteLine();
Append and AppendFormat Methods of Class StringBuilder

• Class StringBuilder provides overloaded
Append methods that allow various types of
values to be added to the end of a
StringBuilder.
• The Framework Class Library provides
versions for each simple type and for
character arrays, strings and objects.
(Remember that method ToString produces a
string representation of any object.)
• Each method takes an argument, converts it to
a string and appends it to the StringBuilder.
object objectValue = "hello";
var stringValue = "good bye";
char[] characterArray = {'a', 'b', 'c', 'd', 'e', 'f'};
var booleanValue = true;
var characterValue = 'Z';
var integerValue = 7;
var longValue = 1000000L; // L suffix indicates a long literal
var floatValue = 2.5F; // F suffix indicates a float literal
var doubleValue = 33.333;
var buffer = new StringBuilder();

// use method Append to append values to buffer
buffer.Append(objectValue); buffer.Append(" ");
buffer.Append(stringValue); buffer.Append(" ");
buffer.Append(characterArray); buffer.Append(" ");
buffer.Append(characterArray, 0, 3); buffer.Append(" ");
buffer.Append(booleanValue); buffer.Append(" ");
buffer.Append(characterValue); buffer.Append(" ");
buffer.Append(integerValue); buffer.Append(" ");
buffer.Append(longValue); buffer.Append(" ");
buffer.Append(floatValue); buffer.Append(" ");
buffer.Append(doubleValue);
Console.WriteLine($"buffer = {buffer.ToString()}");
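The heading also names AppendFormat, which the listing above does not show. A minimal sketch, assuming a hypothetical Describe helper and sample values: AppendFormat fills numbered placeholders in a format string and appends the result to the same StringBuilder. The invariant culture is passed here only so that the decimal separator is the same on every machine.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class AppendFormatDemo
{
    // Appends a formatted sentence to a StringBuilder and returns the text.
    public static string Describe(string item, double price)
    {
        var buffer = new StringBuilder();
        // {0} and {1} are placeholders; F2 formats with two decimal places.
        buffer.AppendFormat(CultureInfo.InvariantCulture,
            "This {0} costs {1:F2}.", item, price);
        return buffer.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(Describe("car", 1234.5)); // This car costs 1234.50.
    }
}
```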
Insert, Remove and Replace Methods of Class
StringBuilder
• Class StringBuilder provides overloaded Insert methods
– to allow various types of data to be inserted at any position in a StringBuilder.
• The class provides versions for each simple type and for character arrays, strings
and objects.
• Each method takes its second argument, converts it to a string and inserts the string
into the StringBuilder in front of the character in the position specified by the first
argument.
• The index specified by the first argument must be greater than or equal to 0 and less
than or equal to the StringBuilder's length; otherwise, the program throws an
ArgumentOutOfRangeException.
• Class StringBuilder also provides method Remove for deleting any portion of a
StringBuilder.
• Method Remove takes two arguments: the index at which to begin deletion and the
number of characters to delete.
• The sum of the starting index and the number of characters to be deleted must
not exceed the StringBuilder's length; otherwise, the program throws an
ArgumentOutOfRangeException.
The Insert and Remove methods are demonstrated
object objectValue = "hello";
var stringValue = "good bye";
char[] characterArray = {'a', 'b', 'c', 'd', 'e', 'f'};
var booleanValue = true;
var characterValue = 'K';
var integerValue = 7;
var longValue = 1000000L; // L suffix indicates a long literal
var floatValue = 2.5F; // F suffix indicates a float literal
var doubleValue = 33.333;
var buffer = new StringBuilder();

// insert each value at the front, followed by a two-space separator
// (the Remove indices below assume two spaces between values)
buffer.Insert(0, objectValue); buffer.Insert(0, "  ");
buffer.Insert(0, stringValue); buffer.Insert(0, "  ");
buffer.Insert(0, characterArray); buffer.Insert(0, "  ");
buffer.Insert(0, booleanValue); buffer.Insert(0, "  ");
buffer.Insert(0, characterValue); buffer.Insert(0, "  ");
buffer.Insert(0, integerValue); buffer.Insert(0, "  ");
buffer.Insert(0, longValue); buffer.Insert(0, "  ");
buffer.Insert(0, floatValue); buffer.Insert(0, "  ");
buffer.Insert(0, doubleValue); buffer.Insert(0, "  ");
Console.WriteLine($"buffer after Inserts: \n{buffer}\n");
buffer.Remove(10, 1); // delete 2 in 2.5
buffer.Remove(4, 4); // delete .333 in 33.333
Console.WriteLine($"buffer after Removes:\n{buffer}");
• Another useful method included with StringBuilder is
Replace, which searches for a specified string or
character and substitutes another string or
character for all occurrences.
– var builder1 = new StringBuilder("Happy Birthday Jane");
– var builder2 = new StringBuilder("goodbye greg");
– Console.WriteLine($"Before replacements:\n{builder1}\n{builder2}");
– builder1.Replace("Jane", "Greg");
– builder2.Replace('g', 'G', 0, 5); // replace 'g' with 'G' only within the 5 characters starting at index 0 (indices 0 to 4)
– Console.WriteLine($"\nAfter replacements:\n{builder1}\n{builder2}");
Char Methods
• All struct(structure) types derive from class ValueType, which
derives from object. Also, all struct types are implicitly
sealed(cannot be inherited).
• In the struct System.Char—which is the struct for characters and
represented by C# keyword char—most methods are static, take
at least one character argument and perform either a test or a
manipulation on the character.
• We present several of these in the next example. Figure 16.15
demonstrates static methods that test characters to determine
whether they’re of a specific character type and static methods
that perform case conversions on characters.
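The figure referred to above is not reproduced in these notes. As a stand-in, the sketch below calls some of the static Char methods of the kind described; the class name and the sample characters are assumptions.

```csharp
using System;

public static class CharDemo
{
    public static void Main()
    {
        char character = 'a'; // assumed sample character

        // Tests on the character type.
        Console.WriteLine(char.IsDigit(character));         // False
        Console.WriteLine(char.IsLetter(character));        // True
        Console.WriteLine(char.IsLetterOrDigit(character)); // True
        Console.WriteLine(char.IsLower(character));         // True
        Console.WriteLine(char.IsUpper(character));         // False
        Console.WriteLine(char.IsPunctuation('!'));         // True
        Console.WriteLine(char.IsSymbol('+'));              // True
        Console.WriteLine(char.IsWhiteSpace(' '));          // True

        // Case conversions return a new char; the argument is not changed.
        Console.WriteLine(char.ToUpper(character));         // A
        Console.WriteLine(char.ToLower('Q'));               // q
    }
}
```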
Regular Expressions

C#
Define Regular Expressions
• Regular expressions provide a powerful,
flexible, and efficient method for processing
text.
• The extensive pattern-matching notation of
regular expressions enables you to quickly
parse large amounts of text to:
– Find specific character patterns.
– Validate text to ensure that it matches a
predefined pattern (such as an email address).
– Extract, edit, replace, or delete text substrings.
– Add extracted strings to a collection in order to
generate a report.
Real life applications of Regular Expressions
• Extracting emails from a document: A lot of times, the
sales and marketing teams might require
finding/extracting emails and other contact information
from large text documents.
• Regular Expressions for Web Scraping (Data Collection):
One can simply scrape websites like Wikipedia etc. to
collect/generate data. But web scraping has its own
issues – the downloaded data is usually messy and full
of noise. This is where Regex can be used effectively.
• Working with Date-Time features: regex enables you to
work with different date formats
• Using Regex for Text Pre-processing (NLP): removing
inconsistent data when working with data collected
either manually or web scraped data.
Other applications of Regex
• Regular expressions are specially formatted
strings:
– Used to find patterns in text, and
– Used during information validation (checking that data is in a particular
format).
– E.g. the first three characters of a student ID must be
letters.
– A last name must start with a capital letter.
• Application of regular expressions - to facilitate
the construction of a compiler.
– Large and complex regular expressions - used to validate the
syntax of a program.
• In .NET, classes to recognize and manipulate regular
expressions are found in the
System.Text.RegularExpressions namespace.
How does Regular Expression work?
• The centerpiece of text processing with regular
expressions is the regular expression engine,
which is represented by the
System.Text.RegularExpressions.Regex object in .NET.
• Processing text using regular expressions
requires that the regular expression engine be
provided with the following two items of
information:
– The regular expression pattern to identify in the text.
– The text to parse for the regular expression pattern.
Examples: Replace substrings
using System;
using System.Text.RegularExpressions;

// remove the titles Mr., Mrs., Miss and Ms. from each name
string pattern = @"(Mr\.? |Mrs\.? |Miss |Ms\.? )";
string[] names = { "Mr. Henry Hunt", "Ms. Sara Samuels", "Abraham Adams", "Ms. Nicole Norris" };
foreach (string name in names)
{
    Console.WriteLine(Regex.Replace(name, pattern, String.Empty));
}
Output
Henry Hunt
Sara Samuels
Abraham Adams
Nicole Norris
Methods in Regex
• Class Regex - represents an immutable
regular expression.
– Contains static methods such as
• Match(), which returns an object of class Match
(represents a single regular expression match).
• Matches(), which finds all matches of a regular expression
in an arbitrary string and returns a MatchCollection
object (a set of Matches).
• IsMatch(), which indicates whether the regular expression
(specified in the Regex constructor for the instance version, or passed as an
argument for the static version) finds a match in the specified input string.
• Other methods include Replace() and Split().
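The static methods listed above can be sketched as follows. The input text, the pattern, the class name and the CountMatches helper are assumptions for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexMethodsDemo
{
    // Counts every match of the pattern in the input.
    public static int CountMatches(string input, string pattern)
    {
        return Regex.Matches(input, pattern).Count;
    }

    public static void Main()
    {
        string input = "regular expressions, regex, regexp";
        string pattern = "regex"; // plain text, no special characters

        // IsMatch: is there at least one match?
        Console.WriteLine(Regex.IsMatch(input, pattern)); // True

        // Match: the first match only.
        Match first = Regex.Match(input, pattern);
        Console.WriteLine(first.Value + " at index " + first.Index); // regex at index 21

        // Matches: every match, as a MatchCollection.
        foreach (Match m in Regex.Matches(input, pattern))
        {
            Console.WriteLine(m.Index); // 21, then 28
        }
        Console.WriteLine(CountMatches(input, pattern)); // 2
    }
}
```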
Methods in Regex (cont.)
• The table on the next slide lists some character classes that can
be used with regular expressions.
• A character class is an escape sequence that represents
a group of characters.
– A word character is any alphanumeric character or
underscore.
– A whitespace character is a space, a tab, a carriage
return, a newline or a form feed.
– A digit is any numeric character.
• Regular expressions are not limited to these character
classes, however.
• The expressions employ various operators and other
forms of notation to search for complex patterns.
• We discuss several of these techniques in the context of
• Elements of regular expressions
• [abc] a, b or c
• [a-z] a to z
• [A-Z] A to Z
• [a-zA-Z] a to z or A to Z
• [0-9] digits 0 to 9
Elements of Regex text
• Quantifier
• [ ]? occurs zero or one time
• [ ]+ occurs one or more times
• [ ]* occurs zero or more times
• [ ]{n} occurs exactly n times
• [ ]{n,m} occurs at least n and at most m times
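The character sets and quantifiers above can be combined into validation patterns. The sketch below is only an illustration: the two formats (a capitalised last name, and an ID of three letters followed by two to four digits) are assumed, as are the class and method names. The anchors ^ and $ are not covered in the lists above; they are added so that the whole input must match rather than just part of it.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PatternDemo
{
    // One capital letter followed by one or more lowercase letters.
    public static bool IsLastName(string text)
    {
        return Regex.IsMatch(text, @"^[A-Z][a-z]+$");
    }

    // Hypothetical ID format: exactly three letters, then two to four digits.
    public static bool IsStudentId(string text)
    {
        return Regex.IsMatch(text, @"^[a-zA-Z]{3}[0-9]{2,4}$");
    }

    public static void Main()
    {
        Console.WriteLine(IsLastName("Jones"));    // True
        Console.WriteLine(IsLastName("jones"));    // False (no capital letter first)
        Console.WriteLine(IsStudentId("abc123"));  // True
        Console.WriteLine(IsStudentId("ab1234"));  // False (only two letters)

        // ? makes the preceding element optional (zero or one time).
        Console.WriteLine(Regex.IsMatch("colour", @"^colou?r$")); // True
        Console.WriteLine(Regex.IsMatch("color", @"^colou?r$"));  // True
    }
}
```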
• The regular expression in line 19 (see also below)
searches for a string that starts with the letter "J",
followed by any number of characters, followed by a two-digit
number (of which the second digit cannot be 4),
followed by a dash, another two-digit number, a dash
and another two-digit number.
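The original listing for this expression is not reproduced in these notes. One pattern consistent with the description is J.*\d[0-35-9]-\d\d-\d\d. The pattern, the class name and the sample lines in this sketch are assumptions, not the slide's own code.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BirthdayPatternDemo
{
    // J            the letter J
    // .*           any number of characters
    // \d[0-35-9]   two digits, the second of which is not 4
    // -\d\d-\d\d   dash, two digits, dash, two digits
    public const string Pattern = @"J.*\d[0-35-9]-\d\d-\d\d";

    public static bool HasMatch(string line)
    {
        return Regex.IsMatch(line, Pattern);
    }

    public static void Main()
    {
        // Hypothetical sample data.
        string[] lines =
        {
            "Jane's Birthday is 05-12-75", // matches
            "Dave's Birthday is 11-04-68", // no J
            "John's Birthday is 04-28-73", // second digit is 4
            "Joe's Birthday is 12-17-77"   // matches
        };

        foreach (string line in lines)
        {
            Console.WriteLine(line + " -> " + HasMatch(line));
        }
    }
}
```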
Method Replace() and Split() of Regex
• Regex class provides static and instance versions of
methods Replace and Split.
– Replace() – is useful to replace parts of a string with another,
and,
– Split() – is useful to split a string according to a regular
expression.
Replace() method
• Method Replace replaces text in a string with new text wherever the
original string matches a regular expression. It has two versions:
a static method and an instance method.
• Static version of Replace()
• Takes three parameters: the string to modify, the string containing
the regular expression to match and the replacement string.
• Replace replaces every instance of "*" in testString1 with "^".
– Notice that the regular expression (@"\*") precedes character * with a backslash, \.
• Normally, * is a quantifier indicating that a regular expression should
match any number of occurrences of a preceding pattern. The backslash
makes the regular expression match a literal * instead.
• The instance version of Replace() uses the regular
expression passed to the constructor for testRegex1 to
perform the replacement operation. In this case, every
match for the regular expression "stars" in testString1 is
replaced with "carets".
• Use of instance method Replace() to
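Both versions of Replace in a short sketch. The names testString1 and testRegex1 follow the slide text; the contents of testString1 and the class name are assumptions.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexReplaceDemo
{
    public static void Main()
    {
        string testString1 = "This sentence ends in 5 stars *****"; // assumed sample

        // Static version: input, pattern, replacement.
        // \* matches a literal asterisk instead of acting as a quantifier.
        string replaced = Regex.Replace(testString1, @"\*", "^");
        Console.WriteLine(replaced); // This sentence ends in 5 stars ^^^^^

        // Instance version: the pattern is fixed by the constructor.
        var testRegex1 = new Regex("stars");
        Console.WriteLine(testRegex1.Replace(testString1, "carets"));
        // This sentence ends in 5 carets *****

        // The input string itself is never modified.
        Console.WriteLine(testString1);
    }
}
```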
Split() method of Regex
• Method Split divides a string into several substrings. The original
string is broken in any location that matches a specified regular
expression.
• Method Split returns an array containing the substrings between
matches for the regular expression.
• We use the static version of method Split to separate a string whose
items are separated by commas.
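A minimal sketch of the static Split, assuming a sample string of comma-separated numbers. The pattern ,\s* treats a comma plus any following whitespace as the separator; the variable and class names are illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexSplitDemo
{
    public static void Main()
    {
        string testString2 = "1, 2, 3, 4, 5, 6, 7, 8"; // assumed sample

        // Break the string wherever the pattern matches.
        string[] parts = Regex.Split(testString2, @",\s*");

        Console.WriteLine(parts.Length);            // 8
        Console.WriteLine(string.Join("|", parts)); // 1|2|3|4|5|6|7|8
    }
}
```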
Exercise
• Enter a character between a and z
• Enter a character between a and z and
with length of 2
• Enter characters between a and z
with a length from 1 up to 3
• Validate data with a fixed format of 8 digits
• Validate an invoice number that has the
following format: the first three are
characters, followed by 8 digits
