Chapter 2 Part 2

Uploaded by

beshahashenafe20
Copyright
© © All Rights Reserved
Strings, StringBuilders & Regular Expressions

Chapter 2 - Part 2
Contents

● String class, constructors, properties, and methods
● StringBuilder class in C#
● Regular Expressions
Strings in C#

● A string is a series of characters treated as a single unit.
● These characters can be uppercase letters, lowercase letters, digits and
various special characters, such as +, -, *, /, $ and others.
● A string is an object of class String in the System namespace; the C#
keyword string is an alias for that class.
● A declaration can assign a string literal to a string reference, for
example: string greeting = "hello";
String Constructors
● You can create strings with the various constructors provided by C#

[Code listing not preserved. Output shown on the slide: birthday, day, CCCCC]
Cont.
● Line 24 assigns to string2 a new string, using the String constructor that takes a
character array as an argument.
● The new string contains a copy of the characters in array characterArray.
● Line 25 assigns to string3 a new string, using the String constructor that takes a char
array and two int arguments. The second argument specifies the starting index
position (the offset) from which characters in the array are copied. The third
argument specifies the number of characters (the count) to be copied from the
specified starting position in the array.
● Line 26 assigns to string4 a new string, using the String constructor that takes as
arguments a character and an int specifying the number of times to repeat that
character in the string.
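
The listing these bullets describe did not survive extraction. The following is a minimal sketch of the same three constructors, not the original code: the contents of characterArray and string1 are assumptions, chosen so that the results match the output shown on the slide (birthday, day, CCCCC).

```csharp
using System;

public static class StringConstructorsDemo
{
    // Builds one string from a literal and three with String constructors.
    public static string[] BuildStrings()
    {
        // Assumed sample data (not from the original listing).
        char[] characterArray = { 'b', 'i', 'r', 't', 'h', 'd', 'a', 'y' };

        string string1 = "Welcome to C#";                   // string literal
        string string2 = new string(characterArray);        // copy of the whole array
        string string3 = new string(characterArray, 5, 3);  // offset 5, count 3
        string string4 = new string('C', 5);                // 'C' repeated 5 times

        return new[] { string1, string2, string3, string4 };
    }

    public static void Main()
    {
        foreach (string s in BuildStrings())
        {
            Console.WriteLine(s);
        }
    }
}
```

With this sample data the last three strings are "birthday", "day" and "CCCCC".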
Cont.

● Memory representation of strings
○ If the same string literal occurs several times in an application, a single
copy of the string literal object is shared: each location in the program
that uses the literal refers to that one copy. The == operator and Equals()
compare the characters of two strings; to check whether two string variables
refer to the same object, use object.ReferenceEquals().
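
A small sketch of the difference between comparing contents and comparing references. The variable names and sample text are illustrative assumptions; the run-time string is built from a char array only to guarantee that it is a separate object.

```csharp
using System;

public static class StringIdentityDemo
{
    // Returns: a == c, a.Equals(c), ReferenceEquals(a, b), ReferenceEquals(a, c)
    public static bool[] Compare()
    {
        string a = "hello";
        string b = "hello"; // same literal as a
        string c = new string(new[] { 'h', 'e', 'l', 'l', 'o' }); // built at run time

        return new[]
        {
            a == c,                        // compares characters
            a.Equals(c),                   // compares characters
            object.ReferenceEquals(a, b),  // both literals refer to one shared object
            object.ReferenceEquals(a, c),  // c is a separate object
        };
    }

    public static void Main()
    {
        foreach (bool result in Compare())
        {
            Console.WriteLine(result);
        }
    }
}
```

The first three results are true and the last is false: a and c hold the same characters but are different objects.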
Verbatim String

● Verbatim means "as is".


● The @ special character serves as a verbatim identifier. One of its uses
is to indicate that a string literal is to be interpreted verbatim; the @
character in this instance defines a verbatim string literal.
● Example: a regular string literal that contains a backslash followed by a
character that does not form a valid escape sequence (as often happens when
a Windows path is written with single backslashes) produces a syntax error
indicating that there is an unrecognized escape sequence.
Cont.

● Adding the verbatim @ character eliminates the problem
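
The slide's own example was not preserved, so here is a minimal sketch with an assumed file path. The line that would not compile is left as a comment so that the block itself stays runnable.

```csharp
using System;

public static class VerbatimStringDemo
{
    // string bad = "C:\docs\notes.txt"; // would not compile: \d is an unrecognized escape sequence

    // Regular literal: every backslash has to be doubled.
    public static string Escaped() => "C:\\docs\\notes.txt";

    // Verbatim literal: backslashes are taken as-is.
    public static string Verbatim() => @"C:\docs\notes.txt";

    public static void Main()
    {
        Console.WriteLine(Escaped());
        Console.WriteLine(Verbatim());
        Console.WriteLine(Escaped() == Verbatim());
    }
}
```

Both literals produce the same string; the verbatim form is simply easier to read and to type correctly.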


String Indexer, Length Property & CopyTo method

● The String indexer facilitates the retrieval of any character in the string.
● The String property Length returns the length of the string.
● The String method CopyTo copies a specified number of characters from
a string into a char array.
● See examples on the following slides.
Demo App: Accessing each string element using Indexers
Using the Length Property of String
CopyTo() Method
● The String method CopyTo copies a specified number of characters from
a string into a char array. The demo assumes that the char array is declared
outside of the combobox_indexchanged event handler.
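
The demo screenshots for these three members were not preserved. The following console sketch is a stand-in, not the original Windows Forms demo; the helper names Reverse and CopyRange and the sample text are assumptions.

```csharp
using System;

public static class IndexerLengthCopyToDemo
{
    // Uses the indexer and Length to build the string in reverse order.
    public static string Reverse(string text)
    {
        char[] reversed = new char[text.Length];
        for (int i = 0; i < text.Length; i++)
        {
            reversed[i] = text[text.Length - 1 - i];
        }
        return new string(reversed);
    }

    // Uses CopyTo to copy 'count' characters starting at 'start' into a new array.
    public static char[] CopyRange(string text, int start, int count)
    {
        char[] target = new char[count];
        text.CopyTo(start, target, 0, count); // source index, destination, destination index, count
        return target;
    }

    public static void Main()
    {
        string text = "hello there";
        Console.WriteLine(text.Length);
        Console.WriteLine(Reverse(text));
        Console.WriteLine(new string(CopyRange(text, 0, 5)));
    }
}
```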
Comparing Strings
● C# provides various methods to compare Strings.
● The process of comparing strings is akin to the system of alphabetizing a
series of last names
● You as a reader would place “Abebe” before “Kebede” because the first letter
of “Abebe” comes before the first letter of “Kebede” in the alphabet.
● This indicates that the alphabet is not just a collection of letters but
also an ordered list of characters.
● Therefore the letter “Z” is not just a letter in the alphabet: Z is specifically
the 26th letter of the alphabet.
● Computers can order characters alphabetically because the characters are
represented internally as Unicode numeric codes. In an ordinal comparison of
two strings, C# simply compares the numeric codes of the characters in the strings.
Comparing strings with Equals()

● Method Equals (inherited by String from class Object) tests any two
objects for equality (i.e., checks whether the objects contain identical
contents).
● The method returns true if the objects are equal and false otherwise.
● Method Equals compares the numeric Unicode values that represent
the characters in each string.
● A comparison of the string "hello" with the string "HELLO" would
return false, because the numeric representations of lowercase
letters are different from the numeric representations of
corresponding uppercase letters.
Demo App
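
The demo screenshot was not preserved; this console sketch shows the same idea. The helper names are assumptions. The second helper uses the Equals overload that takes a StringComparison value, which the slides do not cover, to show how case can be ignored.

```csharp
using System;

public static class EqualsDemo
{
    // Case-sensitive comparison of the characters in both strings.
    public static bool SameExact(string a, string b) => a.Equals(b);

    // Comparison that ignores the case of letters.
    public static bool SameIgnoringCase(string a, string b) =>
        a.Equals(b, StringComparison.OrdinalIgnoreCase);

    public static void Main()
    {
        Console.WriteLine(SameExact("hello", "hello"));
        Console.WriteLine(SameExact("hello", "HELLO"));
        Console.WriteLine(SameIgnoringCase("hello", "HELLO"));
    }
}
```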
Compare strings using CompareTo()

● Another method for comparing strings is CompareTo().
● The CompareTo() method of class String compares two strings and returns an
integer that indicates their relative order in terms of sorting.
● Method CompareTo returns 0 if the strings are equal
● a negative value (typically -1) if the string that invokes CompareTo is less
than the string that is passed as an argument
● a positive value (typically 1) if the string that invokes CompareTo is greater
than the string that is passed as an argument
● Method CompareTo also uses a lexicographical comparison; by default it applies
the sorting rules of the current culture rather than raw character codes.
CompareTo() Demo
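
The demo screenshot was not preserved. In this sketch the names from the earlier slide are reused as sample data. The helper OrdinalSign is an assumption added for illustration: it uses string.CompareOrdinal, which compares raw character codes, so its result does not depend on the machine's culture settings.

```csharp
using System;

public static class CompareToDemo
{
    // Reduces an ordinal (character-code) comparison to -1, 0 or 1.
    public static int OrdinalSign(string a, string b) =>
        Math.Sign(string.CompareOrdinal(a, b));

    public static void Main()
    {
        // Culture-sensitive comparison; only the sign of the result is meaningful.
        Console.WriteLine("Abebe".CompareTo("Kebede"));
        Console.WriteLine("Kebede".CompareTo("Abebe"));
        Console.WriteLine("Abebe".CompareTo("Abebe"));

        // Ordinal comparison by character codes.
        Console.WriteLine(OrdinalSign("Abebe", "Kebede"));
        Console.WriteLine(OrdinalSign("hello", "HELLO"));
    }
}
```

When checking a CompareTo result, test for "less than zero" or "greater than zero" rather than for exactly -1 or 1.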
Methods StartsWith() & EndsWith()

● These two methods test whether a string instance begins or ends with a
given string.
● Method StartsWith determines whether a string instance starts with the string text
passed to it as an argument.
● Method EndsWith determines whether a string instance ends with the string text passed
to it as an argument.
● Both methods return true or false accordingly.
Demo App
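
The demo screenshot was not preserved; a console sketch under assumed sample words follows. The helper names are hypothetical. StringComparison.Ordinal is passed so that the result does not depend on culture settings.

```csharp
using System;
using System.Collections.Generic;

public static class StartsEndsDemo
{
    // Returns the words that begin with the given prefix.
    public static List<string> WithPrefix(string[] words, string prefix)
    {
        var matches = new List<string>();
        foreach (string word in words)
        {
            if (word.StartsWith(prefix, StringComparison.Ordinal))
            {
                matches.Add(word);
            }
        }
        return matches;
    }

    // Returns the words that end with the given suffix.
    public static List<string> WithSuffix(string[] words, string suffix)
    {
        var matches = new List<string>();
        foreach (string word in words)
        {
            if (word.EndsWith(suffix, StringComparison.Ordinal))
            {
                matches.Add(word);
            }
        }
        return matches;
    }

    public static void Main()
    {
        string[] words = { "started", "starting", "ended", "ending" };
        Console.WriteLine(string.Join(", ", WithPrefix(words, "st")));
        Console.WriteLine(string.Join(", ", WithSuffix(words, "ed")));
    }
}
```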
Locating Strings and Substrings in Strings

● In many applications, it is necessary to search for a character or set of characters in a
string.
● For example, a programmer creating a word processor would want to provide
capabilities for searching through documents.
● String methods IndexOf, IndexOfAny, LastIndexOf and LastIndexOfAny search for a
specified character or substring in a string
● IndexOf returns the index of the first occurrence of the specified character in the
string; if the character is not found, IndexOf returns -1.
● IndexOf() has three overloaded versions that take a character:
● IndexOf(char), IndexOf(char, int startIndex) and IndexOf(char, int startIndex, int count)
Demo App IndexOf(char)

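private void comboBox4_SelectedIndexChanged(object sender, EventArgs e)
{
    decimal index = numericUpDown1.Value; // read here but not used by this overload
    // Convert.ToChar throws unless the text box holds exactly one character
    char c = Convert.ToChar(textBox6.Text);
    string s6 = textBox5.Text;

    if (comboBox4.SelectedIndex == 0)
    {
        // Index of the first occurrence of c, or -1 if c is absent
        MessageBox.Show(s6.IndexOf(c).ToString());
    }
}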
Demo App IndexOf(char, starting_index<int>)

// Inside the same event handler: the search starts at index 1
if (comboBox4.SelectedIndex == 1)
{
    MessageBox.Show(s6.IndexOf(c, 1).ToString());
}
LastIndexOf()

● The method returns the index position of the last occurrence of a
specified character or string within the given string.
○ If method LastIndexOf finds the character, LastIndexOf returns the index of
the specified character in the string; otherwise, LastIndexOf returns -1.
● There are three versions of LastIndexOf that take a character.
○ LastIndexOf that takes as an argument the character for which to
search.
○ LastIndexOf that takes two arguments: the character for which to
search and the highest index from which to begin searching backward
for the character.
○ A third version of method LastIndexOf that takes three arguments:
the character for which to search, the starting index from which to
start searching backward and the number of characters (the portion
of the string) to search.
Demo App
private void comboBox4_SelectedIndexChanged(object sender, EventArgs e)
{
    decimal index = numericUpDown1.Value;
    char c = Convert.ToChar(textBox6.Text);
    string s6 = textBox5.Text;

    if (comboBox4.SelectedIndex == 0)
    {
        MessageBox.Show(s6.IndexOf(c).ToString());
    }
    if (comboBox4.SelectedIndex == 1)
    {
        MessageBox.Show(s6.IndexOf(c, 1).ToString());
    }
    if (comboBox4.SelectedIndex == 3)
    {
        // Index of the last occurrence of c, or -1 if c is absent
        MessageBox.Show(null, s6.LastIndexOf(c).ToString(), "Information", MessageBoxButtons.OK);
    }
}
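
The search methods can also be tried without a form. This is a console sketch with an assumed sample string (the alphabet's first thirteen letters written twice, so that every letter has a first and a last occurrence).

```csharp
using System;

public static class IndexOfDemo
{
    // Assumed sample text: 'a'..'m' at indices 0..12 and again at 13..25.
    public const string Letters = "abcdefghijklmabcdefghijklm";

    public static void Main()
    {
        Console.WriteLine(Letters.IndexOf('c'));         // first 'c'
        Console.WriteLine(Letters.IndexOf('a', 1));      // first 'a' at or after index 1
        Console.WriteLine(Letters.IndexOf('$', 3, 5));   // '$' within 5 characters from index 3
        Console.WriteLine(Letters.LastIndexOf('c'));     // last 'c'
        Console.WriteLine(Letters.LastIndexOf('a', 12)); // last 'a' at or before index 12
        Console.WriteLine(Letters.IndexOfAny(new[] { 'm', 'c' }));     // first 'm' or 'c'
        Console.WriteLine(Letters.LastIndexOfAny(new[] { 'm', 'c' })); // last 'm' or 'c'
    }
}
```

For this sample text the calls yield 2, 13, -1, 15, 0, 2 and 25 respectively.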
Extracting Substrings from Strings
● The first version of the Substring method takes one int argument. The
argument specifies the starting index from which the method copies
characters in the original string.
○ The substring returned contains a copy of the characters from the starting index to
the end of the string.
Demo
private void comboBox4_SelectedIndexChanged(object sender, EventArgs e)
{
    int s_index = Convert.ToInt32(numericUpDown1.Value);
    // Passed to Substring as a length (number of characters), not as an end index
    int e_index = Convert.ToInt32(numericUpDown2.Value);
    char c = Convert.ToChar(textBox6.Text);
    string s6 = textBox5.Text;

    if (comboBox4.SelectedIndex == 0)
    {
        MessageBox.Show(s6.IndexOf(c).ToString());
    }
    if (comboBox4.SelectedIndex == 1)
    {
        MessageBox.Show(s6.IndexOf(c, 1).ToString());
    }
    if (comboBox4.SelectedIndex == 3)
    {
        MessageBox.Show(null, s6.LastIndexOf(c).ToString(), "Information", MessageBoxButtons.OK);
    }
    if (comboBox4.SelectedIndex == 4)
    {
        MessageBox.Show(null, "Substring is " + s6.Substring(s_index, e_index), "Information",
            MessageBoxButtons.OK);
    }
}
Cont.

● In the above example, the Substring method takes two int
arguments. The first argument specifies the starting index from
which the method copies characters from the original string. The
second argument specifies the length of the substring to be copied.
The substring returned contains a copy of the specified characters
from the original string.
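
Because the second argument is a length rather than an end index, it is easy to mix the two up. The sketch below shows both Substring overloads on an assumed sample string, plus a hypothetical helper, Between, that converts a start index and an exclusive end index into the (start, length) pair Substring expects.

```csharp
using System;

public static class SubstringDemo
{
    // Hypothetical helper: characters from startIndex up to, but not including, endIndex.
    public static string Between(string text, int startIndex, int endIndex) =>
        text.Substring(startIndex, endIndex - startIndex);

    public static void Main()
    {
        string text = "Hello, World";
        Console.WriteLine(text.Substring(7));     // from index 7 to the end
        Console.WriteLine(text.Substring(0, 5));  // 5 characters starting at index 0
        Console.WriteLine(Between(text, 7, 12));  // indices 7 through 11
    }
}
```

If the start index plus the length runs past the end of the string, Substring throws an ArgumentOutOfRangeException.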
Miscellaneous String Methods
● ToUpper()
○ Returns a new string with all the characters converted to uppercase

● ToLower()
○ Returns a new string with all the characters converted to lowercase

● Concat(string str0, string str1)
○ Concatenates two string objects.

● Copy(string str)
○ Creates a new String object with the same value as the specified string.

● Contains(string value)
○ Returns a value (true or false) indicating whether the specified String object occurs within this string.
10/29/2021 28
Cont.
● Replace(char oldChar, char newChar)
○ Replaces all occurrences of a specified Unicode character in the current string object with
the specified Unicode character and returns the new string.
● Replace(string oldValue, string newValue)
○ Replaces all occurrences of a specified string in the current string object with the specified
string and returns the new string.
● Trim()
○ Removes all leading and trailing white-space characters from the current String object
● Split(params char[] separator)
○ Returns a string array that contains the substrings in the current string object, delimited by
elements of a specified Unicode character array.

Example
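
The example shown on this slide was not preserved. The following sketch exercises several of the methods listed above; the helper names and the sample text are assumptions.

```csharp
using System;

public static class MiscStringMethodsDemo
{
    // Trims the input, unifies the separators and splits it into parts.
    public static string[] CleanAndSplit(string raw)
    {
        string trimmed = raw.Trim();                // remove leading and trailing white space
        string unified = trimmed.Replace(';', ','); // Replace(char, char)
        return unified.Split(',');                  // Split(params char[])
    }

    // Returns an uppercase copy; the original string is unchanged.
    public static string Shout(string text) => text.ToUpper();

    public static void Main()
    {
        string[] colors = CleanAndSplit("  red;green,blue ");
        foreach (string color in colors)
        {
            Console.WriteLine(color);
        }

        Console.WriteLine(Shout(colors[0]));
        Console.WriteLine(colors[1].ToLower());
        Console.WriteLine(string.Concat(colors[0], colors[2]));
        Console.WriteLine(colors[1].Contains("ree"));
        Console.WriteLine("red and dead".Replace("ed", "ing"));
    }
}
```

Every call returns a new string or array; none of them modifies the string it is called on.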

StringBuilder
StringBuilder explained

● Class String has many capabilities for processing strings. However, a string's contents
can never change: strings are immutable.
● E.g. concatenating strings with += creates a new string and assigns its reference to
the variable.
● Class StringBuilder is used to create and manipulate dynamic string information, i.e.,
mutable (changeable) strings.
● Every StringBuilder can store a certain number of characters, specified by its
capacity. Exceeding the capacity of a StringBuilder causes the capacity to expand to
accommodate the additional characters.
○ E.g. methods such as Append and AppendFormat change the StringBuilder's contents
without creating a new string object for each change.
❖ StringBuilder is particularly useful when a string is built up or modified many times,
as it is much more efficient than creating a new immutable string for every change.
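
A minimal sketch contrasting the two approaches. The method names are hypothetical. Both methods produce the same text, but the first creates a new string object on every pass through the loop, while the second appends to one mutable buffer.

```csharp
using System;
using System.Text;

public static class BuilderVersusConcatDemo
{
    // Each += builds a brand-new string and rebinds the variable to it.
    public static string JoinWithConcat(int count)
    {
        string result = "";
        for (int i = 0; i < count; i++)
        {
            result += i;
        }
        return result;
    }

    // Append changes the builder in place; one string is created at the end.
    public static string JoinWithBuilder(int count)
    {
        var builder = new StringBuilder();
        for (int i = 0; i < count; i++)
        {
            builder.Append(i);
        }
        return builder.ToString();
    }

    public static void Main()
    {
        string original = "x";
        string alias = original; // both variables refer to the same string
        original += "y";         // original now refers to a new string
        Console.WriteLine(alias);    // the first string is unchanged
        Console.WriteLine(original);

        Console.WriteLine(JoinWithConcat(5));
        Console.WriteLine(JoinWithBuilder(5));
    }
}
```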
StringBuilder Constructors
● Class StringBuilder provides six overloaded constructors.
○ E.g. var buffer1 = new StringBuilder(); // default initial capacity
○ var buffer2 = new StringBuilder(10); // initial capacity specified as an int
○ var buffer3 = new StringBuilder("hello"); // initialized with string content
● Output of:
○ Console.WriteLine($"buffer1 = \"{buffer1}\""); // buffer1 = ""
Length and Capacity Properties, EnsureCapacity Method and Indexer of Class StringBuilder

● Properties - Length and Capacity
○ Length - returns the number of characters currently in a StringBuilder, and
○ Capacity - returns the number of characters that a StringBuilder can store without
allocating more memory.
○ Both properties can also be set, to increase or decrease the length or the
capacity of the StringBuilder.
● Method - EnsureCapacity
○ reduces the number of times that a StringBuilder's capacity must be increased.
○ The method ensures that the StringBuilder's capacity is at least the specified
value.
using System;
using System.Text;

var buffer = new StringBuilder("Hello, how are you?");
// use Length and Capacity properties
Console.WriteLine($"buffer = {buffer}" + $"\nLength = {buffer.Length}" + $"\nCapacity = {buffer.Capacity}");
buffer.EnsureCapacity(75); // capacity becomes at least 75
Console.WriteLine($"\nNew capacity = {buffer.Capacity}");
// truncate StringBuilder by setting Length property
buffer.Length = 10;
Console.Write($"New length = {buffer.Length}\n\nbuffer = ");
// use StringBuilder indexer
for (int i = 0; i < buffer.Length; ++i)
{
    Console.Write(buffer[i]);
}
Console.WriteLine();
Append and AppendFormat Methods of Class StringBuilder

● Class StringBuilder provides overloaded Append methods that allow various types
of values to be added to the end of a StringBuilder.
● The Framework Class Library provides versions for each simple type and for
character arrays, strings and objects. (Remember that method ToString produces a
string representation of any object.)
● Each method takes an argument, converts it to a string and appends it to the
StringBuilder.
object objectValue = "hello";
var stringValue = "good bye";
char[] characterArray = {'a', 'b', 'c', 'd', 'e', 'f'};
var booleanValue = true;
var characterValue = 'Z';
var integerValue = 7;
var longValue = 1000000L; // L suffix indicates a long literal
var floatValue = 2.5F; // F suffix indicates a float literal
var doubleValue = 33.333;
var buffer = new StringBuilder();

// use method Append to append values to buffer
buffer.Append(objectValue); buffer.Append(" ");
buffer.Append(stringValue); buffer.Append(" ");
buffer.Append(characterArray); buffer.Append(" ");
buffer.Append(characterArray, 0, 3); buffer.Append(" ");
buffer.Append(booleanValue); buffer.Append(" ");
buffer.Append(characterValue); buffer.Append(" ");
buffer.Append(integerValue); buffer.Append(" ");
buffer.Append(longValue); buffer.Append(" ");
buffer.Append(floatValue); buffer.Append(" ");
buffer.Append(doubleValue);

Console.WriteLine($"buffer = {buffer.ToString()}");
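
The listing above demonstrates Append only. As a complement, here is a minimal sketch of AppendFormat, which appends a format string after substituting its numbered placeholders. The helper names, format strings and sample values are assumptions made for illustration.

```csharp
using System;
using System.Text;

public static class AppendFormatDemo
{
    // {0} and {1} are replaced by the arguments; D3 pads the integer to three digits.
    public static string FormatLine(string item, int quantity)
    {
        var buffer = new StringBuilder();
        buffer.AppendFormat("Item: {0}, quantity: {1:D3}", item, quantity);
        return buffer.ToString();
    }

    // {0,6} right-aligns the argument in a field six characters wide.
    public static string FormatPadded(string text)
    {
        var buffer = new StringBuilder();
        buffer.AppendFormat("[{0,6}]", text);
        return buffer.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(FormatLine("pen", 7));
        Console.WriteLine(FormatPadded("ab"));
    }
}
```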
Insert, Remove and Replace Methods of Class StringBuilder

● Class StringBuilder provides overloaded Insert methods
○ to allow various types of data to be inserted at any position in a
StringBuilder.
● The class provides versions for each simple type and for character arrays, strings
and objects.
● Each method takes its second argument, converts it to a string and inserts
the string into the StringBuilder in front of the character in the position
specified by the first argument.
● The index specified by the first argument must be greater than or equal to
0 and no greater than the StringBuilder's length (an index equal to the length
inserts at the end); otherwise, the program throws an
ArgumentOutOfRangeException.
Cont.

● Class StringBuilder also provides method Remove for deleting any portion of a
StringBuilder.
● Method Remove takes two arguments: the index at which to begin deletion
and the number of characters to delete.
● The sum of the starting index and the number of characters to be deleted
must not exceed the StringBuilder's length; otherwise, the program
throws an ArgumentOutOfRangeException.
The Insert and Remove methods are demonstrated
object objectValue = "hello";
var stringValue = "good bye";
char[] characterArray = {'a', 'b', 'c', 'd', 'e', 'f'};
var booleanValue = true;
var characterValue = 'K';
var integerValue = 7;
var longValue = 1000000L; // L suffix indicates a long literal
var floatValue = 2.5F; // F suffix indicates a float literal
var doubleValue = 33.333;
var buffer = new StringBuilder();

// Each value is inserted at index 0, followed by a two-space separator.
// (The separator width was lost in extraction; two spaces is the width
// that makes the Remove indexes below match their comments.)
buffer.Insert(0, objectValue); buffer.Insert(0, "  ");
buffer.Insert(0, stringValue); buffer.Insert(0, "  ");
buffer.Insert(0, characterArray); buffer.Insert(0, "  ");
buffer.Insert(0, booleanValue); buffer.Insert(0, "  ");
buffer.Insert(0, characterValue); buffer.Insert(0, "  ");
buffer.Insert(0, integerValue); buffer.Insert(0, "  ");
buffer.Insert(0, longValue); buffer.Insert(0, "  ");
buffer.Insert(0, floatValue); buffer.Insert(0, "  ");
buffer.Insert(0, doubleValue); buffer.Insert(0, "  ");
Console.WriteLine($"buffer after Inserts: \n{buffer}\n");
buffer.Remove(10, 1); // delete 2 in 2.5
buffer.Remove(4, 4); // delete .333 in 33.333
Console.WriteLine($"buffer after Removes:\n{buffer}");
● Another useful method included with StringBuilder is Replace, which searches for a
specified string or character and substitutes another string or character for all occurrences.
○ var builder1 = new StringBuilder("Happy Birthday Jane");
○ var builder2 = new StringBuilder("goodbye greg");
○ Console.WriteLine($"Before replacements:\n{builder1}\n{builder2}");
○ builder1.Replace("Jane", "Greg");
○ builder2.Replace('g', 'G', 0, 5); // replace g by G only within the 5 characters starting at index 0 (indexes 0 to 4)
○ Console.WriteLine($"\nAfter replacements:\n{builder1}\n{builder2}");
Regular Expressions
● Define Regular Expression(RegEx)
● Applications of RegEx
● How to design a RegEx expression
RegEx Explained

● Regular expressions are specially formatted strings used to find
patterns in text and can be useful during information validation, to
ensure that data is in a particular format.
● Regular expressions provide a powerful, flexible, and efficient
method for processing text.
● There are numerous applications of regular expressions.
Applications of RegEx
● Compiler
○ Regular expressions facilitate the construction of a compiler. A large and
complex regular expression is used to validate the syntax of a
program. If the program code does not match the regular
expression, the compiler knows that there is a syntax error
within the code.
● Input validation: validate text to ensure that it matches a
predefined pattern (such as an email address).
● Find specific character patterns.
● Extract, edit, replace, or delete text substrings.
● Add extracted strings to a collection in order to generate a report.
Other real life applications of RegEx
● Extracting emails from a document: sales and marketing teams often need
to find and extract email addresses and other contact information
from large text documents.
● Regular expressions for web scraping (data collection): one can scrape
websites such as Wikipedia to collect or generate data, but web scraping has its
own issues: the downloaded data is usually messy and full of noise. This is
where regex can be used effectively.
● Working with date-time features: regex enables you to work with different
date formats.
● Using regex for text pre-processing (NLP): removing inconsistent data when
working with data that was collected manually or scraped from the web.
How do we design apps with RegEx

● The centerpiece of text processing with regular expressions is the regular
expression engine, which is represented by the System.Text.RegularExpressions.Regex
object in .NET.
● Processing text using regular expressions requires that the regular expression
engine be provided with the following two items of information:
○ The regular expression pattern to identify in the text.
○ The text to parse for the regular expression pattern.
Cont.

The .NET Framework provides several classes to help developers recognize and
manipulate regular expressions.

Class Regex (System.Text.RegularExpressions namespace) represents an immutable
regular expression.
Methods in Regex
● Class Regex represents an immutable regular expression. It provides
methods, available in both static and instance versions, such as
■ Match(), which returns an object of class Match (represents a single regular
expression match).
■ Matches(), which finds all matches of a regular expression in an arbitrary string and
returns a MatchCollection object (a set of Matches).
■ IsMatch(), which indicates whether the regular expression specified in the Regex
constructor finds a match in the specified input string.
■ Other methods include Replace() and Split().
Examples: Using RegEx Method Replace ()
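using System;
using System.Text.RegularExpressions;

// Matches a courtesy title and the space that follows it
string pattern = @"(Mr\.? |Mrs\.? |Miss |Ms\.? )";
string[] names = { "Mr. Henry Hunt", "Ms. Sara Samuels", "Abraham Adams", "Ms. Nicole Norris" };
foreach (string name in names)
{
    // Replace each matched title with the empty string
    Console.WriteLine(Regex.Replace(name, pattern, String.Empty));
}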
Output
Henry Hunt
Sara Samuels
Abraham Adams
Nicole Norris
Method Replace() and Split() of Regex

● Class Regex provides static and instance versions of
methods Replace and Split.
○ Replace() is useful to replace parts of a string with another string, and
○ Split() is useful to split a string according to a regular expression.
Replace() method
● Method Replace replaces text in a string with new text wherever the original
string matches a regular expression. It has two versions: a static method and
an instance method.
● Static version of Replace()
● Takes three parameters: the string to modify, the string containing the regular
expression to match and the replacement string.
● In the example, Replace replaces every instance of "*" in testString1 with "^".
○ Notice that the regular expression (@"\*") precedes character * with a backslash, \,
so that * is treated as a literal character.
● Normally, * is a quantifier indicating that a regular expression should match any
number of occurrences of a preceding pattern.
● The instance version of Replace() uses the regular
expression passed to the constructor for testRegex1 to
perform the replacement operation. In this case, every
match for the regular expression "stars" in testString1 is
replaced with "carets".
● Use of instance method Replace() to
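
The listing these bullets refer to was not preserved. The sketch below reconstructs the two calls they describe: the names testString1 and testRegex1 come from the bullets, while the contents of testString1 and the wrapper method names are assumptions.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexReplaceDemo
{
    // Static Replace: input, pattern, replacement. \* matches a literal asterisk.
    public static string StarsToCarets(string input) =>
        Regex.Replace(input, @"\*", "^");

    // Instance Replace: the pattern is supplied once, to the constructor.
    public static string StarsWordToCarets(string input)
    {
        var testRegex1 = new Regex("stars");
        return testRegex1.Replace(input, "carets");
    }

    public static void Main()
    {
        // Assumed sample text.
        string testString1 = "This sentence ends in 5 stars *****";
        Console.WriteLine(StarsToCarets(testString1));
        Console.WriteLine(StarsWordToCarets(testString1));
    }
}
```

The static form is convenient for a one-off replacement; creating a Regex object pays off when the same pattern is applied repeatedly.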
Split() method of Regex

● Method Split divides a string into several substrings. The original
string is broken at any location that matches a specified regular
expression.
● Method Split returns an array containing the substrings between
matches for the regular expression.
● We can use the static version of method Split to separate strings that
are separated by commas.
Demo App
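private void comboBox1_SelectedIndexChanged(object sender, EventArgs e)
{
    textBox1.Text = "ABC123fgh46koli";
    // Runs of letters act as the delimiter, so the runs of digits are what remain
    string pattern = "[a-zA-Z]+";
    string[] result = Regex.Split(textBox1.Text, pattern);
    // result is { "", "123", "46", "" }: the input starts and ends with a delimiter
    foreach (string s in result)
    {
        label11.Text += s + "\n";
    }
}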
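
The demo above splits on letters. For the comma-separated case mentioned earlier, a minimal sketch follows; the method name, the pattern and the sample input are assumptions. The pattern also consumes any white space around each comma, so the pieces come back already trimmed.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexSplitDemo
{
    // Splits on a comma together with any white space on either side of it.
    public static string[] SplitOnCommas(string input) =>
        Regex.Split(input, @"\s*,\s*");

    public static void Main()
    {
        foreach (string part in SplitOnCommas("apple, banana,cherry ,  date"))
        {
            Console.WriteLine(part);
        }
    }
}
```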
Elements of a RegEx

• A character class is an escape sequence that represents a
group of characters.
– A word character (\w) is any alphanumeric character or
underscore.
– A whitespace character (\s) is a space, a tab, a carriage return,
a newline or a form feed.
– A digit (\d) is any numeric character.
● \ -> Marks the next character as special, or escapes it so that it is treated literally.
For example, "n" matches the character "n", while "\n" matches a newline character. The
sequence "\\" matches "\" and "\(" matches "(".
● ^ -> Matches at the start of the string (starts with).
● $ -> Matches at the end of the string (ends with).
● * -> Matches the preceding character zero or more times. For example, "zo*"
matches "z", "zo" and "zoo".
● + -> Matches the preceding character one or more times. For example, "zo+"
matches "zo" and "zoo" but not "z".
● ? -> Matches the preceding character zero or one time. For example, "a?ve?"
matches the "ve" in "never".
● . -> Matches any single character except a newline character.
Elements of RegularExpressions
1. [abc] a, b or c
2. [a-z] a to z
3. [A-Z] A to Z
4. [a-zA-Z] a to z or A to Z
5. [0-9] digits 0 to 9
Quantifiers of RegEx
● []? occurs zero or one time
● []+ occurs one or more times
● []* occurs zero or more times
● []{n} occurs exactly n times
● []{n,m} occurs at least n and at most m times
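
The elements and quantifiers above can be tried out with Regex.IsMatch. This is a minimal sketch; the helper name Test and the sample patterns are assumptions. Most patterns are wrapped in ^ and $ so that the whole input has to match, not just part of it.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuantifierDemo
{
    // True if the pattern matches somewhere in the input.
    public static bool Test(string pattern, string input) =>
        Regex.IsMatch(input, pattern);

    public static void Main()
    {
        Console.WriteLine(Test("^zo*$", "z"));         // * allows zero 'o' characters
        Console.WriteLine(Test("^zo+$", "z"));         // + needs at least one 'o'
        Console.WriteLine(Test("^colou?r$", "color")); // ? makes the 'u' optional
        Console.WriteLine(Test("^a.c$", "abc"));       // . stands for any one character
        Console.WriteLine(Test("^[a-z]{2}$", "abc"));  // exactly two letters, anchored
        Console.WriteLine(Test("[a-z]{2}", "abc"));    // unanchored: "ab" is found inside "abc"
    }
}
```

The last two lines differ only in the anchors: without ^ and $, a pattern succeeds as soon as any part of the input matches.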
Exercise on Regex

● Enter a character between a and z
○ [a-z]
● Enter characters between a and z with a length of 2
○ [a-z]{2}
● Enter characters between a and z with a length from 1 up to 3
○ [a-z]{1,3}
● Validate data with a fixed format of 8 digits
○ [\d]{8}
● To validate a whole input rather than find the pattern somewhere inside it,
anchor the pattern with ^ and $, e.g. ^[a-z]{2}$
Cont.

● Validate an invoice number that has the following format: the first three
are characters, followed by 8 digits
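
One possible solution sketch for this exercise, not the only one. It assumes that "characters" means the letters a to z in either case, and it uses [0-9] rather than \d to restrict the digits to 0 through 9. The class and method names and the sample invoice numbers are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class InvoiceValidator
{
    // Three letters followed by exactly eight digits, anchored at both ends.
    private static readonly Regex InvoicePattern =
        new Regex("^[A-Za-z]{3}[0-9]{8}$");

    public static bool IsValidInvoice(string invoiceNumber) =>
        InvoicePattern.IsMatch(invoiceNumber);

    public static void Main()
    {
        Console.WriteLine(IsValidInvoice("INV12345678"));  // three letters, eight digits
        Console.WriteLine(IsValidInvoice("IN12345678"));   // only two letters
        Console.WriteLine(IsValidInvoice("INV1234567"));   // only seven digits
        Console.WriteLine(IsValidInvoice("INV123456789")); // nine digits
    }
}
```

Only the first sample is accepted; the anchors are what cause the nine-digit sample to be rejected.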
READING ASSIGNMENT

● Comparing strings using (==)
● Extract the numbers from date formats
● Sort array of strings in alphabetical order