C# Substring Programs: Getting First Part
You want to extract several characters from your C# string as another string, which is called taking a substring. There are two overloaded Substring methods on string, which are ideal for getting parts of strings. This document contains several examples and a useful Substring benchmark, using the C# programming language.
=== Substring benchmark that tests creation time (C#) === Based on .NET Framework 3.5 SP1.
using System;

class Program
{
    static void Main()
    {
        string input = "OneTwoThree";

        // Get first three characters
        string sub = input.Substring(0, 3);
        Console.WriteLine("Substring: {0}", sub);
    }
}
Substring: One
Description. The Substring method is an instance method on the string class, which means you must have a non-null string to use it without triggering an exception. This program will extract the first three characters into a new string reference, which is separately allocated on the managed heap.
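using System;

class Program
{
    static void Main()
    {
        string input = "OneTwoThree";
        // Indexes:
        // 0:'O'
        // 1:'n'
        // 2:'e'
        // 3:'T'
        // 4:'w' ...

        // Get the substring that starts at index 3 and runs to the end
        string sub = input.Substring(3);
        Console.WriteLine("Substring: {0}", sub);
    }
}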
Substring: TwoThree
Description. The program text describes logic that takes all the characters in the input string excluding the first three. The end result is that you extract the last several characters. The Substring method internally causes the runtime to allocate a new string on the managed heap.
using System;
} }
Substring: Two
Description of parameters. The two parameters in the example say, "I want the substring at index 3 with a length of three." Essentially, the fourth through sixth characters (indexes 3, 4, and 5). The program then displays the resulting string that is pointed to by the string reference 'sub'.
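As a small sketch of the two-parameter call described here, assuming the same "OneTwoThree" input used by the earlier programs. The class name SubstringMiddle and the helper Middle are hypothetical, added only for illustration.

```csharp
using System;

static class SubstringMiddle
{
    // Returns 'length' characters of 'input' beginning at 'startIndex'.
    // Hypothetical wrapper around the built-in two-parameter Substring.
    public static string Middle(string input, int startIndex, int length)
    {
        return input.Substring(startIndex, length);
    }

    static void Main()
    {
        string input = "OneTwoThree"; // assumed input, as in the earlier examples

        // Start at index 3 ('T') and take three characters.
        string sub = Middle(input, 3, 3);
        Console.WriteLine("Substring: {0}", sub); // Substring: Two
    }
}
```

The second argument is a length, not an end index, which is the main difference from slice-style methods in other languages.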
Slicing strings
Here we note that you can add an extension method to "slice" strings as is possible in languages such as JavaScript. The Substring method in C# doesn't use the same semantics as the Slice method from JavaScript and Python. However, you can develop an extension method that fills this need efficiently.
See String Slice.
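One possible shape for such an extension method is sketched below. It is not the implementation from the linked article. The name Slice, the inclusive start and exclusive end, and the handling of negative positions (counted back from the end, then clamped as JavaScript does) are all assumptions made for this sketch.

```csharp
using System;

static class StringSliceExtensions
{
    // Hypothetical Slice: 'start' is inclusive, 'end' is exclusive,
    // and negative values count back from the end of the string.
    public static string Slice(this string source, int start, int end)
    {
        if (source == null) throw new ArgumentNullException("source");

        if (start < 0) start = source.Length + start;
        if (end < 0) end = source.Length + end;

        // Clamp both positions into the valid range.
        start = Math.Max(0, Math.Min(start, source.Length));
        end = Math.Max(0, Math.Min(end, source.Length));

        if (end <= start) return string.Empty;

        // Substring takes a length, so convert the end position.
        return source.Substring(start, end - start);
    }
}

class Program
{
    static void Main()
    {
        string input = "OneTwoThree";
        Console.WriteLine(input.Slice(3, 6));   // Two
        Console.WriteLine(input.Slice(0, -5));  // OneTwo
        Console.WriteLine(input.Slice(-5, 11)); // Three
    }
}
```

Clamping instead of throwing is a design choice here. Code that prefers the strictness of Substring could drop the clamp and let ArgumentOutOfRangeException surface.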
using System;
Substring: OneTwo
MSDN research
Here we note some reference material on the MSDN website provided by Microsoft. The Substring articles I found on MSDN are really awful and not nearly as nice as this document. They do not say anything that you cannot find from Visual Studio's IntelliSense.
Visit msdn.microsoft.com.
Exceptions raised
Here we look at exceptions that can be raised when the Substring instance method on the string type is called with incorrect parameters. Here we see an example where I trigger the ArgumentOutOfRangeException. When you try to go beyond the string length, or use a parameter < 0, you get the ArgumentOutOfRangeException from the internal method InternalSubStringWithChecks.
=== Program that shows Substring exceptions (C#) ===
using System;
System.ArgumentOutOfRangeException System.String.InternalSubStringWithChecks
System.ArgumentOutOfRangeException System.String.InternalSubStringWithChecks
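The two failure modes described in this section, a length that runs past the end of the string and a negative parameter, can be sketched as follows. The helper name Throws and the input string are assumptions for illustration. The sketch only checks the exception type, since the internal method that raises it may differ between runtime versions.

```csharp
using System;

class Program
{
    // Returns true when Substring(start, length) throws
    // ArgumentOutOfRangeException for the given arguments.
    public static bool Throws(string input, int start, int length)
    {
        try
        {
            input.Substring(start, length);
            return false;
        }
        catch (ArgumentOutOfRangeException)
        {
            return true;
        }
    }

    static void Main()
    {
        string input = "OneTwoThree"; // assumed input

        Console.WriteLine(Throws(input, 0, 3));   // False: within bounds
        Console.WriteLine(Throws(input, 0, 100)); // True: length runs past the end
        Console.WriteLine(Throws(input, -1, 3));  // True: negative start index
    }
}
```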
Benchmark
Here I wanted to see if taking characters and putting them into a char[] array could be faster than calling Substring. My result was that Substring is faster. However, if you want to extract only certain characters, consider the char[] approach shown. This benchmark is based on .NET 3.5 SP1.
=== Data tested ===
char[] c = new char[3];
c[0] = s[3];
c[1] = s[4];
c[2] = s[5];
string x = new string(c); // "two"
if (x == null)
{
}
=== Substring benchmark result === Substring was faster. See figures at top.
Benchmark notes. The above code is simply a benchmark you can run in Visual Studio to see the performance difference of Substring and char[] arrays. It is best to use Substring when it has equivalent behavior. This site contains a useful benchmarking harness located in the "performance" section.
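A comparison along these lines might be set up as in the sketch below. This is not the original harness. The input string "onetwothree", the iteration count, and the method names are assumptions, and timings will vary by machine and runtime, so no figures are shown.

```csharp
using System;
using System.Diagnostics;

class Program
{
    // Approach 1: the built-in Substring call.
    public static string WithSubstring(string s)
    {
        return s.Substring(3, 3);
    }

    // Approach 2: copy three chars by hand into a char[] array.
    public static string WithCharArray(string s)
    {
        char[] c = new char[3];
        c[0] = s[3];
        c[1] = s[4];
        c[2] = s[5];
        return new string(c);
    }

    static void Main()
    {
        const string s = "onetwothree";  // assumed test input
        const int iterations = 10000000; // assumed iteration count

        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            string x = WithSubstring(s);
            if (x == null) { } // use the result, as in the original snippet
        }
        sw.Stop();
        Console.WriteLine("Substring: {0} ms", sw.ElapsedMilliseconds);

        sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            string x = WithCharArray(s);
            if (x == null) { }
        }
        sw.Stop();
        Console.WriteLine("char[]:    {0} ms", sw.ElapsedMilliseconds);
    }
}
```

Both approaches return the same three characters, so the comparison measures only the cost of building the string.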
Summary
Here we saw several examples concentrated on the Substring instance method with one or two parameters on the string type in the C# programming language. Additionally, we saw where to research Substring on MSDN, information about Slice, Substring exceptions, and a benchmark of Substring. Substring is very useful and can help simplify your programs, without significant performance problems. Combine it with IndexOf and Split for powerful string handling.
You want to split strings on different characters with single character or string delimiters. For example, split a string that contains "\r\n" sequences, which are Windows newlines. Through these examples, we learn ways to use the Split method on the string type in the C# programming language.
Use the Split method to separate parts from a string. If your input string is "A,B,C", split on the comma to get an array of "A", "B", and "C".
Using Split
To begin, we look at the basic Split method overload. You already know the general way to do this, but it is good to see the basic syntax before we move on. This example splits on a single character.
=== Example program for splitting on spaces (C#) ===
using System;

class Program
{
    static void Main()
    {
        string s = "there is a cat";
        //
        // Split string on spaces.
        // ... This will separate all the words.
        //
        string[] words = s.Split(' ');
        foreach (string word in words)
        {
            Console.WriteLine(word);
        }
    }
}

there
is
a
cat
Description. The input string, which contains four words, is split on spaces and the foreach loop then displays each word. The result value from Split is a string[] array.
Multiple characters
Here we use either the Regex method or the C# new array syntax. Note that a new char array is created in the following usages. There is also an overload that accepts StringSplitOptions, which you can use to remove empty strings.
=== Program that splits on lines with Regex (C#) ===
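using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        string value = "cat\r\ndog\r\nanimal\r\nperson";
        //
        // Split the string on line breaks.
        // ... The return value from Split is a string[] array.
        //
        string[] lines = Regex.Split(value, "\r\n");

        // Display each line that was separated.
        foreach (string line in lines)
        {
            Console.WriteLine(line);
        }
    }
}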
StringSplitOptions
While the Regex type methods can be used to Split strings effectively, the string type Split method is faster in many cases. The Regex Split method is static; the string Split method is instance-based. The next example shows how you can specify an array as the first parameter to string Split.
=== Program that splits on multiple characters (C#) ===
using System;

class Program
{
    static void Main()
    {
        //
        // This string is also separated by Windows line breaks.
        //
        string value = "shirt\r\ndress\r\npants\r\njacket";

        //
        // Use a new char[] array of two characters (\r and \n) to break
        // lines into separate strings. Use "RemoveEmptyEntries"
        // to make sure no empty strings get put in the string[] array.
        //
        char[] delimiters = new char[] { '\r', '\n' };
        string[] parts = value.Split(delimiters, StringSplitOptions.RemoveEmptyEntries);
        for (int i = 0; i < parts.Length; i++)
        {
            Console.WriteLine(parts[i]);
        }

        //
        // Same as the previous example, but uses a new string of 2 characters.
        //
        parts = value.Split(new string[] { "\r\n" }, StringSplitOptions.None);
        for (int i = 0; i < parts.Length; i++)
        {
            Console.WriteLine(parts[i]);
        }
    }
}
Overview. One useful overload of Split receives a char[] array as its first parameter. Each char in the array acts as a delimiter that starts a new block.

Using string arrays. Another overload of Split receives a string[] array, so whole strings can serve as delimiters. In the example, the new string[] array is created inline with the Split call.

Explanation of StringSplitOptions. When two delimiters are adjacent, Split produces an empty string between them. Passing StringSplitOptions.RemoveEmptyEntries as the second parameter drops these empty results.
See StringSplitOptions Enumeration.
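The effect of the two StringSplitOptions values on adjacent delimiters can be seen in a short sketch. The input "a,,b," and the helper names are assumptions chosen for illustration.

```csharp
using System;

class Program
{
    // Keeps the empty strings produced by adjacent or trailing delimiters.
    public static string[] SplitKeepEmpty(string value)
    {
        return value.Split(new char[] { ',' }, StringSplitOptions.None);
    }

    // Drops every empty string from the result array.
    public static string[] SplitDropEmpty(string value)
    {
        return value.Split(new char[] { ',' }, StringSplitOptions.RemoveEmptyEntries);
    }

    static void Main()
    {
        string value = "a,,b,"; // two adjacent commas and one trailing comma

        Console.WriteLine(SplitKeepEmpty(value).Length); // 4: "a", "", "b", ""
        Console.WriteLine(SplitDropEmpty(value).Length); // 2: "a", "b"
    }
}
```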
Separating words
Here we see how you can separate words with Split. Usually, the best way to separate words is to use a Regex that specifies non-word chars. This example separates words in a string based on non-word characters. It eliminates punctuation and whitespace from the return array.
=== Program that separates on non-word pattern (C#) ===
using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        string[] w = SplitWords("That is a cute cat, man");
        foreach (string s in w)
        {
            Console.WriteLine(s);
        }
        Console.ReadLine();
    }

    /// <summary>
    /// Take all the words in the input string and separate them.
    /// </summary>
    static string[] SplitWords(string s)
    {
        //
        // Split on all non-word characters.
        // ... Returns an array of all the words.
        //
        return Regex.Split(s, @"\W+");
        // @      special verbatim string syntax
        // \W+    one or more non-word characters together
    }
}
Word splitting example. Here you can separate parts of your input string based on any character set or range with Regex. Overall, this provides more power than the string Split methods.
See Regex.Split Method Examples.
=== Input file used (TextFile1.txt) ===

Dog,Cat,Mouse,Fish,Cow,Horse,Hyena
Programmer,Wizard,CEO,Rancher,Clerk,Farmer
using System;
using System.IO;

class Program
{
    static void Main()
    {
        int i = 0;
        foreach (string line in File.ReadAllLines("TextFile1.txt"))
        {
            string[] parts = line.Split(',');
            foreach (string part in parts)
            {
                Console.WriteLine("{0}:{1}", i, part);
            }
            i++; // Advance the line number shown before each part
        }
    }
}
0:Dog
0:Cat
0:Mouse
0:Fish
0:Cow
0:Horse
0:Hyena
1:Programmer
1:Wizard
1:CEO
1:Rancher
1:Clerk
1:Farmer
using System;

class Program
{
    static void Main()
    {
        // The directory from Windows
        const string dir = @"C:\Users\Sam\Documents\Perls\Main";
        // Split on directory separator
        string[] parts = dir.Split('\\');
        foreach (string part in parts)
        {
            Console.WriteLine(part);
        }
    }
}
Internal logic
The logic internal to the .NET Framework for Split is implemented in managed code. The public overloads call into the overload with three parameters, which first checks its parameters for validity. Finally, it uses unsafe code to create the separator list, and then a for loop combined with Substring to return the array.
Benchmarks
I tested a long string and a short string, having 1200 and 40 chars respectively. String splitting speed varies with the type of strings. The length of the blocks, number of delimiters, and total size of the string factor into performance. The Regex.Split option generally performed the worst. I felt that the second or third methods would be the best, after observing performance problems with regular expressions in other situations.
=== Strings used in test (C#) ===

//
// Build long string.
//
_test = string.Empty;
for (int i = 0; i < 120; i++)
{
    _test += "01234567\r\n";
}

//
// Build short string.
//
_test = string.Empty;
for (int i = 0; i < 10; i++)
{
    _test += "ab\r\n";
}
static void Test2()
{
    string[] arr = _test.Split(new char[] { '\r', '\n' },
        StringSplitOptions.RemoveEmptyEntries);
}
Longer strings: 1200 chars. The benchmark for the methods on the long strings is more even. It may be that for very long strings, such as entire files, the Regex method is equivalent or even faster. For short strings, Regex is slowest, but for long strings it is very fast.
=== Benchmark of Split on long strings ===
Method 1: 434 ms
Method 2:  63 ms [fastest]
Method 3:  83 ms
Short strings: 40 chars. This shows the three methods compared to each other on short strings. Method 1 is the Regex method, and it is by far the slowest on the short strings. This may be because of the compilation time. Smaller is better. This article was last updated for .NET 3.5 SP1.
Performance recommendation. For programs that use shorter strings, the methods that split based on arrays are faster and simpler, and they will avoid Regex compilation. For somewhat longer strings or files that contain more lines, Regex is appropriate. Also, I show some Split improvements that can improve your program.
See Split String Improvement.
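A rough sketch of how the three methods could be timed side by side is shown below. Only the char[] method appears in the listing above. The Regex method follows the description of Method 1, and the string[] method is an assumption about the third one. The Time helper, BuildTest, and the iteration count are also hypothetical, and no timings are shown because results depend on the machine and runtime.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

class Program
{
    // Method 1: Regex.Split on the two-character line break.
    public static string[] Test1(string s)
    {
        return Regex.Split(s, "\r\n");
    }

    // Method 2: char[] delimiters, empty entries removed.
    public static string[] Test2(string s)
    {
        return s.Split(new char[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);
    }

    // Method 3 (assumed): string[] delimiter, empty entries kept.
    public static string[] Test3(string s)
    {
        return s.Split(new string[] { "\r\n" }, StringSplitOptions.None);
    }

    // Repeats 'block' 'count' times, as the test strings are built.
    public static string BuildTest(string block, int count)
    {
        string result = string.Empty;
        for (int i = 0; i < count; i++)
        {
            result += block;
        }
        return result;
    }

    // Runs 'method' on 'input' repeatedly and returns elapsed milliseconds.
    static long Time(Func<string, string[]> method, string input, int iterations)
    {
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            method(input);
        }
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }

    static void Main()
    {
        string shortString = BuildTest("ab\r\n", 10);        // 40 chars
        string longString = BuildTest("01234567\r\n", 120);  // 1200 chars
        const int iterations = 100000;                       // assumed count

        foreach (string input in new string[] { shortString, longString })
        {
            Console.WriteLine("Length {0}:", input.Length);
            Console.WriteLine("  Regex.Split:    {0} ms", Time(Test1, input, iterations));
            Console.WriteLine("  char[] Split:   {0} ms", Time(Test2, input, iterations));
            Console.WriteLine("  string[] Split: {0} ms", Time(Test3, input, iterations));
        }
    }
}
```

Note that the methods are not strictly equivalent: because the test strings end with a line break, Test1 and Test3 return a trailing empty string that Test2 removes.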
Escaped characters
Here we note that you can use Replace on your string input to substitute special characters in for any escaped characters. This can solve lots of problems on parsing computer-generated code or data.
See Split Method and Escape Characters.
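One way to read this advice is sketched below, and it may differ from the linked article. The sketch assumes that a backslash followed by a comma marks a literal comma in the data, and that the placeholder character U+0001 never occurs in real input. The helper name SplitEscaped is hypothetical.

```csharp
using System;

class Program
{
    // Assumption: this character never appears in the real data.
    const string Placeholder = "\u0001";

    // Splits on commas while treating "\," as a literal comma.
    public static string[] SplitEscaped(string line)
    {
        // 1. Hide escaped commas behind the placeholder.
        string temp = line.Replace("\\,", Placeholder);

        // 2. Split on the remaining, unescaped commas.
        string[] parts = temp.Split(',');

        // 3. Restore the literal commas in each part.
        for (int i = 0; i < parts.Length; i++)
        {
            parts[i] = parts[i].Replace(Placeholder, ",");
        }
        return parts;
    }

    static void Main()
    {
        string line = @"Smith\, John,Programmer,Boston"; // assumed sample data
        foreach (string part in SplitEscaped(line))
        {
            Console.WriteLine(part);
        }
    }
}
```

For the sample line this yields three parts, the first being "Smith, John" with its comma restored.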
Delimiter arrays
In this section, we focus on how you can specify delimiters to the Split method in the C# language. My further research into Split and its performance shows that it is worthwhile to declare your char[] array you are splitting on as a local instance to reduce memory pressure and improve runtime performance. There is another example of delimiter array allocation on this site.
See Split Delimiter Use.
=== Slow version, before (C#) ===
//
// Split on multiple characters using new char[] inline.
//
string t = "string to split, ok";

for (int i = 0; i < 10000000; i++)
{
    string[] s = t.Split(new char[] { ' ', ',' });
}
=== Fast version, after (C#) ===

//
// Split on multiple characters using new char[] already created.
//
string t = "string to split, ok";
char[] c = new char[]{ ' ', ',' }; // <-- Cache this

for (int i = 0; i < 10000000; i++)
{
    string[] s = t.Split(c);
}
Interpretation. We see that storing the array of delimiters separately is good. My measurements show the code is faster, though by less than 10%, when the array is stored outside the loop.
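When the split happens inside a method that is called many times, the same idea can be applied by keeping the delimiter array in a static readonly field. This is a sketch of that variation, not code from the benchmark. The field name _delimiters and the method SplitCached are hypothetical.

```csharp
using System;

class Program
{
    // Allocated once and reused by every call to SplitCached.
    static readonly char[] _delimiters = new char[] { ' ', ',' };

    public static string[] SplitCached(string value)
    {
        return value.Split(_delimiters);
    }

    static void Main()
    {
        string t = "string to split, ok";
        string[] s = SplitCached(t);

        // The comma followed by a space yields one empty entry.
        Console.WriteLine(s.Length); // 5
    }
}
```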
Explode
In this part, we discuss the explode function from the PHP environment. The .NET Framework has no explode method exactly like PHP explode, but you can gain the functionality quite easily with Split, for the most part. You can replace explode with the Split method that receives a string[] array. Explode allows you to split strings based on a fixed size. The new article on this topic implements the logic in the C# language directly.
See Explode String Extension Method.
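The replacement of explode with the string[] overload of Split might look like the following sketch. It is not the extension method from the linked article. The helper name Explode and its PHP-style argument order (delimiter first) are assumptions for illustration.

```csharp
using System;

static class ExplodeSketch
{
    // Hypothetical helper that mirrors the explode(delimiter, input)
    // argument order and delegates to the string[] overload of Split.
    public static string[] Explode(string delimiter, string input)
    {
        return input.Split(new string[] { delimiter }, StringSplitOptions.None);
    }

    static void Main()
    {
        string[] parts = Explode(", ", "cat, dog, bird");
        foreach (string part in parts)
        {
            Console.WriteLine(part);
        }
    }
}
```

StringSplitOptions.None is used so that empty parts between adjacent delimiters are kept rather than silently dropped.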
Summary
In this tutorial, we saw several examples and two benchmarks of the Split method in the C# programming language. You can use Split to divide or separate your strings while keeping your code as simple as possible. Sometimes, using IndexOf and Substring together to parse your strings can be more precise and less error-prone.