Convert Between Legacy Encodings and Unicode (C# Programming Guide)

This document discusses how to convert text files encoded in a legacy 8-bit encoding (code page) to Unicode. It shows how to read a file containing Greek characters encoded in code page 737 (DOS Greek), decode it to a Unicode string using the Encoding class, and write the string to a new file encoded as UTF-8. The example code prints the original byte values and the resulting Unicode code points to show that the text content is preserved.

Uploaded by

Kyle Daly
Copyright
© Attribution Non-Commercial (BY-NC)

How to: Convert Between Legacy Encodings and Unicode (C# Program...

http://msdn.microsoft.com/en-us/library/cc488003

How to: Convert Between Legacy Encodings and Unicode (C# Programming Guide)
Visual Studio 2010

In C#, all strings in memory are encoded as Unicode (UTF-16). When you bring data from storage into a string object, the data is automatically converted to UTF-16. If the data contains only ASCII values from 0 through 127, the conversion requires no extra effort on your part. However, if the source text contains extended ASCII byte values (128 through 255), the extended characters will be interpreted by default according to the current code page. To specify that the source text should be interpreted according to a different code page, use the System.Text.Encoding class as shown in the following example.
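As a smaller illustration of why bytes in the range 128 through 255 need an explicit encoding, the sketch below decodes the same four bytes twice. The class name ExtendedByteDemo, the helper Decode, and the choice of ISO-8859-1 as the second encoding are assumptions made for this illustration; they do not come from the original topic. The program prints the code point of the last character rather than the character itself, because what a console displays depends on its own output encoding.

```csharp
using System;
using System.Text;

static class ExtendedByteDemo
{
    // Hypothetical helper: decodes the bytes with whichever encoding the caller chooses.
    public static string Decode(byte[] bytes, Encoding encoding)
    {
        return encoding.GetString(bytes);
    }

    static void Main()
    {
        // "caf" in plain ASCII, followed by one extended byte (0xE9).
        byte[] data = { 0x63, 0x61, 0x66, 0xE9 };

        // ASCII has no mapping for 0xE9, so the decoder substitutes '?' (U+003F).
        string asAscii = Decode(data, Encoding.ASCII);

        // ISO-8859-1 (an illustrative choice) maps 0xE9 to U+00E9.
        string asLatin1 = Decode(data, Encoding.GetEncoding("iso-8859-1"));

        Console.WriteLine("{0:X}", (int)asAscii[3]);   // 3F
        Console.WriteLine("{0:X}", (int)asLatin1[3]);  // E9
    }
}
```

The first three bytes decode identically in both cases; only the extended byte changes meaning with the encoding, which is why the code page has to be stated when the data is not pure ASCII.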

Example
The following example shows how to convert a text file that has been encoded in an 8-bit legacy encoding (sometimes loosely called 8-bit ASCII), interpreting the source text according to code page 737 (DOS Greek).

using System;
using System.Text;

class ANSIToUnicode
{
    static void Main()
    {
        // Create a file that contains the Greek word ψυχη (psyche) when interpreted by using
        // code page 737 ((DOS) Greek). You can also create the file by using Character Map
        // to paste the characters into Microsoft Word and then "Save As" by using the DOS
        // (Greek) encoding. (Word will actually create a six-byte file by appending "\r\n" at the end.)
        System.IO.File.WriteAllBytes(@"greek.txt", new byte[] { 0xAF, 0xAC, 0xAE, 0x9E });

        // Specify the code page to correctly interpret byte values
        Encoding encoding = Encoding.GetEncoding(737); // (DOS) Greek code page
        byte[] codePageValues = System.IO.File.ReadAllBytes(@"greek.txt");

        // Same content is now encoded as UTF-16
        string unicodeValues = encoding.GetString(codePageValues);

        // Show that the text content is still intact in Unicode string
        // (Add a reference to System.Windows.Forms.dll)
        System.Windows.Forms.MessageBox.Show(unicodeValues);

        // Same content "ψυχη" is stored as UTF-8
        System.IO.File.WriteAllText(@"greek_unicode.txt", unicodeValues);

        // Conversion is complete. Show the bytes to prove the conversion.
        Console.WriteLine("8-bit encoding byte values:");
        foreach (byte b in codePageValues)
            Console.Write("{0:X}-", b);
        Console.WriteLine();

        Console.WriteLine("Unicode values:");
        string unicodeString = System.IO.File.ReadAllText("greek_unicode.txt");
        System.Globalization.TextElementEnumerator enumerator =
            System.Globalization.StringInfo.GetTextElementEnumerator(unicodeString);
        while (enumerator.MoveNext())
        {
            string s = enumerator.GetTextElement();
            int i = Char.ConvertToUtf32(s, 0);
            Console.Write("{0:X}-", i);
        }
        Console.WriteLine();

        // Keep the console window open in debug mode.
        Console.Write("Press any key to exit.");
        Console.ReadKey();
    }
    /*
     * Output:
        8-bit encoding byte values:
        AF-AC-AE-9E-
        Unicode values:
        3C8-3C5-3C7-3B7-
    */
}
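The same conversion can be wrapped in a small reusable helper. The following is a minimal sketch, not part of the original topic: the class LegacyFileConverter, the method ConvertToUtf8, and the file name greek_utf8.txt are hypothetical names chosen for illustration. It also assumes a newer runtime than the Visual Studio 2010 target of this topic: on .NET Core and .NET 5 or later, code pages such as 737 are only available after CodePagesEncodingProvider has been registered (on some older targets that type comes from the System.Text.Encoding.CodePages package).

```csharp
using System;
using System.IO;
using System.Text;

static class LegacyFileConverter
{
    // Hypothetical helper: re-encodes a text file from a legacy code page to UTF-8.
    public static void ConvertToUtf8(string sourcePath, string targetPath, int codePage)
    {
        // Assumption: running on .NET Core / .NET 5+, where legacy code pages
        // must be registered before Encoding.GetEncoding can find them.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        Encoding source = Encoding.GetEncoding(codePage);

        // Decode the legacy bytes into a UTF-16 string in memory.
        string text = File.ReadAllText(sourcePath, source);

        // Write the string back out as UTF-8 without a byte order mark.
        File.WriteAllText(targetPath, text, new UTF8Encoding(false));
    }

    static void Main()
    {
        // Same four bytes as in the example above (code page 737).
        File.WriteAllBytes("greek.txt", new byte[] { 0xAF, 0xAC, 0xAE, 0x9E });
        ConvertToUtf8("greek.txt", "greek_utf8.txt", 737);

        // Each Greek letter occupies two bytes in UTF-8.
        foreach (byte b in File.ReadAllBytes("greek_utf8.txt"))
            Console.Write("{0:X2} ", b);
        Console.WriteLine(); // CF 88 CF 85 CF 87 CE B7
    }
}
```

Passing the source encoding to File.ReadAllText performs the decoding step in one call, and naming the UTF-8 encoding explicitly on the write side documents the target format instead of relying on the default.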

See Also
Other Resources
Strings (C# Programming Guide)
© 2012 Microsoft. All rights reserved.
5/25/2012 11:48 AM