Figure 6.3: The first page of the ShowUnicode application project has a long list of sections of Unicode characters
Notice how the code saves the number of the “page” in the Tag property of the items
of the ListView, information used later to fill a page. As a user selects one of the
items, the application moves to the second page of the TabControl, filling its string
grid with the 256 characters of the section:
procedure TForm2.ListView1ItemClick(const Sender: TObject;
  const AItem: TListViewItem);
var
  I, NStart: Integer;
begin
  // the Tag holds the "page" number; each page covers 256 code points
  NStart := AItem.Tag * 256;
  for I := 0 to 255 do
  begin
    // 16 x 16 grid: the column is I mod 16, the row is I div 16
    StringGrid1.Cells [I mod 16, I div 16] :=
      IfThen (not Char(I + NStart).IsControl, Char (I + NStart), '');
  end;
  TabControl1.ActiveTab := TabItem2;
end;
The IfThen function used in the code above is a two-way test: if the condition passed
in the first parameter is true, the function returns the value of the second parameter;
if not, it returns the value of the third one. The test in the first parameter uses the
IsControl method of the Char type helper to filter out non-printable control
characters.
note The IfThen function operates more or less like the ?: operator of most programming languages
based on the C syntax. There is a version for strings and a separate one for Integers. For the string
version you have to include the System.StrUtils unit, for the Integer version of IfThen the
System.Math unit.
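As a minimal sketch of the two versions of IfThen, the console program below uses both overloads side by side. The CellText and NonNegative helpers are hypothetical names introduced only for this illustration; they are not part of the ShowUnicode project. Keep in mind that IfThen is an ordinary function, so both the second and the third argument are always evaluated before the call, unlike the operands of the C ?: operator.

```pascal
program IfThenSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils,
  System.Character, // Char type helper (IsControl)
  System.StrUtils,  // string version of IfThen
  System.Math;      // Integer version of IfThen

// Hypothetical helper: the text to display for a character,
// with control characters replaced by an empty string
function CellText(Ch: Char): string;
begin
  Result := IfThen(not Ch.IsControl, Ch, '');
end;

// Hypothetical helper: replaces negative values with zero
function NonNegative(Value: Integer): Integer;
begin
  Result := IfThen(Value >= 0, Value, 0);
end;

begin
  Writeln('[', CellText('A'), ']'); // prints [A]
  Writeln('[', CellText(#9), ']');  // prints [] as the tab is a control character
  Writeln(NonNegative(-5));         // prints 0
end.
```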
The grid of Unicode characters produced by the application is visible in Figure 6.4.
Notice that the output varies depending on the ability of the selected font and the
specific operating system to display a given Unicode character.
Figure 6.4: The second page of the ShowUnicode application project has some of the actual Unicode characters
note The Character unit also defines a TCharacter record, which is basically a collection of static class
functions, plus a number of global routines mapped to these methods. These are older, deprecated
functions, given that now the preferred way to work on the Char type at the Unicode level is the
use of the Char type helper.
The unit also defines two interesting enumerated types. The first is called
TUnicodeCategory and maps the various characters into broad categories like control, space,
uppercase or lowercase letter, decimal number, punctuation, math symbol, and
many more. The second enumeration is called TUnicodeBreak and defines the family
of the various spaces, hyphens, and breaks. If you are used to ASCII operations, this
is a big change.
Numbers in Unicode are not only the characters between 0 and 9; spaces are not
limited to the character #32; and so on for many other assumptions of the (much
simpler) 256-element alphabet.
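To make the point concrete, the following sketch compares ASCII-style tests with the corresponding methods of the Char type helper. The IsAsciiDigit and IsAsciiSpace functions are hypothetical helpers written for this comparison, and the two sample characters (the Arabic-Indic digit three, #$0663, and the no-break space, #$00A0) are just illustrative picks.

```pascal
program UnicodeAssumptions;

{$APPTYPE CONSOLE}

uses
  System.SysUtils,
  System.Character; // Char type helper

// Hypothetical ASCII-style test: only the characters '0' to '9' count
function IsAsciiDigit(Ch: Char): Boolean;
begin
  Result := (Ch >= '0') and (Ch <= '9');
end;

// Hypothetical ASCII-style test: only the character #32 counts
function IsAsciiSpace(Ch: Char): Boolean;
begin
  Result := Ch = #32;
end;

var
  Digit, Space: Char;
begin
  Digit := #$0663; // ARABIC-INDIC DIGIT THREE
  Space := #$00A0; // NO-BREAK SPACE

  // the ASCII tests reject both characters, the helper recognizes them
  Writeln('ASCII digit test: ', IsAsciiDigit(Digit));
  Writeln('IsDigit:          ', Digit.IsDigit);
  Writeln('GetNumericValue:  ', Digit.GetNumericValue:0:0);

  Writeln('ASCII space test: ', IsAsciiSpace(Space));
  Writeln('IsWhiteSpace:     ', Space.IsWhiteSpace);
end.
```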
The Char type helper has over 40 methods that comprise many different tests and
operations. They can be used for:
• Getting the numeric representation of the character (GetNumericValue).
• Asking for the category (GetUnicodeCategory) or checking it against one of the
various categories (IsLetterOrDigit, IsLetter, IsDigit, IsNumber, IsControl,
IsWhiteSpace, IsPunctuation, IsSymbol, and IsSeparator). I used the
IsControl operation in the previous demo.
• Checking if it is lowercase or uppercase (IsLower and IsUpper) or converting it
(ToLower and ToUpper).
• Verifying if it is part of a UTF-16 surrogate pair (IsSurrogate, IsLowSurrogate,
and IsHighSurrogate) and converting surrogate pairs in various ways.
• Converting it to and from UTF-32 (ConvertFromUtf32 and ConvertToUtf32) and
to the UCS4Char type (ToUCS4Char).
• Checking if it is part of a given list of characters (IsInArray).
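A few of the operations in this list can be tried out with a short console program like the one sketched here. The CategoryName and Utf16Length functions are hypothetical helpers added for the illustration, and the code point $1F600 is only an example of a character outside the Basic Multilingual Plane, which needs a surrogate pair in UTF-16.

```pascal
program CharHelperSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils,
  System.Character, // Char type helper and TUnicodeCategory
  System.TypInfo;   // GetEnumName, to print the category name

// Hypothetical helper: the name of the Unicode category of a character
function CategoryName(Ch: Char): string;
begin
  Result := GetEnumName(TypeInfo(TUnicodeCategory),
    Ord(Ch.GetUnicodeCategory));
end;

// Hypothetical helper: number of UTF-16 code units used by a code point
function Utf16Length(CodePoint: UCS4Char): Integer;
begin
  Result := Length(Char.ConvertFromUtf32(CodePoint));
end;

var
  Ch: Char;
  S: string;
begin
  Ch := 'a';
  Writeln('Category of a: ', CategoryName(Ch));
  Writeln('Is a vowel:    ', Ch.IsInArray(['a', 'e', 'i', 'o', 'u']));

  // a code point above $FFFF becomes a pair of surrogate characters
  S := Char.ConvertFromUtf32($1F600);
  Writeln('UTF-16 length: ', Utf16Length($1F600));
  // Chars is the zero-based indexed property of the string helper
  Writeln('High first:    ', S.Chars[0].IsHighSurrogate);
  Writeln('Low second:    ', S.Chars[1].IsLowSurrogate);
end.
```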
Notice that some of these operations can be applied to the type as a whole, rather
than to a specific variable. In that case you have to call them using the Char type as
prefix, as in the second code snippet below.
To experiment a bit with these operations on Unicode characters, I've created an
application project called CharTest. One of the examples of this demo is the effect of
calling uppercase and lowercase operations on Unicode elements. In fact, the classic
UpCase function of the RTL works only for the base 26 English language characters
of the ANSI representation, while it fails for some Unicode characters that do have a
specific uppercase representation (not all alphabets have the concept of uppercase, so
this is not a universal notion).
To test this scenario, in the CharTest application project I've added the following
snippet that tries to convert an accented letter to uppercase:
var
  ch1: Char;
begin
  ch1 := 'ù';
  Show ('UpCase ù: ' + UpCase(ch1));
  Show ('ToUpper ù: ' + ch1.ToUpper);