162 - 06: All About Strings

ListItem.Text := 'Surrogate Code Points';

end;
end;

Figure 6.3: The first page of the ShowUnicode application project has a long list of
sections of Unicode characters

Notice how the code saves the number of the “page” in the Tag property of the items
of the ListView, information that is used later to fill a page. As a user selects one of the
items, the application moves to the second page of the TabControl, filling its string
grid with the 256 characters of the section:
procedure TForm2.ListView1ItemClick(const Sender: TObject;
  const AItem: TListViewItem);
var
  I, NStart: Integer;
begin
  // each "page" is a block of 256 consecutive code units
  NStart := AItem.Tag * 256;
  for I := 0 to 255 do
  begin
    // 16 x 16 grid: column is I mod 16, row is I div 16
    StringGrid1.Cells [I mod 16, I div 16] :=
      IfThen (not Char(I + NStart).IsControl, Char (I + NStart), '');
  end;
  TabControl1.ActiveTab := TabItem2;
end;

Marco Cantù, Object Pascal Handbook

The IfThen function used in the code above is a two-way test: if the condition passed
in the first parameter is true, the function returns the value of the second parameter;
if not, it returns the value of the third one. The test in the first parameter uses the
IsControl method of the Char type helper to filter out non-printable control
characters.

note The IfThen function operates more or less like the ?: operator of most programming languages
based on the C syntax. There is a version for strings and a separate one for Integers. For the string
version you have to include the System.StrUtils unit, for the Integer version of IfThen the
System.Math unit.
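
The filtering expression used in the event handler can be tried in isolation. The following is only a minimal console sketch: the program name and the PrintableOrEmpty function are hypothetical names introduced for illustration, while IfThen (string version, System.StrUtils) and IsControl (System.Character) are the routines discussed above.

```pascal
program IfThenSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.StrUtils, System.Character;

// Hypothetical helper: returns the character as a string,
// or an empty string if it is a control character
function PrintableOrEmpty(Ch: Char): string;
begin
  Result := IfThen(not Ch.IsControl, Ch, '');
end;

begin
  Writeln('[' + PrintableOrEmpty('A') + ']');  // prints [A]
  Writeln('[' + PrintableOrEmpty(#9) + ']');   // prints [] (tab is a control character)
end.
```

Unlike the ?: operator of C-like languages, IfThen is an ordinary function, so both the second and the third argument are evaluated before the call, whichever one is returned.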

The grid of Unicode characters produced by the application is visible in Figure 6.4.
Notice that the output varies depending on the ability of the selected font and the
specific operating system to display a given Unicode character.

Figure 6.4: The second page of the ShowUnicode application project has some of the
actual Unicode characters

The Char Type Revisited

After this introduction to Unicode, let's get back to the real topic of this chapter,
which is how the Object Pascal language manages characters and strings. I introduced
the Char data type in Chapter 2, and mentioned some of the type helper
functions available in the Character unit. Now that you have a better understanding
of Unicode, it is worth revisiting that section and going through some more details.
First of all, the Char type does not invariably represent a Unicode code point. The
data type, in fact, uses 2 bytes for each element. While it does represent a code point
for elements in Unicode's Basic Multilingual Plane (BMP), a Char can also be one half
of a pair of surrogate values, which together represent a single code point.
Technically, there is a different type you could use to represent any Unicode code
point directly, and this is the UCS4Char type, which uses 4 bytes to represent a
value. This type is rarely used, as the extra memory required is generally hard to
justify, but you can see that the Character unit (covered next) also includes several
operations for this data type.
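
To make the difference between a Char and a code point concrete, here is a small console sketch (not part of the ShowUnicode or CharTest projects; the program and variable names are made up for illustration). It builds a string from a code point outside the BMP, U+1D11E, and inspects the two Char elements that result. It accesses the elements through the zero-based Chars property of the string helper, to avoid assumptions on string indexing.

```pascal
program SurrogateSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Character;

var
  S: string;
begin
  Writeln(SizeOf(Char));      // 2 bytes per element
  Writeln(SizeOf(UCS4Char));  // 4 bytes per element

  // U+1D11E (musical symbol G clef) lies outside the BMP
  S := Char.ConvertFromUtf32($1D11E);
  Writeln(Length(S));                   // 2: one code point, two Char elements
  Writeln(S.Chars[0].IsHighSurrogate);  // TRUE
  Writeln(S.Chars[1].IsLowSurrogate);   // TRUE
end.
```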
Back to the Char type, remember it is an ordinal type (even if a rather large
one), so it has the notion of sequence and offers core operations like Ord, Inc, Dec,
High, and Low. Most extended operations, including the specific type helper, are not
part of the basic system RTL units but require the inclusion of the Character unit.
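
As a quick reminder of what these core operations look like on a Char, consider the following sketch (a stand-alone console program with hypothetical names, which needs no unit beyond the ones the compiler includes by default):

```pascal
program CharOrdinalSketch;

{$APPTYPE CONSOLE}

var
  Ch: Char;
begin
  Ch := 'a';
  Writeln(Ord(Ch));          // 97
  Inc(Ch);                   // move to the next element in the sequence
  Writeln(Ch);               // b
  Dec(Ch, 2);                // move back two elements
  Writeln(Ord(Ch));          // 96
  Writeln(Ord(Low(Char)));   // 0
  Writeln(Ord(High(Char)));  // 65535, as a Char takes 2 bytes
end.
```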

Unicode Operations With The Character Unit

Most of the specific operations for Unicode characters (and also Unicode strings, of
course) are defined in a special unit called System.Character. This unit defines the
TCharHelper helper for the Char type, which lets you apply operations directly to
variables of that type.

note The Character unit also defines a TCharacter record, which is basically a collection of static class
functions, plus a number of global routines mapped to these methods. These are older, deprecated
functions, given that now the preferred way to work on the Char type at the Unicode level is the
use of the type helper.

The unit also defines two interesting enumerated types. The first is called
TUnicodeCategory and maps the various characters in broad categories like control, space,
uppercase or lowercase letter, decimal number, punctuation, math symbol, and
many more. The second enumeration is called TUnicodeBreak and defines the family
of the various spaces, hyphens, and breaks. If you are used to ASCII operations, this
is a big change.
Numbers in Unicode are not only the characters between 0 and 9; spaces are not
limited to the character #32; and so on for many other assumptions of the (much
simpler) 256-element alphabet.
The Char type helper has over 40 methods that comprise many different tests and
operations. They can be used for:
• Getting the numeric representation of the character (GetNumericValue).
• Asking for the category (GetUnicodeCategory) or checking it against one of the
various categories (IsLetterOrDigit, IsLetter, IsDigit, IsNumber, IsControl,
IsWhiteSpace, IsPunctuation, IsSymbol, and IsSeparator). I used the IsControl
operation in the previous demo.
• Checking if it is lowercase or uppercase (IsLower and IsUpper) or converting it
(ToLower and ToUpper).
• Verifying if it is part of a UTF-16 surrogate pair (IsSurrogate, IsLowSurrogate,
and IsHighSurrogate) and converting surrogate pairs in various ways.
• Converting it to and from UTF32 (ConvertFromUtf32 and ConvertToUtf32) and
UCS4Char type (ToUCS4Char).
• Checking if it is part of a given list of characters (IsInArray).
Notice that some of these operations can be applied to the type as a whole, rather
than to a specific variable. In that case you have to call them using the Char type as
prefix, as in the second code snippet below.
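
The difference between the two calling styles can be sketched as follows. This is an illustrative console program with made-up names, not code from the CharTest project: the first calls operate on a variable, while ConvertFromUtf32 has no character to start from and is therefore invoked on the Char type itself.

```pascal
program CallStyleSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Character;

var
  Ch: Char;
  S: string;
begin
  // Operations applied to a specific variable
  Ch := 'Q';
  Writeln(Ch.IsUpper);  // TRUE
  Writeln(Ch.ToLower);  // q

  // Operation applied to the type as a whole, using Char as prefix
  S := Char.ConvertFromUtf32($41);
  Writeln(S);           // A
end.
```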
To experiment a bit with these operations on Unicode characters, I've created an
application project called CharTest. One of the examples of this demo is the effect of
calling uppercase and lowercase operations on Unicode elements. In fact, the classic
UpCase function of the RTL works only for the base 26 English language characters
of the ANSI representation, while it fails for some Unicode characters that do have a
specific uppercase representation (not all alphabets have the concept of uppercase, so
this is not a universal notion).
To test this scenario, in the CharTest application project I've added the following
snippet that tries to convert an accented letter to uppercase:
var
  ch1: Char;
begin
  ch1 := 'ù';
  Show ('UpCase ù: ' + UpCase(ch1));
  Show ('ToUpper ù: ' + ch1.ToUpper);
