Differences Between Unicode and Non-Unicode Programs
SAP NetWeaver AS ABAP Release 731, Copyright 2015 SAP AG. All rights reserved.
Uncontrolled access to segments of the working memory is not possible in Unicode programs.
This makes Unicode programs easier to understand, more robust, and easier to maintain than non-Unicode
programs.
The following section lists the language constructs and statements for which there are differences between
Unicode and non-Unicode programs:
Underscores ("_")
For compatibility reasons, you can also use the characters "%", "$", "?", "-", "#", and "*" but these should be
used only in exceptional cases (for example, for existing program generations) and with good justification. You
can also use forward slashes ("/") for namespace prefixes.
Note
Apart from ABAP Objects, non-Unicode programs can also use characters other than the ones listed above.
This can cause the following problems in these programs:
If characters are used that are not available in all code pages supported by SAP, it might not be possible to run
certain programs when using a code page different from the one in which they were created.
Data Type   Meaning
c           Text field
d           Date field
n           Numerical text
t           Time field
string      Text string
In addition, structures are character-type if they contain only flat character-type components (only components
from the above table with the exception of text strings).
In Unicode programs, a structure can now essentially only be used at an operand position that expects a single
field if the structure is character-type. It is then handled in the same way as a data object of type c.
In non-Unicode programs, all flat structures and byte-type data objects are also still handled as character-type
data objects (implicit casting).
Note
The incorrect use of structures at operand positions is greatly restricted in Unicode programs. For example, a
structure that contains a numeric component can no longer be used at a numeric operand position.
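Example
The following lines are a minimal sketch of the rule above and are not taken from the original documentation. The names name, text, counter, and total are chosen for illustration only.

```abap
" Sketch only: all names are illustrative assumptions.
DATA: BEGIN OF name,
        first TYPE c LENGTH 5  VALUE 'Ada',
        last  TYPE c LENGTH 10 VALUE 'Lovelace',
      END OF name,
      text TYPE c LENGTH 15.

DATA: BEGIN OF counter,
        hits TYPE i,
      END OF counter,
      total TYPE i.

" name contains only flat character-type components, so it can be used
" where a single field is expected and is handled like a field of type c.
text = name.

" counter contains a numeric component. Using the whole structure at a
" numeric operand position is rejected in a Unicode program:
" total = total + counter.

" Addressing the component itself is always possible.
total = total + counter-hits.
```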
Z, M), character-type data objects are still expected, while in Unicode programs only byte-type data objects are
permitted.
Note
In Unicode programs, the storage of byte strings in character-type containers causes problems, as the byte
order of character-type data objects in Unicode systems is platform dependent. In non-Unicode systems, this
only applies to data objects of numeric data types. The content of the data objects is interpreted incorrectly if a
container of this type is stored persistently and is then imported to an application server with a different byte
order.
components of different data types, it is not possible to define whether offset and length should be specified in
characters or bytes. Furthermore, restrictions have been introduced that forbid access to memory areas outside
of flat data objects.
flat, has a character-like initial fragment according to the Unicode fragment view, and the offset/length
specification accesses this initial fragment.
In both cases, the specification of offset and length is interpreted as a number of characters.
Example
The following structure has both character-like and non-character-like components:
DATA:
  BEGIN OF struc,
    a TYPE c LENGTH 3,      "Length 3 characters
    b TYPE n LENGTH 4,      "Length 4 characters
    c TYPE d,               "Length 8 characters
    d TYPE t,               "Length 6 characters
    e TYPE decfloat16,      "Length 8 bytes
    f TYPE c LENGTH 28,     "Length 28 characters
    g TYPE x LENGTH 2,      "Length 2 bytes
  END OF struc.
The Unicode fragment view splits the structure into five areas, F1 - F5.
[ aaa | bbbb | cccccccc | ddd | AAA | eeee | fffffffffffff | gg ]
[             F1              | F2  |  F3  |      F4       | F5 ]
Offset/length access is only possible for the character-like initial fragment F1. Specifications such as
struc(21) or struc+7(14) are accepted and are handled as a single field of type c. An access such as
struc+57(2), for example, is not permitted in Unicode systems.
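The accesses named above can be sketched as follows. The structure is repeated so that the lines are self-contained; the target fields head and part are names assumed for illustration.

```abap
DATA:
  BEGIN OF struc,
    a TYPE c LENGTH 3,
    b TYPE n LENGTH 4,
    c TYPE d,
    d TYPE t,
    e TYPE decfloat16,
    f TYPE c LENGTH 28,
    g TYPE x LENGTH 2,
  END OF struc.

" Illustrative target fields (not part of the original example).
DATA: head TYPE c LENGTH 21,
      part TYPE c LENGTH 14.

" F1 consists of a, b, c, and d: 3 + 4 + 8 + 6 = 21 characters.
head = struc(21).     " the complete initial fragment F1
part = struc+7(14).   " characters 8 to 21, that is, components c and d

" The following access reaches beyond F1 and is therefore not permitted:
" part = struc+57(2).
```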
When assigning a memory area to a field symbol using the ASSIGN statement, in Unicode programs it is now
only possible to use offset/length specifications to access the memory within the data object. The addition
RANGE defines the data object.
Field symbols themselves are also allocated an assignable memory area. This is effective if a field symbol is
used as a source in the ASSIGN statement.
In non-Unicode programs, the assignable area is defined by the data area of the current program, which can
lead to references being overwritten.
If a data object is entered as a source in ASSIGN, no offset can be specified without a length unless the explicit
RANGE addition is specified. Otherwise, this would implicitly set the length of the data object. If the name of a
field symbol is specified, its data type in Unicode programs must be flat and elementary if an offset is specified
without a length.
Note
Previously, cross-field offset/length accesses could be usefully implemented in the ASSIGN statement for
processing repeating groups in structures. In order to enable this in Unicode systems, the ASSIGN statement
has been enhanced with the additions RANGE and INCREMENT.
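Example
A minimal sketch of processing a repeating group with the additions INCREMENT and RANGE. The structure week, its components, and the field symbol <day> are assumptions made for this illustration.

```abap
" Sketch only: week and <day> are illustrative names.
DATA: BEGIN OF week,
        day1 TYPE c LENGTH 2 VALUE 'MO',
        day2 TYPE c LENGTH 2 VALUE 'TU',
        day3 TYPE c LENGTH 2 VALUE 'WE',
      END OF week,
      inc TYPE i.

FIELD-SYMBOLS <day> LIKE week-day1.

DO 3 TIMES.
  " INCREMENT moves the assigned area forward by inc times the length
  " of week-day1; RANGE restricts the assignable area to week.
  " The increments 0, 1, and 2 all stay within that area.
  inc = sy-index - 1.
  ASSIGN week-day1 INCREMENT inc TO <day> RANGE week.
  WRITE / <day>.
ENDDO.
```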
DO ... VARYING
In the DO and WHILE loops in Unicode programs, all data objects of the sequence must be compatible and
either be structure components that belong to the same structure, or subareas of the same data object
specified using offset/length specifications. In Unicode programs, a RANGE must also be entered if it cannot be
statically recognized that the data objects involved are components of the same structure. Otherwise, the
permitted memory area is determined from the smallest possible substructure.
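Example
A minimal sketch of a DO loop with the addition VARYING in which all data objects of the sequence are components of the same structure. The structure amounts and the fields amount and total are names assumed for illustration. The addition RANGE is specified explicitly here, although the components can be recognized statically.

```abap
" Sketch only: all names are illustrative assumptions.
DATA: BEGIN OF amounts,
        q1 TYPE i VALUE 10,
        q2 TYPE i VALUE 20,
        q3 TYPE i VALUE 30,
        q4 TYPE i VALUE 40,
      END OF amounts,
      amount TYPE i,
      total  TYPE i.

" In each loop pass, amount receives the content of the next component.
" The distance between q1 and q2 defines the step of the sequence.
DO 4 TIMES VARYING amount FROM amounts-q1 NEXT amounts-q2 RANGE amounts.
  total = total + amount.
ENDDO.

WRITE / total.
```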
When memory sequences are added using ADD, in Unicode programs, all data objects of the sequence must
be components of a structure. If this cannot be statically recognized in the syntax check, a structure must be
specified using the addition RANGE.
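Example
A minimal sketch of adding a memory sequence whose data objects are components of one structure. The structure amounts and the field total are names assumed for illustration.

```abap
" Sketch only: all names are illustrative assumptions.
DATA: BEGIN OF amounts,
        q1 TYPE i VALUE 10,
        q2 TYPE i VALUE 20,
        q3 TYPE i VALUE 30,
        q4 TYPE i VALUE 40,
      END OF amounts,
      total TYPE i.

" The sequence starts at q1, continues in steps of the distance between
" q1 and q2, and ends at q4. RANGE names the enclosing structure.
ADD amounts-q1 THEN amounts-q2 UNTIL amounts-q4 GIVING total RANGE amounts.

WRITE / total.
```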
Two structures in Unicode programs are only compatible when all alignment gaps are identical on all platforms.
This applies in particular to alignment gaps that are created by included structures (INCLUDE).
ABAP Dictionary structures and database tables that are delivered by SAP can be enhanced using customizing
includes or append structures. These types of changes cause problems in Unicode programs if the
enhancements change the Unicode fragment view.
For this reason, the option to classify structures and database tables was introduced, which makes it possible
to recognize and handle problems related to structure enhancements. This classification is used during the
program check to create a warning at all points where the program works with structures, and where later
structure enhancements can cause syntax errors or changes in program behavior. When you define a structure
or a database table in ABAP Dictionary, you can specify the enhancement categories that are displayed in the
following table as classification.
Level  Category                             Meaning
1      Unclassified                         The structure does not have an enhancement category.
2      Cannot be enhanced                   The structure must not be enhanced.
       Can be enhanced and character-like   All structure components and their enhancements must be
                                            character-like and flat.
       Can be enhanced in any way           All structure components and their enhancements can have
                                            any data type.
The warnings displayed after the program check are classified into three levels from the following table,
depending on the consequences of the permitted structure enhancements.
Level  Type of Check   Meaning
A      Syntax check    An enhancement that fully utilizes the enhancement category of the structure in
                       question leads to a syntax error.
B      Extended check  Permitted enhancements can lead to syntax errors, but not always.
C      Extended check  Permitted enhancements cannot lead to syntax errors, although changes to program
                       behavior do result in semantic problems.
Example
If the structure ddic_struc in ABAP Dictionary is defined only with flat components but is classified as Can
be enhanced in any way, then the following program section leads to a warning in the syntax check. If the
structure were to be enhanced by a deep component after the program was delivered, the program would be
syntactically incorrect and no longer executable. This is why in this case you either have to classify the
structure ddic_struc in ABAP Dictionary as Can be enhanced and character-like or else you cannot specify
the offset/length in the program.
DATA: my_struc TYPE ddic_struc,
str TYPE string,
off TYPE i,
len TYPE i.
...
str = my_struc+off(len).
In Unicode programs, character string and byte string processing are strictly separated. The operands of
character string processing must be character-like data objects, and operands in byte string processing must
be byte-like data objects. In non-Unicode programs, byte strings are normally handled in the same way as
character strings.
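Example
A minimal sketch of the separation, using CONCATENATE once for character strings and once for byte strings. All data object names are assumptions made for this illustration.

```abap
" Sketch only: all names are illustrative assumptions.
DATA: text1 TYPE c LENGTH 2 VALUE 'AB',
      text2 TYPE c LENGTH 2 VALUE 'CD',
      text  TYPE string,
      hex1  TYPE x LENGTH 2 VALUE '0A0B',
      hex2  TYPE x LENGTH 2 VALUE '0C0D',
      hex   TYPE xstring,
      chars TYPE i,
      bytes TYPE i.

" Character string processing: all operands are character-like.
CONCATENATE text1 text2 INTO text.

" Byte string processing: all operands are byte-like, and the addition
" IN BYTE MODE must be specified.
CONCATENATE hex1 hex2 INTO hex IN BYTE MODE.

" The length functions are separated in the same way.
chars = strlen( text ).
bytes = xstrlen( hex ).
```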
Syntactic Separation
DESCRIBE DISTANCE
It must be possible to exchange data between different non-Unicode systems that use different code pages.
For this reason, in Unicode programs, you must always define the code page used to encode the
character-type data that is written to text files or read from text files.
You must also consider that a Unicode program must be executable in a non-Unicode system as well as a
Unicode system. Some of the syntax rules for the file interface have therefore been modified so that
programming data access in Unicode programs is less prone to errors than in non-Unicode programs.
Before every read or write access, a file must be opened explicitly using OPEN DATASET. Furthermore, a file
that is already open cannot be opened again. In non-Unicode programs, the first time a file is accessed, it is
implicitly opened using the standard settings. The statement for opening a file can be applied to an open file in
non-Unicode programs, although a file can only be opened once within a program.
When opening the file, the access type and type of file storage must be specified explicitly using the following
additions:
INPUT|OUTPUT|APPENDING|UPDATE
When opening a file in TEXT MODE, the ENCODING addition must be used to specify the character
representation. When opening a file in LEGACY MODE, the byte order (endian) and a non-Unicode code page
must be specified.
In non-Unicode programs, if nothing is entered, a file is opened with implicit standard settings.
If a file is opened for reading, the content can only be read. In non-Unicode programs, it is also possible to gain
write access to these files.
If a file is opened as a text file, only the content of character-type data objects can be read or written. In
non-Unicode programs, byte-type and numeric data objects are also allowed.
Note
In Unicode programs, file names can also contain blank characters.
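Example
A minimal sketch of the rules above for a text file. The file name and the code page '1100' in the last statement are assumptions made for this illustration and must be adapted to the actual system.

```abap
" Sketch only: the file name is a hypothetical example.
DATA: file TYPE string VALUE `example.txt`,
      line TYPE string.

" Access type (FOR OUTPUT), storage type (TEXT MODE), and character
" representation (ENCODING) are all specified explicitly.
OPEN DATASET file FOR OUTPUT IN TEXT MODE ENCODING UTF-8.
IF sy-subrc = 0.
  TRANSFER `First line` TO file.
  CLOSE DATASET file.
ENDIF.

" The file must be closed before it is opened again, here for reading.
OPEN DATASET file FOR INPUT IN TEXT MODE ENCODING UTF-8.
IF sy-subrc = 0.
  READ DATASET file INTO line.
  CLOSE DATASET file.
ENDIF.

" Legacy text file: byte order and a non-Unicode code page are named.
" The code page '1100' is an assumed value.
OPEN DATASET file FOR INPUT IN LEGACY TEXT MODE BIG ENDIAN CODE PAGE '1100'.
IF sy-subrc = 0.
  CLOSE DATASET file.
ENDIF.
```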
Each time a data object is produced by WRITE, the system defines an output length either implicitly or explicitly;
the implicit output length depends on the data type. The output length defines the following two attributes:
Number of positions or memory spaces available for characters in the list buffer
If the output length is shorter than the length of the data object, the system shortens its content according to
certain rules when writing the data to the list buffer. Any values lost in numeric fields are indicated by a *.
When displaying or printing a list, the content stored in the list buffer is transferred to the list as follows:
In non-Unicode systems, each character occupies the same amount of space in the list buffer as it requires
columns in the list. In single-byte systems, a character occupies one byte in the list buffer and one column in the
list, while a character that occupies several bytes in the list buffer in multi-byte systems also occupies the same
number of columns in the list. For this reason, all the characters stored in the list buffer are displayed in the list in
non-Unicode systems.
In Unicode systems, every character usually occupies one position in the list buffer. A character can also
occupy more than one column, though, as is the case for East Asian characters. However, since the list only contains
the same number of columns as there are positions in the list buffer, the number of characters that can be
displayed in the list is smaller than the number of characters stored in the list buffer in this case. List output is
shortened accordingly, with the page formatted according to the specified alignment and marked with the
characters > or <. You can then only display the entire content of the list by choosing the menu path System →
List → Unicode Display.
For this reason, the horizontal position of the list cursor only has the same meaning as the output column in a
list displayed or printed in non-Unicode systems. In Unicode systems, this is only guaranteed for the top and
bottom output limits.
1. In data objects of the types c and string, the output length is set to the number of columns required to
   display the entire content in the list; trailing blanks are ignored for type c. In the case of data objects of the
   type string, this has the same meaning as the implicit length.
2. In data objects of the types d and t, the output length is set to 10 and 8, respectively.
3. In data objects of the numeric types i, f, and p, the output length is set to the value required to display the
   current value including thousand separators. This rule is applied to the value after any CURRENCY,
   DECIMALS, NO-SIGN, ROUND, or UNIT additions have been used.
4. The implicit output length is used for data objects of the types n, x, and xstring.
1. In data objects of the type c, the output length is set to twice the length of the data object, and in data objects
   of the type string, to twice the number of characters contained in the object.
2. In data objects of the types d and t, the output length is set to 10 and 8, respectively.
3. In data objects of the numeric types i, f, and p, the output length is set to the value required in order to
   display the maximum possible values for these types, including plus and minus signs and thousands
   separators. This rule is applied to the value after any CURRENCY, DECIMALS, NO-SIGN, ROUND, or
   UNIT additions have been used.
4. The implicit output length is used for data objects of the types n, x, and xstring.
The behavior of the output lengths (*) and (**) when using the addition USING EDIT MASK and the
templates for date fields is described in Formatting Options.
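Example
A minimal sketch comparing the implicit output length with the output lengths (*) and (**). It assumes that the first of the two lists above describes (*) and the second describes (**). The data object text is a name chosen for illustration.

```abap
" Sketch only: text is an illustrative name.
DATA text TYPE c LENGTH 10 VALUE 'ABC'.

" Implicit output length: derived from the data type (c LENGTH 10).
WRITE AT / text.
WRITE '|'.

" (*): as many columns as are needed to display the current content,
" ignoring trailing blanks.
WRITE AT /(*) text.
WRITE '|'.

" (**): twice the length of the data object for type c.
WRITE AT /(**) text.
WRITE '|'.
```

The vertical bar after each output marks where the output area ends in the list.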
List Settings
The objects in a list can be displayed in different output lengths by specifying the desired length in the menu
under System → List → Unicode Display. This is particularly advantageous for screen lists in Unicode systems
where the output is cut off as indicated by the characters > or <.
Recommendations
We recommend that you adhere to the following rules when programming lists, to ensure that they have the
same appearance and functions both in Unicode and non-Unicode systems:
Do not use the additions RIGHT-JUSTIFIED or CENTERED for WRITE TO if this statement is followed by
list output with WRITE.
In customer-programmed horizontal scrolling with a SCROLL statement, you should only specify the upper or
lower limit of data objects displayed, since the positions in the list buffer and in the list displayed are only certain
to match for these field limits in Unicode systems.