Unicode System - Outside Communication for ABAP Programmers

The document discusses the challenges and solutions related to Unicode communication for ABAP programmers, highlighting the limitations of traditional code pages and the advantages of using Unicode. It covers various aspects such as RFC (Remote Function Call) handling, file transfer methods, and common mistakes in data exchange. Additionally, it provides practical examples and exercises to reinforce understanding of Unicode implementation in SAP systems.
Unicode System:

Outside Communication
for ABAP Programmers

Dr. Christian Hansen


Server Technology Internationalization
SAP AG
Contents

Introduction
About Code Pages
Communication: The Ideal Picture
Communication: The Reality

Part I – RFC
Unicode ↔ Unicode
Unicode ↔ single code page system
Unicode ↔ MDMP system

Part II – File transfer


Writing and reading files on the application server
Writing and reading files on the front end

Part III – Common mistakes

Exercises
 2003 SAP AG, Unicode Outside Communication, Christian Hansen 2
About Code Pages: Conventional Code Pages

Disadvantages of old standard code pages


Each covers only a subset of all characters used
Incompatibilities between different codepages
Only restricted data exchange possible
Too many of them
[Figure: a cloud of vendor names (Kyocera, Canon, Apple, HP, IBM, Microsoft) and code page names (ASCII, EBCDIC, 697/0277, 697/0500, ISO-1 to ISO-9, 1250, 1251, 1252, 1254, 1256, 1257, BIG-5, SJIS)]

SAP:
Languages: 41
Characters: 22,378
Code Pages: 390
Solution: Unicode, one Code Page for all Scripts

Japanese, Chinese, Taiwanese, Korean, Thai, Hebrew, Greek, Russian, Ukrainian, English, Danish, Dutch, German, Finnish, French, Italian, Icelandic, Norwegian, Portuguese, Spanish, Swedish, Croatian, Czech, Hungarian, Polish, Rumanian, Slovakian, Slovene, Turkish

And more languages can be supported easily without the need for new code pages or other new methods.
Solution: Unicode characters
[Figure: layout of the Unicode code space]
65,000 characters: ASCII, General Scripts, Symbols, CJK Ideographs, Hangul, Compatibility
Additional 1,000,000 characters: Surrogate Area


Representation of Unicode Characters

UTF-16 – Unicode Transformation Format, 16 bit encoding


Fixed length, 1 character = 2 bytes (surrogate pairs = 2 + 2 bytes)
Platform-dependent byte order (big/little endian)
2 byte alignment restriction

UTF-8 – Unicode Transformation Format, 8 bit encoding


Variable length, 1 character = 1...4 bytes
Platform independent
no alignment restriction
7 bit US ASCII compatible

Character  Unicode scalar value  UTF-16 big endian  UTF-16 little endian  UTF-8
a          U+0061                00 61              61 00                 61
ä          U+00E4                00 E4              E4 00                 C3 A4
α          U+03B1                03 B1              B1 03                 CE B1
㑹         U+3479                34 79              79 34                 E3 91 B9
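
The byte sequences in the table can be checked with a few lines of code. The following is a minimal sketch in Python (chosen only for illustration, it is not part of the original slides); the function names `hex_bytes` and `representations` are hypothetical.

```python
def hex_bytes(data: bytes) -> str:
    """Format bytes as space-separated upper-case hex pairs."""
    return " ".join(f"{b:02X}" for b in data)


def representations(char: str) -> dict:
    """Return the scalar value and the three encoded forms of one character."""
    return {
        "scalar": f"U+{ord(char):04X}",
        "utf-16-be": hex_bytes(char.encode("utf-16-be")),
        "utf-16-le": hex_bytes(char.encode("utf-16-le")),
        "utf-8": hex_bytes(char.encode("utf-8")),
    }


if __name__ == "__main__":
    # The four characters from the table: a, ä, α and U+3479.
    for ch in ("a", "\u00e4", "\u03b1", "\u3479"):
        print(representations(ch))
```

Note how only the UTF-16 columns depend on byte order, while the UTF-8 column is the same on every platform.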


Communication: The Ideal Picture

The ideal picture: only Unicode components

Conversions are done algorithmically (1:1 relation)
No data misinterpretation
No data loss
All business relevant characters available at the same time

[Figure: R/3 Enterprise, mySAP BW, 3rd Party, a second R/3 Enterprise, Internet and Files, all connected as Unicode components]


Communication: Reality

The reality: Unicode and non-Unicode components

Conversions between incompatible code pages everywhere
Only common subset exchangeable
Special rules have to be obeyed to make communication possible

[Figure: R/3 Enterprise connected to R/3 4.6C (ISO8859-1, SJIS), mySAP BW (ISO8859-1), 3rd Party (EBCDIC), the Internet (charset=iso-8859-1, charset=windows-1257, charset=Shift_JIS, charset=utf-8) and Files (ISO-1, ISO-2, ISO-3, ISO-7, ISO-8, ISO-9, 1251, 1252, SJIS, BIG-5, 697/0277, 697/0500)]


RFC Unicode ↔ Unicode

R/3 Enterprise ↔ R/3 Enterprise

In case of a Unicode ↔ Unicode combination, RFC passes all character data without code page conversion, or merely with adaptation of the endianness.

• UTF-16 big endian = SAP code page 4102
• UTF-16 little endian = SAP code page 4103
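
Adapting the endianness means swapping the two bytes of every 16-bit code unit; no character mapping is involved. A minimal sketch in Python, for illustration only (the function name `swap_utf16_byte_order` is hypothetical and this is not how the RFC layer is implemented):

```python
def swap_utf16_byte_order(data: bytes) -> bytes:
    """Swap the bytes of each 2-byte code unit (UTF-16BE <-> UTF-16LE)."""
    if len(data) % 2:
        raise ValueError("UTF-16 data must have an even number of bytes")
    swapped = bytearray(len(data))
    swapped[0::2] = data[1::2]  # low bytes move to the even positions
    swapped[1::2] = data[0::2]  # high bytes move to the odd positions
    return bytes(swapped)


if __name__ == "__main__":
    big_endian = "\u00c4\u6771".encode("utf-16-be")
    print(big_endian.hex(" "))
    print(swap_utf16_byte_order(big_endian).hex(" "))
```

Because the operation is a pure byte swap, applying it twice returns the original data, and no character can be lost.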

Information about the destination is maintained in SM59 → Special options → Character width in target system:

• 1 Byte = non-Unicode
• 2 Byte = Unicode


RFC Unicode ↔ non-Unicode single code page

R/3 Enterprise ↔ R/3 4.6C (ISO8859-1)

In case of a Unicode ↔ non-Unicode single code page combination, RFC passes all character data with code page conversion between Unicode and the old code page.

As Unicode is a true superset of any old standard code page, not all Unicode characters can be transferred to the non-Unicode system:

Ä → Ä
ß → ß
あ → #
東 → #
한 → #
พ → #
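
The effect shown above can be reproduced outside of SAP. The sketch below, in Python and for illustration only, converts each character to a legacy code page and substitutes `#` when the character has no mapping. The function name `to_code_page` and the use of Python's `iso8859-1` codec as a stand-in for the target system's code page are assumptions.

```python
def to_code_page(text: str, codec: str, substitute: str = "#") -> str:
    """Round-trip text through a legacy code page.

    Characters that the code page cannot represent are replaced by
    `substitute`, mimicking the '#' shown on the slide.
    """
    result = []
    for ch in text:
        try:
            result.append(ch.encode(codec).decode(codec))
        except UnicodeEncodeError:
            result.append(substitute)
    return "".join(result)


if __name__ == "__main__":
    for ch in "Äßあ東한พ":
        print(ch, "->", to_code_page(ch, "iso8859-1"))
```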
RFC Unicode ↔ non-Unicode MDMP

R/3 Enterprise ↔ R/3 4.6C (ISO8859-1, SJIS)

In case of a Unicode ↔ non-Unicode MDMP combination, RFC passes all character data with code page conversion between Unicode and the different old code pages.

Which of the MDMP code pages is chosen depends on the language:

Ä → DE → Ä
ß → DE → ß
あ → JA → あ
東 → JA → 東
한 → JA → #
พ → JA → #
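
A rough way to picture the language-dependent choice is a lookup from language to code page, followed by the conversion. The following Python sketch is illustrative only; the mapping `CODE_PAGE_BY_LANGUAGE` and the function `encode_for_language` are hypothetical, and Python's `iso8859-1` and `shift_jis` codecs merely stand in for the two MDMP code pages.

```python
# Assumed language-to-code-page assignment of the MDMP target system.
CODE_PAGE_BY_LANGUAGE = {"DE": "iso8859-1", "JA": "shift_jis"}


def encode_for_language(text: str, language: str, substitute: bytes = b"#") -> bytes:
    """Encode text with the code page assigned to `language`.

    Characters missing from that code page become `substitute`.
    """
    codec = CODE_PAGE_BY_LANGUAGE[language]
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode(codec)
        except UnicodeEncodeError:
            out += substitute
    return bytes(out)


if __name__ == "__main__":
    print(encode_for_language("Äß", "DE").hex(" "))
    print(encode_for_language("あ東", "JA").hex(" "))
    print(encode_for_language("한พ", "JA"))
```

Korean and Thai characters are lost here because neither of the two assumed code pages contains them, regardless of which language is chosen.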


RFC Unicode ↔ non-Unicode MDMP

Excursion: Difference between “flat” and “deep” data types

Flat: C, N, D, T, X, I, F, P and any structure consisting only of these fields

Deep: STRING, XSTRING, table types, object references and any structure containing one of these types

Deep data types are transferred using a UTF-8 encoded XML format (XRFC).


RFC Unicode ↔ non-Unicode MDMP

Excursion: Difference between “flat” and “deep” data types

Detailed conversion paths:

From the Unicode system to the non-Unicode system:
Deep data: Unicode → XML UTF-8 → target code page
Flat data: Unicode → non-Unicode compatible source code page → target code page

From the non-Unicode system to the Unicode system:
Deep data: Unicode ← XML UTF-8 ← source code page
Flat data: Unicode ← non-Unicode compatible target code page ← source code page


RFC Unicode ↔ non-Unicode MDMP

Deriving code pages a): Data without language key

Source system  Data type  Source code page              Intermediate format *            Target code page
Unicode        Flat       Unicode                       Logon language source system **  Logon language target system
Unicode        Deep       Unicode                       UTF-8 based XML                  Logon language target system
non-Unicode    Flat       Logon language source system  Logon language source system     Unicode
non-Unicode    Deep       SY-LANGU source system        UTF-8 based XML                  Unicode

*  XML / non-Unicode compatible code page
** You may switch to “Logon language target system” using RFC bit option 0x200 at SM59 → Special options → RFC Bit Options

Example: Flat data, logon language German

Ä → Logon = DE → Ä
あ → Logon = DE → #


RFC Unicode ↔ non-Unicode MDMP

Deriving code pages b): Data (flat) with language key

Flat structures that contain a language key (domain SPRAS, DDIC data type LANG) and have the text language flag maintained are handled specially:

Automatic language code page assignment is done during RFC for each row, independent of the logon language.

This enables sending and receiving tables from MDMP systems (different code pages for each row):

Ä → Logon = DE / Lang key = DE → Ä
あ → Logon = DE / Lang key = JA → あ

Maintain language code page assignment with SM59
Maintain text language flag with SE11
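
The difference between data with and without a language key can be sketched as follows. This Python example is illustrative only: rows are modeled as `(text, language_key)` tuples, `CODE_PAGE_BY_LANGUAGE` is an assumed assignment, and `receive_rows` is a hypothetical name. A row without a key falls back to the logon language.

```python
# Assumed language-to-code-page assignment (maintained in SM59 in a real system).
CODE_PAGE_BY_LANGUAGE = {"DE": "iso8859-1", "JA": "shift_jis"}


def receive_rows(rows, logon_language, substitute="#"):
    """Convert each row with the code page of its own language key.

    `rows` is a list of (text, language_key) tuples; a key of None means
    the row carries no language key, so the logon language decides.
    """
    received = []
    for text, language_key in rows:
        codec = CODE_PAGE_BY_LANGUAGE[language_key or logon_language]
        converted = []
        for ch in text:
            try:
                converted.append(ch.encode(codec).decode(codec))
            except UnicodeEncodeError:
                converted.append(substitute)
        received.append("".join(converted))
    return received


if __name__ == "__main__":
    table = [("Ä", "DE"), ("あ", "JA"), ("あ", None)]
    print(receive_rows(table, logon_language="DE"))
```

The third row shows the failure case from variant a): the same Japanese character is lost as soon as the row has no language key and the logon language is German.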


Maintain RFC destination SM59: MDMP settings

SE11: Maintain text language

File transfer: Application server

Pattern for writing/reading files on the application server:

OPEN DATASET IN <mode> MODE
TRANSFER/READ
CLOSE DATASET

<mode>:

BINARY MODE
Uninterpreted sequence of bytes.

TEXT MODE ENCODING UTF-8 / NON-UNICODE / DEFAULT
Pure unstructured text data. DEFAULT equals UTF-8 in Unicode systems and NON-UNICODE in non-Unicode systems.

LEGACY TEXT/BINARY MODE
Produces a format compatible with non-Unicode systems. Text data is always written in NON-UNICODE format. Non-character-like structures are allowed.
The only difference between TEXT and BINARY is that in TEXT mode an end-of-line marker is appended when writing.
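
The distinction between a binary write and a text write with an explicit encoding is not specific to ABAP. As a loose analogy only, the following Python sketch writes the same character once as UTF-8 text, once as ISO8859-1 text, and once as raw bytes. The helper names are hypothetical, and Python's file API is not a model of the exact semantics of OPEN DATASET.

```python
import os
import tempfile


def write_text(path: str, lines: list, encoding: str) -> None:
    """Text-style write: characters are converted to the given encoding."""
    with open(path, "w", encoding=encoding, newline="\n") as handle:
        for line in lines:
            handle.write(line + "\n")


def write_binary(path: str, payload: bytes) -> None:
    """Binary-style write: the bytes reach the file exactly as given."""
    with open(path, "wb") as handle:
        handle.write(payload)


def read_bytes(path: str) -> bytes:
    """Return the raw file content for inspection."""
    with open(path, "rb") as handle:
        return handle.read()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        utf8_file = os.path.join(folder, "utf8.txt")
        latin1_file = os.path.join(folder, "latin1.txt")
        write_text(utf8_file, ["Ä"], "utf-8")
        write_text(latin1_file, ["Ä"], "iso8859-1")
        print(read_bytes(utf8_file).hex(" "))
        print(read_bytes(latin1_file).hex(" "))
```

The two text files differ in their bytes although they hold the same character, which is why reader and writer have to agree on the encoding.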


File transfer: Application server

Code page selection NON-UNICODE:

If a Unicode ↔ non-Unicode conversion is necessary during data transfer, the non-Unicode code page is derived from the current system language SY-LANGU, which may be changed with SET LOCALE LANGUAGE <lang>.

Advantages and disadvantages for data exchange:

BINARY: Not a good exchange format in itself. Use it for writing/reading prepared data of a well-known format (e.g. XML/UTF-8 as XSTRING) or for writing and reading on the same application server.
TEXT MODE: UTF-8 is a good exchange format. Structures may not be transferred as a whole, only as single fields.
LEGACY MODES: Only for reading or writing non-Unicode data. Structure and code page information is considered.


File transfer: Application server

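Example 1: BINARY MODE

[Figure: file exchange between R/3 Enterprise (Unicode) and R/3 (non-Unicode, ISO8859-1 / SJIS). Upper path: BINARY MODE. Lower path: LEGACY BINARY MODE on the Unicode side, with code page 1100 or 8000 selected by SY-LANGU, and BINARY MODE on the non-Unicode side.]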


File transfer: Application server

Example 2: TEXT MODE UTF-8

[Figure: file exchange between R/3 Enterprise (Unicode) and R/3 (non-Unicode, ISO8859-1 / SJIS, code page selected by SY-LANGU). Both sides use TEXT MODE UTF-8 in both directions.]

☺ Full charset supported (no data loss in the file)
☹ Structured data as a whole → write field by field = ☺
File transfer: Application server

Example 3: TEXT MODE NON-UNICODE

[Figure: file exchange between R/3 Enterprise (Unicode) and R/3 (non-Unicode, ISO8859-1 / SJIS). Both sides use TEXT MODE NON-UNICODE in both directions; the code page (1100 or 8000) is selected by SY-LANGU.]

☹ Full charset supported (no data loss in the file)
☹ Structured data as a whole → write field by field = ☺
File transfer: Application server

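Example 4: TEXT MODE DEFAULT

[Figure: file exchange between R/3 Enterprise (Unicode) and R/3 (non-Unicode, ISO8859-1 / SJIS). Upper path: TEXT MODE DEFAULT on the Unicode side is paired with TEXT MODE UTF-8 on the non-Unicode side. Lower path: TEXT MODE DEFAULT on the non-Unicode side is paired with TEXT MODE NON-UNICODE on the Unicode side, with code page 1100 or 8000 selected by SY-LANGU.]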


File transfer: Application server

Example 5: LEGACY TEXT/BINARY MODE

[Figure: file exchange between R/3 Enterprise (Unicode) and R/3 (non-Unicode, ISO8859-1 / SJIS). Both sides use LEGACY TEXT/BINARY MODE in both directions; on the Unicode side the code page (1100 or 8000) is selected by SY-LANGU.]

☹ Full charset supported (no data loss in the file)
☺ Structured data
File transfer: Using XML

Using XML as transport format

Use CALL TRANSFORMATION with target data type XSTRING to create a UTF-8 based XML representation of your data.

Structure information (no layout / alignment problems)
UTF-8 based (no data loss)
Transport in binary form
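
The same idea, serializing structured data to UTF-8 encoded XML and moving it as an opaque byte string, can be sketched with any XML library. The Python example below is an analogy, not a rendering of CALL TRANSFORMATION; the element name `record` and the functions `to_xml_bytes` and `from_xml_bytes` are hypothetical.

```python
import xml.etree.ElementTree as ET


def to_xml_bytes(record: dict) -> bytes:
    """Serialize a flat record to UTF-8 encoded XML (a byte string)."""
    root = ET.Element("record")
    for name, value in record.items():
        ET.SubElement(root, name).text = value
    return ET.tostring(root, encoding="utf-8")


def from_xml_bytes(payload: bytes) -> dict:
    """Rebuild the record from the XML byte string."""
    root = ET.fromstring(payload)
    return {child.tag: child.text for child in root}


if __name__ == "__main__":
    payload = to_xml_bytes({"NAME": "초록색", "RGB": "00FF00"})
    print(payload)
    print(from_xml_bytes(payload))
```

Field names travel with the data, so the receiver does not depend on byte offsets, and the payload can be written and read in binary mode without any code page conversion.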


File transfer: Application server

Example 6: UTF-8 based XML + BINARY MODE

[Figure: file exchange between R/3 Enterprise (Unicode) and R/3 (non-Unicode, ISO8859-1 / SJIS, code page selected by SY-LANGU). The writing side uses CALL TRANSFORMATION + BINARY MODE, the reading side uses BINARY MODE + CALL TRANSFORMATION, in both directions.]

☺ Full charset supported (no data loss in the file)
☺ Structured data
File transfer: Frontend

File transfer at the frontend with GUI_UPLOAD/GUI_DOWNLOAD

The function modules GUI_UPLOAD and GUI_DOWNLOAD convert data into a textual representation. Structures are allowed.

Determination of the outside code page:

Front end code page matching the current system code page (SY-LANGU, SET LOCALE LANGUAGE)
Declared explicitly with the optional parameter CODEPAGE (starting with release 6.20 SP 21).

It is planned to provide in cl_gui_frontend_services=>file_open/save_dialog the possibility to select from different frontend code pages (e.g. in the Unicode system you may select old standard code pages rather than using the standard frontend cp UTF-8 or later UTF-16).


Overview: RFC and File transfer

RFC and file transfer from a Unicode system's perspective

Common mistakes: overview

Things you should never do!

☠ Type hiding

☠ Missing language key

☠ Wrong length assumptions

☠ Sending data that is not in the receiver's code page

☠ ...


Common mistakes: Type hiding: binary data

Don't hide types 1

If you conceal the true types from the system, the system cannot do anything for you. As a consequence, data may, for example, be subject to unwanted code page conversions.

Example: ☠ Transporting binary data in character containers
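
Why this corrupts the data can be seen in a small simulation. In the Python sketch below (illustrative only; `transport_as_characters` is a hypothetical name and the two code pages are arbitrary choices), binary bytes are treated as ISO8859-1 characters and then converted to ISO8859-5, as a system would do for real text. Python substitutes `?` for unmappable characters where the slides show `#`.

```python
def transport_as_characters(payload: bytes, source_cp: str, target_cp: str) -> bytes:
    """Simulate binary data that was declared as character data.

    The transport layer believes the bytes are text in `source_cp` and
    converts them to `target_cp`, replacing unmappable characters.
    """
    as_text = payload.decode(source_cp)
    return as_text.encode(target_cp, errors="replace")


if __name__ == "__main__":
    binary = bytes([0x00, 0xE4, 0xFF])
    received = transport_as_characters(binary, "iso8859-1", "iso8859-5")
    print(binary.hex(" "), "->", received.hex(" "))
```

Data declared with a binary type would not pass through this conversion step at all, so its bytes would arrive unchanged.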


Common mistakes: Type hiding: character-like data

Don't hide types 2

Even sending a pure character-like structure in a character container conceals important information – the field boundaries – from the system.

Example: ☠ Transporting character-like data in character containers
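
The following Python sketch illustrates how field boundaries get lost. It assumes a structure with two fields, NAME and RGB, each of length 6, where the length counts characters on the Unicode side and bytes on the non-Unicode side, and it uses Python's `cp949` codec as a stand-in for a Korean code page. All names and lengths are assumptions made for the illustration.

```python
FIELDS = (("NAME", 6), ("RGB", 6))  # assumed layout: (field name, length)


def send_as_container(record: dict, codec: str) -> bytes:
    """Flatten all fields into one character string, then convert it as a whole."""
    container = "".join(record[name].ljust(length) for name, length in FIELDS)
    return container.encode(codec)


def send_field_by_field(record: dict, codec: str) -> bytes:
    """Convert each field on its own and pad it to the receiver's byte length."""
    parts = []
    for name, length in FIELDS:
        encoded = record[name].encode(codec)
        # Truncation could split a multi-byte character; the sample data fits.
        parts.append(encoded.ljust(length, b" ")[:length])
    return b"".join(parts)


def read_by_byte_offsets(data: bytes, codec: str) -> dict:
    """Non-Unicode receiver: cut the fields at fixed byte offsets."""
    result, offset = {}, 0
    for name, length in FIELDS:
        chunk = data[offset:offset + length]
        result[name] = chunk.decode(codec, errors="replace").rstrip()
        offset += length
    return result


if __name__ == "__main__":
    record = {"NAME": "초록색", "RGB": "00FF00"}
    print(read_by_byte_offsets(send_as_container(record, "cp949"), "cp949"))
    print(read_by_byte_offsets(send_field_by_field(record, "cp949"), "cp949"))
```

In the container case the three Korean characters occupy six bytes, so the padding blanks of NAME slide into the byte range of RGB and the color value arrives shifted and truncated.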


Common mistakes: Type hiding: character-like data

Workaround if container approach cannot be changed

Use CL_NLS_STRUC_CONTAINER to correct the implicit layout:

[Figure: a structure with the fields NAME (초록색) and RGB Value (00FF00). On the Unicode system, struc_to_cont packs the data into the RFC container and cont_to_struc unpacks it. On the non-Unicode system, cont_to_struc and struc_to_cont do the same, and the field contents arrive unchanged.]
Common mistakes: Missing language key

Always use language keys

In principle you must not send any data without a language key if the data contains characters outside 7-bit ASCII. Otherwise the data gets corrupted.

Example: ☠ Sending non-Latin-1 data without language key by RFC with German logon


Common mistakes: Wrong length assumptions

Problems with length assumptions

String lengths are not invariant under code page conversions. This may lead to different problems:

In a Unicode system a character field of a certain length can hold more characters than the same character field in a non-Unicode system (for example one with a double-byte code page such as SJIS). Sending such data will result in data loss (☠).
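
A small calculation makes the loss concrete. The Python sketch below is illustrative only (`fit_into_field` is a hypothetical name, and `shift_jis` stands in for the target code page): five Japanese characters need ten bytes in Shift_JIS, so a receiving field of six bytes keeps only three of them.

```python
def fit_into_field(text: str, codec: str, field_bytes: int) -> str:
    """Convert text and keep only what fits into `field_bytes` bytes.

    Characters are never split: the text is cut before the first
    character that would exceed the field.
    """
    out = bytearray()
    for ch in text:
        encoded = ch.encode(codec)
        if len(out) + len(encoded) > field_bytes:
            break
        out += encoded
    return out.decode(codec)


if __name__ == "__main__":
    print(fit_into_field("あいうえお", "shift_jis", 6))
    print(fit_into_field("Tokyo", "shift_jis", 6))
```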


Common mistakes: Wrong length assumptions

Problems with length assumptions (continued)

Breaking a string into a table of fixed line size and sending the table from a non-Unicode to a Unicode system does not work: the information about the occupied length is lost, and subsequent reassembling into a string will insert unwanted spaces (☠).
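
The effect can be simulated as follows. This Python sketch is an illustration under stated assumptions: the sender fills each line up to a fixed number of bytes in Shift_JIS, the receiver holds each line in a field with the same number of characters, padded with blanks, and reassembly concatenates the full fields. The function names are hypothetical.

```python
def split_into_lines(text: str, codec: str, line_bytes: int) -> list:
    """Non-Unicode sender: fill each line up to `line_bytes` bytes."""
    lines, current = [], bytearray()
    for ch in text:
        encoded = ch.encode(codec)
        if len(current) + len(encoded) > line_bytes:
            lines.append(bytes(current))
            current = bytearray()
        current += encoded
    if current:
        lines.append(bytes(current))
    return lines


def receive_in_unicode(lines: list, codec: str, line_chars: int) -> list:
    """Unicode receiver: every line becomes a blank-padded field of `line_chars` characters."""
    return [line.decode(codec).ljust(line_chars) for line in lines]


def reassemble(fields: list) -> str:
    """Concatenate the full fields, as a receiver relying on fixed line size would."""
    return "".join(fields)


if __name__ == "__main__":
    sent = split_into_lines("あいうえ", "shift_jis", 4)
    received = receive_in_unicode(sent, "shift_jis", 4)
    print(repr(reassemble(received)))
```

Each line was completely full on the sender side (four bytes), but holds only two characters on the receiver side, so the padding blanks end up in the middle of the reassembled string.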


Common mistakes: Data not in the receiver's code page

Data not in the receiver's code page

In general you must not send data from a source system into a target system if the characters sent are not in the target system's code page. In particular, don't send characters that exist only in the Unicode code page to an old-fashioned non-Unicode system:

Try to send a white smiling face (☺) or a black smiling face (☻) or some beamed eighth notes (♫)! (→ # ☠)


Exercises

Send single code page and MDMP data via RFC
Type hiding and missing language keys: TECHED_UNICODE_EXERCISE_11/12/13/14 and 15
Wrong length assumptions: TECHED_UNICODE_EXERCISE_16/18
Data not in the receiver's code page: TECHED_UNICODE_EXERCISE_17

Transfer data via file on the application server
Writing files: TECHED_UNICODE_EXERCISE_19
Reading files: TECHED_UNICODE_EXERCISE_20

Transfer data via file on the frontend
Writing files: TECHED_UNICODE_EXERCISE_21
Reading files: TECHED_UNICODE_EXERCISE_22

Further Information

Service Marketplace:
Technical information: http://service.sap.com/Unicode@SAP
Customer contact: mail [email protected]

Further Presentations
http://service.sap.com/Unicode@SAP → Unicode Technology → Media Library:
‘Unicode Enabling ABAP Programs’ or
‘ABAP Conversion – SAP Tutor’
Unicode Support in SAP Web Application Server


Questions?

Q&A


Copyright 2003 SAP AG. All Rights Reserved

No part of this publication may be reproduced or transmitted in any form or for any purpose without the express
permission of SAP AG. The information contained herein may be changed without prior notice.
Some software products marketed by SAP AG and its distributors contain proprietary software components of other
software vendors.
Microsoft®, WINDOWS®, NT®, EXCEL®, Word®, PowerPoint® and SQL Server® are registered trademarks of
Microsoft Corporation.
IBM®, DB2®, DB2 Universal Database, OS/2®, Parallel Sysplex®, MVS/ESA, AIX®, S/390®, AS/400®, OS/390®,
OS/400®, iSeries, pSeries, xSeries, zSeries, z/OS, AFP, Intelligent Miner, WebSphere®, Netfinity®, Tivoli®,
Informix and Informix® Dynamic ServerTM are trademarks of IBM Corporation in USA and/or other countries.
ORACLE® is a registered trademark of ORACLE Corporation.
UNIX®, X/Open®, OSF/1®, and Motif® are registered trademarks of the Open Group.
Citrix®, the Citrix logo, ICA®, Program Neighborhood®, MetaFrame®, WinFrame®, VideoFrame®, MultiWin® and
other Citrix product names referenced herein are trademarks of Citrix Systems, Inc.
HTML, DHTML, XML, XHTML are trademarks or registered trademarks of W3C®, World Wide Web Consortium,
Massachusetts Institute of Technology.
JAVA® is a registered trademark of Sun Microsystems, Inc.
JAVASCRIPT® is a registered trademark of Sun Microsystems, Inc., used under license for technology invented
and implemented by Netscape.
MarketSet and Enterprise Buyer are jointly owned trademarks of SAP AG and Commerce One.
SAP, SAP Logo, R/2, R/3, mySAP, mySAP.com and other SAP products and services mentioned herein as well as
their respective logos are trademarks or registered trademarks of SAP AG in Germany and in several other
countries all over the world. All other product and service names mentioned are trademarks of their respective
companies.