AU14C04-Codepages and DB2

The document discusses character sets, encodings, and code page conversion as it relates to DB2. It provides details on defining code pages at the operating system, database, and application levels. It also covers where and when code page conversion occurs between a client and server, and potential issues that can arise from conversion.

#IDUG

Code sets, NLS and character conversion vs. DB2

Roland Schock
ARS Computer und Consulting GmbH
Session Code: C04
2014-09-10 | Platform: LUW

Overview

• What are character sets, encoding schemes and code pages?
• Where can I define the code page used?
• What is code page conversion and where does it happen?
• What problems can arise and how can I avoid them?
• Performance considerations

Character Sets

• Basically, a character set is just a collection of entities or graphical symbols with a meaning.
• Examples of character sets are the Latin alphabet, digits, naval flag signs or other symbols:
  A, B, C, ...
  [Figure: sample symbols from further character sets]

Character Encoding

• A character encoding or code page maps the symbols of a character set to bit patterns, which are also referred to as code points.
  A → 17, B → 23, C → 42, …
• Typical examples of encodings are ASCII, EBCDIC or Unicode.
• The encoding scheme also defines a serialisation scheme that converts a code point into a sequence of bytes.
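
The two steps above (symbol to code point, code point to bytes) can be sketched in Python. This is only an illustration: the helper name `describe` and the sample character are assumptions, and the code points shown are Unicode ones, not the made-up numbers on the slide.

```python
# Code point: the number an encoding assigns to a character.
# Serialisation: the byte sequence that represents that number.
def describe(char, encoding):
    """Return the Unicode code point of `char` and its bytes in `encoding`."""
    return ord(char), char.encode(encoding)

code_point, latin1_bytes = describe("é", "latin-1")
_, utf8_bytes = describe("é", "utf-8")
_, utf16_bytes = describe("é", "utf-16-be")

print(hex(code_point))     # 0xe9
print(latin1_bytes.hex())  # e9
print(utf8_bytes.hex())    # c3a9
print(utf16_bytes.hex())   # 00e9
```

The same code point ends up as one or two bytes depending on the serialisation scheme chosen.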

ASCII

• Sample of an encoding scheme
• First version 1963, standardized 1968
• Ordered mapping to 7-bit numbers

Single Byte Char Sets (SBCS)

• Extensions of 7-bit ASCII to 8-bit code pages
• ISO-8859-x: ASCII + special characters for some languages
  • ISO-8859-1 (Latin 1): ASCII + Western European characters
  • ISO-8859-2 (Latin 2): ASCII + Eastern European characters
  • ISO-8859-15: modified ISO-8859-1 including the euro symbol (€)
• Platform-specific character sets: Windows ANSI or MacRoman
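
A small Python sketch of what "modified ISO-8859-1" means in practice: the lower 128 values stay ASCII, while one and the same byte in the upper half decodes to different symbols. The variable names are illustrative; the codecs are Python's built-in ones.

```python
# One byte, two 8-bit code pages, two different characters.
raw = bytes([0xA4])
as_latin1 = raw.decode("iso-8859-1")    # generic currency sign
as_latin9 = raw.decode("iso-8859-15")   # euro sign

# The 7-bit range is identical to ASCII in both code pages.
ascii_part_identical = all(
    bytes([b]).decode("iso-8859-1")
    == bytes([b]).decode("iso-8859-15")
    == bytes([b]).decode("ascii")
    for b in range(128)
)

print(as_latin1, as_latin9, ascii_part_identical)
```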

Double Byte Char Sets (DBCS)

• Expansion of the SBCS concept from one byte to two bytes per character
• Mainly used for Asian languages with more than 256 characters to encode
• Latin text takes twice the space it needs in an SBCS

EUC (Extended Unix Code)

• Multi Byte Char Set (MBCS): variable length, from one byte (ASCII) up to four bytes per character
• Only used for Japanese, Korean, Traditional and Simplified Chinese on Unix platforms
• Uses single-shift characters to switch to another code group when building a multibyte character
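
As a hedged illustration of the single-shift mechanism, the sketch below uses Python's built-in `euc_jp` codec (EUC-JP is one member of the EUC family; the sample characters are assumptions chosen for this example). A half-width katakana character is encoded as the single-shift byte 0x8E followed by one data byte.

```python
# ASCII letter, hiragana letter, half-width katakana letter.
samples = ["A", "あ", "ｱ"]

encoded = {ch: ch.encode("euc_jp") for ch in samples}
for ch, data in encoded.items():
    print(ch, data.hex(), len(data))

SINGLE_SHIFT_2 = 0x8E  # switches to the half-width katakana code group
uses_single_shift = encoded["ｱ"][0] == SINGLE_SHIFT_2
```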

Unicode

• Intended to simplify and unify the different code page definitions, and hence conversion.
• The first definition had room for 65,536 characters (16-bit, 1991, UCS-2).
• Version 2.0 extended the character set by 16 additional planes to up to 1,114,112 code points (32-bit, 1996, UCS-4).
• As of Unicode Version 4.0, approx. 100,000 characters are assigned to code points.

Unicode char sets and encodings

• UCS-2: two bytes per character
• UCS-4: four bytes per character
• UTF-16: encoding of UCS-4 in one or two 16-bit words: the first 64k code points use two bytes per character, all others four bytes
• UTF-8: variable-length encoding of characters with one to four bytes
• Possible problem with UCS-2, UCS-4 and UTF-16: byte order differences (big-endian vs. little-endian) between different processor architectures.
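
The size and byte-order differences listed above can be checked with a short Python sketch. The helper `encoded_lengths` and the three sample characters are assumptions for illustration; Python's `utf-32-be` codec stands in for UCS-4 here.

```python
def encoded_lengths(char):
    """Bytes needed for one character in three Unicode encodings."""
    return {enc: len(char.encode(enc))
            for enc in ("utf-8", "utf-16-be", "utf-32-be")}

lengths_a = encoded_lengths("A")              # ASCII letter
lengths_euro = encoded_lengths("€")           # within the first 64k code points
lengths_clef = encoded_lengths("\U0001D11E")  # beyond the first 64k code points

# Byte order: the same 16-bit word serialised big-endian and little-endian.
big_endian = "A".encode("utf-16-be")
little_endian = "A".encode("utf-16-le")

print(lengths_a, lengths_euro, lengths_clef)
print(big_endian.hex(), little_endian.hex())
```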

UTF-8

• Encoding as a variable-length sequence of bytes
• Simple recognition of multibyte characters
• Compact storage of text in Latin characters
• Only the shortest encoding is allowed
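
Two of these properties can be sketched in Python: continuation bytes always have the bit pattern 10xxxxxx, so character boundaries are easy to find, and a decoder rejects an over-long (non-shortest) form. The helper names are illustrative assumptions.

```python
def lead_byte_offsets(data):
    """Offsets where a character starts (bytes that are not 10xxxxxx)."""
    return [i for i, b in enumerate(data) if b & 0xC0 != 0x80]

encoded = "aé€".encode("utf-8")   # 61 | c3 a9 | e2 82 ac
offsets = lead_byte_offsets(encoded)

def is_valid_utf8(data):
    """True if `data` decodes as well-formed UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

shortest_slash = b"/"          # the only valid encoding of '/'
overlong_slash = b"\xc0\xaf"   # two-byte over-long form of '/'

print(offsets, is_valid_utf8(shortest_slash), is_valid_utf8(overlong_slash))
```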

Usage of a code page

• Code pages can be specified at different levels:
  • At the operating system where the application runs
  • At the operating system where the server runs
  • At the operating system where the application is prepared/bound
  • At the database level

Default code page

• By default, DB2 server and clients use the local settings of the operating system or user:
  • Windows: The server process uses the default region settings of the operating system.
  • Linux/Unix: The code page is derived from the locale setting of the instance user (i.e. the user running the database processes).
  • Client (LUW): The current locale settings of the user determine the code page used during CONNECT.
  • Programming language: Java always uses Unicode when connecting to a database via JDBC.

Specifying a code page: OS level

• Windows: Control Panel → Regional and Language settings, chcp command
• Linux/Unix: locale command
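
To see which encoding a process inherits from these operating system settings, a minimal Python sketch is shown below. It only reports what Python derives from the current locale; how DB2 maps the locale to a code page number is not shown here, and the function name `process_encoding` is an assumption.

```python
import locale

def process_encoding():
    """Encoding name derived from the locale of the current user/process."""
    return locale.getpreferredencoding(False)

name = process_encoding()
print(name)  # the value depends on the machine and user settings
```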

At prepare/bind time

• Special case during development of database software with static, embedded SQL.
• Embedded SQL needs a prepare phase before the source code is compiled.
• Later, the prepared package needs to be bound to the database with the bind command.
• Both commands need a database connection; the locale setting in effect at connect time is used.

Defining a database w/ code page

• Explicitly set the code page at creation time:
  CREATE DB test USING CODESET codeset
    TERRITORY territory COLLATE USING collatingseq
• Otherwise the current locale is used to determine the database codeset.
• The chosen code page cannot be changed later.
• In DB2 for iSeries and for z/OS you can also define single columns of a table in a different code set (not detailed here).

Code page conversion

• If application and server use different code pages, code page conversion happens.
• Code page conversion is always done on the receiver's side:
  • on the server's side for data sent from client to server
  • on the client's side for data sent from server to client
• Exception: importing IXF files generated on a different system with another code page
• If conversion tables are missing: SQLCODE -332

Client to server conversion

Client uses code page X, server uses code page Y:

§ Client: send data using code page X
§ Server: receive data, convert to code page Y, process data
§ Server: return result in code page Y
§ Client: receive data in Y, convert to code page X

Using DB2 Connect

Client uses code page X, gateway uses code page Y, server uses code page Z:

§ Client: send data using code page X
§ Gateway: receive data, convert to code page Y, send data in Y
§ Server: receive data, convert to code page Z
§ Server: return result in code page Z
§ Gateway: receive data in Z, convert to Y, return result in Y
§ Client: receive data in Y, convert to code page X

Other considerations

• Mapping of characters (injective): If a character in the source code page is not contained in the target code page, it is replaced by a substitution character.
• Round trip conversion (bijective): If no substitution needs to take place between source and target code pages, a round trip conversion does not lose information.
• Encoding/decoding can change the number of bytes needed to store the data.
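
These three effects can be reproduced with a few lines of Python. This is a sketch under assumptions: the sample text and code pages are chosen for illustration, and Python substitutes "?" where DB2 may use a different substitution character.

```python
text = "5 €"

# Substitution: ISO-8859-1 has no euro sign, so it is replaced on conversion.
substituted = text.encode("iso-8859-1", errors="replace").decode("iso-8859-1")

# Round trip: cp1252 contains every character of the text, nothing is lost.
round_trip = text.encode("cp1252").decode("cp1252")

# Size: the same text needs a different number of bytes per encoding.
sizes = {cp: len(text.encode(cp)) for cp in ("cp1252", "utf-8", "utf-16-be")}

print(substituted)
print(round_trip == text)
print(sizes)
```

A column sized in bytes for the source code page may therefore be too short after conversion.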

More considerations

• Using different conversion tables and the € symbol: The Microsoft ANSI code page and the official code page 850 use different code points for the euro symbol. If needed, the code page conversion tables can be replaced (ref. Administration Guide, Planning).
• Unicode support: DB2 supports the UCS-2 character set with UTF-8 and UCS-2 encoding for Unicode databases.
• For pureXML (V9.x) a UTF-8 database is needed.
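
Why the choice of conversion table matters for the euro can be illustrated with Python's own code page tables. Note the assumptions: these are Python's codecs, not DB2's conversion tables, and cp858 (the euro-enabled variant of 850 in Python) is not mentioned on the slide; it is used here only to show that the same byte means different things under different tables.

```python
EURO = "€"

def euro_byte(code_page):
    """Return the byte for the euro sign in `code_page`, or None if absent."""
    try:
        return EURO.encode(code_page)
    except UnicodeEncodeError:
        return None

windows_ansi = euro_byte("cp1252")  # Microsoft ANSI (Western)
cp850 = euro_byte("cp850")          # Python's cp850 table has no euro sign
cp858 = euro_byte("cp858")          # euro-enabled variant of 850

# The same byte read through the other table yields a different symbol.
misread = cp858.decode("cp850")

print(windows_ansi, cp850, cp858, misread)
```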

More considerations

• To change the code page of a database, you have to use db2move (Export/Import). Backup/Restore cannot be used, so choosing the right database code page at database creation is crucial.
• Binary data (BLOB, FOR BIT DATA) is internally stored with code page 0, so no character conversion is applied.

Troubleshooting

• Identify the code pages in use:
  • db2 get db cfg for sample
    Retrieves the database code page
  • Displaying the SQLCA during CONNECT with the CLP:
    When connecting to a database via the CLP, the option "-a" displays the SQLCA data area, which shows the code page of the database and of the connecting client.
• If connecting to iSeries or zSeries machines from DB2 LUW, check if conversion tables are available.

Pitfalls

• Watch out for unintentional "conversions"
• All database communication partners are configured correctly, but the DBA looks at the data in a console window, and the console window (or PuTTY) uses a font with the wrong code page to display the data!
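
The display pitfall can be simulated in Python: the stored bytes are correct and never change, only the code page used to render them differs. The sample name and the two console code pages are assumptions for illustration.

```python
stored = "Müller".encode("utf-8")   # correct bytes in a UTF-8 database

shown_correctly = stored.decode("utf-8")
shown_in_cp1252 = stored.decode("cp1252")   # console set to Windows ANSI
shown_in_cp850 = stored.decode("cp850")     # console set to an OEM code page

print(shown_correctly)
print(shown_in_cp1252)
print(shown_in_cp850)
```

Comparing the raw bytes (for example with a hex dump) instead of the rendered text avoids chasing a problem that exists only on screen.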

db2set DB2CODEPAGE

• Know what you intend to do if you use the DB2 environment variable DB2CODEPAGE.
• It tells DB2 you will feed it with the right code points regardless of the displayed symbols.
• See Technote "Setting DB2CODEPAGE=1208 may result in incorrect character data insertion"
  SQL0191N Error occurred because of a fragmented MBCS character.
  http://www.ibm.com/support/docview.wss?uid=swg21601028
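
What "declaring the wrong code page" and a "fragmented" multibyte character look like at the byte level can be sketched in Python. This is an analogy, not DB2's behaviour: the helper `read_as_utf8` and the sample data are assumptions, and a Python decode error stands in for the DB2 message.

```python
def read_as_utf8(data):
    """Decode bytes that are declared to be UTF-8; None if they are not."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return None

# Typed in a cp1252 terminal, but declared as UTF-8 (code page 1208).
typed_in_cp1252 = "Müller".encode("cp1252")
mislabelled = read_as_utf8(typed_in_cp1252)

# A multibyte character cut off in the middle is an incomplete fragment.
fragment = "€".encode("utf-8")[:2]
fragmented = read_as_utf8(fragment)

# Correctly labelled data decodes without problems.
correct = read_as_utf8("Müller".encode("utf-8"))

print(mislabelled, fragmented, correct)
```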

db2set DB2CONSOLECP

• Intended to allow DB2 CLI to use different code pages for output
• Multiple APARs for DB2 9.1, 9.5, 9.7: "DB2CONSOLECP environment variable has no effect on DB2 message text or is not working"

DB2 Special Registers for NLS

• Change the message text language for DB2 MONREPORT modules:
  db2 "SET CURRENT LOCALE LC_MESSAGES = 'de_DE'"
  db2 "call monreport.lockwait"
• Change the language of month and day names:
  db2 "SET CURRENT LOCALE LC_TIME = 'fr_FR'"
  db2 "values monthname(current date)"
  (Works with DAYNAME, MONTHNAME, NEXT_DAY, ROUND, ROUND_TIMESTAMP, TIMESTAMP_FORMAT, TRUNCATE, TRUNC_TIMESTAMP and VARCHAR_FORMAT)

Performance considerations

• Try to avoid unnecessary conversions.
• Create databases with the code page your applications need from the start.
• For international databases prefer UTF-8, especially when used with Java programs.
• Remember: Conversion takes time.

Links

• IBM developerWorks white paper:
  http://www.ibm.com/developerworks/db2/library/techarticle/dm-0506chong/index.html
• DB2 Infocenter:
  http://publib.boulder.ibm.com/infocenter/db2luw/v9/index.jsp
• Unicode:
  http://www.unicode.org
• UTF-8 article at Wikipedia:
  http://en.wikipedia.org/wiki/UTF-8