3GPP TS 23.038


3GPP TS 23.038 V10.0.0 (2011-03)
Technical Specification

3rd Generation Partnership Project; Technical Specification Group Core Network and Terminals; Alphabets and language-specific information (Release 10)

The present document has been developed within the 3rd Generation Partnership Project (3GPP TM) and may be further elaborated for the purposes of 3GPP. The present document has not been subject to any approval process by the 3GPP Organisational Partners and shall not be implemented. This Specification is provided for future development work within 3GPP only. The Organisational Partners accept no liability for any use of this Specification. Specifications and reports for implementation of the 3GPP TM system should be obtained via the 3GPP Organisational Partners' Publications Offices.

Release 10

3GPP TS 23.038 V10.0.0 (2011-03)

Keywords
GSM, UMTS, LTE, character

3GPP

Postal address

3GPP support office address
650 Route des Lucioles - Sophia Antipolis
Valbonne - FRANCE
Tel.: +33 4 92 94 42 00 Fax: +33 4 93 65 47 16

Internet
http://www.3gpp.org

Copyright Notification

No part may be reproduced except as authorized by written permission.
The copyright and the foregoing restriction extend to reproduction in all media.

© 2011, 3GPP Organizational Partners (ARIB, ATIS, CCSA, ETSI, TTA, TTC).
All rights reserved.

UMTS is a Trade Mark of ETSI registered for the benefit of its members
3GPP is a Trade Mark of ETSI registered for the benefit of its Members and of the 3GPP Organizational Partners
LTE is a Trade Mark of ETSI registered for the benefit of its Members and of the 3GPP Organizational Partners
GSM and the GSM logo are registered and owned by the GSM Association

3GPP


Contents
Foreword
1 Scope
2 References
3 Abbreviations and definitions
4 SMS Data Coding Scheme
5 CBS Data Coding Scheme
6 Individual parameters
6.1 General principles
6.1.1 General notes
6.1.2 Character packing
6.1.2.1 SMS Packing
6.1.2.1.1 Packing of 7-bit characters
6.1.2.2 CBS Packing
6.1.2.2.1 Packing of 7-bit characters
6.1.2.3 USSD packing
6.1.2.3.1 Packing of 7 bit characters
6.2 Character sets and coding
6.2.1 GSM 7 bit Default Alphabet
6.2.1.1 GSM 7 bit default alphabet extension table
6.2.1.2 National Language Identifier
6.2.1.2.1 Introduction
6.2.1.2.2 Single shift mechanism
6.2.1.2.3 Locking shift mechanism
6.2.1.2.4 National Language Identifier
6.2.1.2.5 Processing of national language characters
6.2.2 8 bit data
6.2.3 UCS2

Annex A (normative): National Language Tables
A.1 Introduction
A.2 National Language Single Shift Tables
A.2.1 Turkish National Language Single Shift Table
A.2.2 Spanish National Language Single Shift Table
A.2.3 Portuguese National Language Single Shift Table
A.2.4 Bengali National Language Single Shift Table
A.2.5 Gujarati National Language Single Shift Table
A.2.6 Hindi National Language Single Shift Table
A.2.7 Kannada National Language Single Shift Table
A.2.8 Malayalam National Language Single Shift Table
A.2.9 Oriya National Language Single Shift Table
A.2.10 Punjabi National Language Single Shift Table
A.2.11 Tamil National Language Single Shift Table
A.2.12 Telugu National Language Single Shift Table
A.2.13 Urdu National Language Single Shift Table
A.3 National Language Locking Shift Tables
A.3.1 Turkish National Language Locking Shift Table
A.3.2 Void
A.3.3 Portuguese National Language Locking Shift Table
A.3.4 Bengali National Language Locking Shift Table

A.3.5 Gujarati National Language Locking Shift Table
A.3.6 Hindi National Language Locking Shift Table
A.3.7 Kannada National Language Locking Shift Table
A.3.8 Malayalam National Language Locking Shift Table
A.3.9 Oriya National Language Locking Shift Table
A.3.10 Punjabi National Language Locking Shift Table
A.3.11 Tamil National Language Locking Shift Table
A.3.12 Telugu National Language Locking Shift Table
A.3.13 Urdu National Language Locking Shift Table

Annex B (informative): Guidelines for creating language tables
B.1 Introduction
B.2 Template for Single Shift Language Tables
B.3 Template for Locking Shift Language Tables

Annex C (informative): Example for locking shift and single shift mechanisms
C.1 Introduction
C.2 Example of single shift
C.3 Example of locking shift

Annex D (informative): Document change history


Foreword
This Technical Specification has been produced by the 3rd Generation Partnership Project (3GPP).

The contents of the present document are subject to continuing work within the TSG and may change following formal TSG approval. Should the TSG modify the contents of the present document, it will be re-released by the TSG with an identifying change of release date and an increase in version number as follows:

Version x.y.z

where:

x the first digit:
    1 presented to TSG for information;
    2 presented to TSG for approval;
    3 indicates TSG approved document under change control.

y the second digit is incremented for all changes of substance, i.e. technical enhancements, corrections, updates, etc.

z the third digit is incremented when editorial only changes have been incorporated in the specification.


1 Scope
The present document defines the character sets, languages and message handling requirements for SMS, CBS and USSD and may additionally be used for Man Machine Interface (MMI) (3GPP TS 22.030 [2]). The specification for the Data Circuit terminating Equipment/Data Terminal Equipment (DCE/DTE) interface (3GPP TS 27.005 [8]) will also use the codes specified herein for the transfer of SMS data to an external terminal.

2 References
The following documents contain provisions which, through reference in this text, constitute provisions of the present document.

References are either specific (identified by date of publication, edition number, version number, etc.) or non-specific. For a specific reference, subsequent revisions do not apply. For a non-specific reference, the latest version applies. In the case of a reference to a 3GPP document (including a GSM document), a non-specific reference implicitly refers to the latest version of that document in the same Release as the present document.

[1] Void.
[2] 3GPP TS 22.030: "Man-Machine Interface (MMI) of the User Equipment (UE)".
[3] 3GPP TS 23.090: "Unstructured Supplementary Service Data (USSD) - Stage 2".
[4] 3GPP TS 23.040: "Technical realization of the Short Message Service (SMS)".
[5] 3GPP TS 23.041: "Technical realization of Cell Broadcast Service (CBS)".
[6] 3GPP TS 24.011: "Point-to-Point (PP) Short Message Service (SMS) support on mobile radio interface".
[7] Void.
[8] 3GPP TS 27.005: "Use of Data Terminal Equipment - Data Circuit terminating Equipment (DTE DCE) interface for Short Message Service (SMS) and Cell Broadcast Service (CBS)".
[10] ISO/IEC 10646: "Information technology; Universal Multiple-Octet Coded Character Set (UCS)".
[11] 3GPP TS 24.090: "Unstructured Supplementary Service Data (USSD); Stage 3".
[12] ISO 639: "Code for the representation of names of languages".
[13] 3GPP TS 23.042: "Compression algorithm for text messaging services".
[14] 3GPP TR 21.905: "Vocabulary for 3GPP Specifications".
[15] Wireless Application Protocol Forum Ltd.: "Wireless Datagram Protocol Specification".
[16] ISO 1073-1 and ISO 1073-2: "Alphanumeric character sets for optical recognition - Parts 1 and 2: Character sets OCR-A and OCR-B, respectively - Shapes and dimensions of the printed image".
[17] 3GPP TS 31.102: "Characteristics of the USIM application".
[18] 3GPP TS 51.011 Release 4 (version 4.x.x): "Specification of the Subscriber Identity Module - Mobile Equipment (SIM - ME) interface".
[19] 3GPP TS 24.294: "IMS Centralized Services (ICS) Protocol via I1 Interface".


3 Abbreviations and definitions


For the purposes of the present document, the following terms and definitions apply:

National Language Identifier: A code representing a specific language and thereby selecting a specific National Language Table.

National Language Locking Shift Table: A national language table which replaces the GSM 7 bit default alphabet table in the case where the locking shift mechanism as defined in subclause 6.2.1.2.3 is used.

National Language Single Shift Table: A national language table which replaces the GSM 7 bit default alphabet extension table in the case where the single shift mechanism as defined in subclause 6.2.1.2.2 is used.

National Language Table: A table containing the characters of a specific national language.

For the purposes of the present document, the abbreviations used in the present document are listed in 3GPP TR 21.905 [14].


4 SMS Data Coding Scheme


The TP-Data-Coding-Scheme field, defined in 3GPP TS 23.040 [4], indicates the data coding scheme of the TP-UD field, and may indicate a message class. Any reserved codings shall be assumed to be the GSM 7 bit default alphabet (the same as codepoint 00000000) by a receiving entity. The octet is used according to a coding group which is indicated in bits 7..4. The octet is then coded as follows:

Coding Group Bits 7..4    Use of bits 3..0

00xx
    General Data Coding indication
    Bits 5..0 indicate the following:
    Bit 5, if set to 0, indicates the text is uncompressed.
    Bit 5, if set to 1, indicates the text is compressed using the compression algorithm defined in 3GPP TS 23.042 [13].
    Bit 4, if set to 0, indicates that bits 1 to 0 are reserved and have no message class meaning.
    Bit 4, if set to 1, indicates that bits 1 to 0 have a message class meaning:

    Bit 1  Bit 0  Message Class
    0      0      Class 0
    0      1      Class 1  Default meaning: ME-specific.
    1      0      Class 2  (U)SIM specific message.
    1      1      Class 3  Default meaning: TE specific (see 3GPP TS 27.005 [8])

    Bits 3 and 2 indicate the character set being used, as follows:

    Bit 3  Bit 2  Character set
    0      0      GSM 7 bit default alphabet
    0      1      8 bit data
    1      0      UCS2 (16 bit) [10]
    1      1      Reserved

    NOTE: The special case of bits 7..0 being 0000 0000 indicates the GSM 7 bit default alphabet with no message class.

01xx
    Message Marked for Automatic Deletion Group
    This group can be used by the SM originator to mark the message (stored in the ME or (U)SIM) for deletion after reading, irrespective of the message class. The way the ME will process this deletion should be manufacturer specific but shall be done without the intervention of the End User or the targeted application. The mobile manufacturer may optionally provide a means for the user to prevent this automatic deletion.
    Bits 5..0 are coded exactly the same as Group 00xx.

1000..1011
    Reserved coding groups

1100
    Message Waiting Indication Group: Discard Message
    The specification for this group is exactly the same as for Group 1101, except that after presenting an indication and storing the status, the ME may discard the contents of the message.
    The ME shall be able to receive, process and acknowledge messages in this group, irrespective of memory availability for other types of short message.

1101
    Message Waiting Indication Group: Store Message
    This Group defines an indication to be provided to the user about the status of types of message waiting on systems connected to the GSM/UMTS PLMN. The ME should present this indication as an icon on the screen, or other MMI indication. The ME shall update the contents of the Message Waiting Indication Status on the SIM (see 3GPP TS 51.011 [18]) or USIM (see 3GPP TS 31.102 [17]) when present, or otherwise should store the status in the ME. In case there are multiple records of EFMWIS, this information shall be stored within the first record. The contents of the Message Waiting Indication Status should control the ME indicator. For each indication supported, the mobile may provide storage for the Origination Address. The ME may take note of the Origination Address for messages in this group and group 1100.
    Text included in the user data is coded in the GSM 7 bit default alphabet.
    Where a message is received with bits 7..4 set to 1101, the mobile shall store the text of the SMS message in addition to setting the indication. The indication setting should take place irrespective of memory availability to store the short message.

    Bit 3 indicates Indication Sense:
    Bit 3
    0      Set Indication Inactive
    1      Set Indication Active

    Bit 2 is reserved, and set to 0.

    Bit 1  Bit 0  Indication Type
    0      0      Voicemail Message Waiting
    0      1      Fax Message Waiting
    1      0      Electronic Mail Message Waiting
    1      1      Other Message Waiting*

    * Mobile manufacturers may implement the "Other Message Waiting" indication as an additional indication without specifying the meaning.

1110
    Message Waiting Indication Group: Store Message
    The coding of bits 3..0 and functionality of this feature are the same as for the Message Waiting Indication Group above (bits 7..4 set to 1101), with the exception that the text included in the user data is coded in the uncompressed UCS2 character set.

1111
    Data coding/message class
    Bit 3 is reserved, set to 0.

    Bit 2  Message coding
    0      GSM 7 bit default alphabet
    1      8-bit data

    Bit 1  Bit 0  Message Class
    0      0      Class 0
    0      1      Class 1  default meaning: ME-specific.
    1      0      Class 2  (U)SIM-specific message.
    1      1      Class 3  default meaning: TE specific (see 3GPP TS 27.005 [8])
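A receiving entity might walk the coding-group table above roughly as in the minimal Python sketch below. The function name, the returned field names and labels such as "GSM7" are illustrative assumptions, not identifiers from this specification or from TS 23.040. Reserved character-set values and reserved coding groups are mapped to the GSM 7 bit default alphabet, following the rule stated at the start of this clause.

```python
from typing import Optional, TypedDict


class SmsDcs(TypedDict):
    """Decoded view of one TP-DCS octet (field names are illustrative)."""
    group: str
    alphabet: str                 # "GSM7", "8BIT" or "UCS2"
    compressed: bool
    message_class: Optional[int]  # 0..3, or None for "no message class"
    mwi_active: Optional[bool]    # Message Waiting Indication sense
    mwi_type: Optional[str]


_CHARSETS = {0b00: "GSM7", 0b01: "8BIT", 0b10: "UCS2"}
_MWI_TYPES = ("voicemail", "fax", "email", "other")


def decode_sms_dcs(dcs: int) -> SmsDcs:
    """Decode a TP-DCS octet following the coding-group table of clause 4."""
    if not 0 <= dcs <= 0xFF:
        raise ValueError("TP-DCS must be a single octet")
    group = dcs >> 4
    # Defaults correspond to 0000 0000: GSM 7 bit, uncompressed, no class.
    result: SmsDcs = {
        "group": "reserved", "alphabet": "GSM7", "compressed": False,
        "message_class": None, "mwi_active": None, "mwi_type": None,
    }
    if group <= 0b0111:  # 00xx general coding, 01xx automatic deletion
        result["group"] = "general" if group <= 0b0011 else "auto_delete"
        result["compressed"] = bool(dcs & 0x20)          # bit 5
        if dcs & 0x10:                                    # bit 4
            result["message_class"] = dcs & 0x03          # bits 1..0
        # Reserved character set 11 falls back to the GSM 7 bit default alphabet.
        result["alphabet"] = _CHARSETS.get((dcs >> 2) & 0x03, "GSM7")
    elif group in (0b1100, 0b1101, 0b1110):  # Message Waiting Indication groups
        result["group"] = {0b1100: "mwi_discard", 0b1101: "mwi_store_gsm7",
                           0b1110: "mwi_store_ucs2"}[group]
        result["alphabet"] = "UCS2" if group == 0b1110 else "GSM7"
        result["mwi_active"] = bool(dcs & 0x08)           # bit 3: indication sense
        result["mwi_type"] = _MWI_TYPES[dcs & 0x03]       # bits 1..0
    elif group == 0b1111:  # data coding / message class
        result["group"] = "data_coding"
        result["alphabet"] = "8BIT" if dcs & 0x04 else "GSM7"
        result["message_class"] = dcs & 0x03
    # Groups 1000..1011 are reserved and keep the 0000 0000 defaults.
    return result
```

For example, `decode_sms_dcs(0x18)` reports UCS2 with message class 0, and `decode_sms_dcs(0xC8)` reports a voicemail indication being set active in the Discard Message group. A real implementation would also keep the raw octet, because the MS is required to store reserved values as received.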

GSM 7 bit default alphabet indicates that the TP-UD is coded from the GSM 7 bit default alphabet given in clause 6.2.1. When this character set is used, the characters of the message are packed in octets as shown in clause 6.1.2.1.1, and the message can consist of up to 160 characters. The GSM 7 bit default alphabet shall be supported by all MSs and SCs offering the service. If the GSM 7 bit default alphabet extension mechanism is used, the number of displayable characters is reduced by one for every instance where the GSM 7 bit default alphabet extension table is used.

8-bit data indicates that the TP-UD has user-defined coding, and the message can consist of up to 140 octets.

UCS2 character set indicates that the TP-UD has a UCS2 [10] coded message, and the message can consist of up to 140 octets, i.e. up to 70 UCS2 characters. The General notes specified in clause 6.1.1 override any contrary specification in UCS2. For example, even in UCS2 a <CR> character will cause the MS to return to the beginning of the current line and overwrite any existing text with the characters which follow the <CR>.

When a message is compressed, the TP-UD consists of the GSM 7 bit default alphabet or UCS2 character set compressed message, and the compressed message itself can consist of up to 140 octets in total.

When a mobile terminated message is Class 0 and the MS has the capability of displaying short messages, the MS shall display the message immediately and send an acknowledgement to the SC when the message has successfully reached the MS, irrespective of whether there is memory available in the (U)SIM or ME. The message shall not be automatically stored in the (U)SIM or ME. The ME may make provision through MMI for the user to selectively prevent the message from being displayed immediately. If the ME is incapable of displaying short messages, or if the immediate display of the message has been disabled through MMI, then the ME shall treat the short message as though there was no message class, i.e. it will ignore bits 0 and 1 in the TP-DCS, and the normal rules for memory capacity exceeded shall apply.

When a mobile terminated message is Class 1, the MS shall send an acknowledgement to the SC when the message has successfully reached the MS and can be stored. The MS shall normally store the message in the ME by default, if that is possible, but otherwise the message may be stored elsewhere, e.g. in the (U)SIM. The user may be able to override the default meaning and select their own routing.

When a mobile terminated message is Class 2 ((U)SIM-specific), an MS shall ensure that the message has been transferred to the SMS data field in the (U)SIM before sending an acknowledgement to the SC. The MS shall return a "protocol error, unspecified" error message (see 3GPP TS 24.011 [6]) if the short message cannot be stored in the (U)SIM and there is other short message storage available at the MS. If all the short message storage at the MS is already in use, the MS shall return "memory capacity exceeded". This behaviour applies in all cases except for an MS supporting (U)SIM Application Toolkit when the Protocol Identifier (TP-PID) of the mobile terminated message is set to "(U)SIM Data download" (see 3GPP TS 23.040 [4]).

When a mobile terminated message is Class 3, the MS shall send an acknowledgement to the SC when the message has successfully reached the MS and can be stored, irrespective of whether the MS supports an SMS interface to a TE, and without waiting for the message to be transferred to the TE. Thus the acknowledgement to the SC of a TE-specific message does not imply that the message has reached the TE. Class 3 messages shall normally be transferred to the TE when the TE requests "TE-specific" messages (see 3GPP TS 27.005 [8]). The user may be able to override the default meaning and select their own routing.

The message class codes may also be used for mobile originated messages, to provide an indication to the destination SME of how the message was handled at the MS.

The MS will not interpret reserved or unsupported values but shall store them as received. The SC may reject messages with a Data Coding Scheme containing a reserved value or one which is not supported.
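The 160-character and 70-character limits above are plain arithmetic on a 140-octet TP-UD. The Python sketch below applies them to uncompressed text. It charges two septets (an escape plus the extension code) for each character taken from the GSM 7 bit default alphabet extension table of clause 6.2.1.1, and two octets for each UCS2 character. It assumes the input has already been checked against the default alphabet, and the function names are hypothetical.

```python
# Characters of the GSM 7 bit default alphabet extension table (clause 6.2.1.1).
# Each is sent as an escape septet followed by its extension code: two septets.
GSM7_EXTENSION_CHARS = frozenset("\f^{}\\[~]|\u20ac")

SMS_MAX_SEPTETS = 160   # GSM 7 bit default alphabet in 140 octets
SMS_MAX_OCTETS = 140    # 8-bit data and UCS2


def gsm7_septet_count(text: str) -> int:
    """Septets needed for text, assuming every character is in the GSM 7 bit
    default alphabet or its extension table (this is not validated here)."""
    return sum(2 if ch in GSM7_EXTENSION_CHARS else 1 for ch in text)


def fits_single_sms(text: str, alphabet: str) -> bool:
    """Check whether uncompressed text fits the TP-UD of one short message."""
    if alphabet == "GSM7":
        return gsm7_septet_count(text) <= SMS_MAX_SEPTETS
    if alphabet == "UCS2":
        # UCS2 covers only the Basic Multilingual Plane: one 16-bit unit each.
        if any(ord(ch) > 0xFFFF for ch in text):
            return False
        return 2 * len(text) <= SMS_MAX_OCTETS
    raise ValueError("8-bit data is user-defined; count octets instead")
```

With these rules, 160 plain letters fit, but 159 letters plus one euro sign need 161 septets and do not. This matches the statement that each use of the extension table costs one displayable character.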


5 CBS Data Coding Scheme


The CBS Data Coding Scheme indicates the intended handling of the message at the MS, the character set/coding, and the language (when applicable). Any reserved codings shall be assumed to be the GSM 7 bit default alphabet (the same as codepoint 00001111) by a receiving entity. The octet is used according to a coding group which is indicated in bits 7..4. The octet is then coded as follows:

Coding Group Bits 7..4    Use of bits 3..0

0000
    Language using the GSM 7 bit default alphabet
    Bits 3..0 indicate the language:
    0000  German
    0001  English
    0010  Italian
    0011  French
    0100  Spanish
    0101  Dutch
    0110  Swedish
    0111  Danish
    1000  Portuguese
    1001  Finnish
    1010  Norwegian
    1011  Greek
    1100  Turkish
    1101  Hungarian
    1110  Polish
    1111  Language unspecified

0001
    0000  GSM 7 bit default alphabet; message preceded by language indication.
          The first 3 characters of the message are a two-character representation of the language encoded according to ISO 639 [12], followed by a CR character. The CR character is then followed by 90 characters of text.
    0001  UCS2; message preceded by language indication.
          The message starts with a two GSM 7-bit default alphabet character representation of the language encoded according to ISO 639 [12]. This is padded to the octet boundary with two bits set to 0 and then followed by 40 characters of UCS2-encoded message.
          An MS not supporting UCS2 coding will present the two character language identifier followed by improperly interpreted user data.
    0010..1111  Reserved

0010
    0000  Czech
    0001  Hebrew
    0010  Arabic
    0011  Russian
    0100  Icelandic
    0101..1111  Reserved for other languages using the GSM 7 bit default alphabet, with unspecified handling at the MS

0011
    0000..1111  Reserved for other languages using the GSM 7 bit default alphabet, with unspecified handling at the MS

01xx
    General Data Coding indication
    Bits 5..0 indicate the following:
    Bit 5, if set to 0, indicates the text is uncompressed.
    Bit 5, if set to 1, indicates the text is compressed using the compression algorithm defined in 3GPP TS 23.042 [13].
    Bit 4, if set to 0, indicates that bits 1 to 0 are reserved and have no message class meaning.
    Bit 4, if set to 1, indicates that bits 1 to 0 have a message class meaning:

    Bit 1  Bit 0  Message Class
    0      0      Class 0
    0      1      Class 1  Default meaning: ME-specific.
    1      0      Class 2  (U)SIM specific message.
    1      1      Class 3  Default meaning: TE-specific (see 3GPP TS 27.005 [8])

    Bits 3 and 2 indicate the character set being used, as follows:

    Bit 3  Bit 2  Character set
    0      0      GSM 7 bit default alphabet
    0      1      8 bit data
    1      0      UCS2 (16 bit) [10]
    1      1      Reserved

1000
    Reserved coding groups

1001
    Message with User Data Header (UDH) structure:

    Bit 1  Bit 0  Message Class
    0      0      Class 0
    0      1      Class 1  Default meaning: ME-specific.
    1      0      Class 2  (U)SIM specific message.
    1      1      Class 3  Default meaning: TE-specific (see 3GPP TS 27.005 [8])

    Bits 3 and 2 indicate the alphabet being used, as follows:

    Bit 3  Bit 2  Alphabet
    0      0      GSM 7 bit default alphabet
    0      1      8 bit data
    1      0      UCS2 (16 bit) [10]
    1      1      Reserved

1010..1100
    Reserved coding groups

1101
    I1 protocol message defined in 3GPP TS 24.294 [19]

1110
    Defined by the WAP Forum [15]

1111
    Data coding / message handling
    Bit 3 is reserved, set to 0.

    Bit 2  Message coding
    0      GSM 7 bit default alphabet
    1      8 bit data

    Bit 1  Bit 0  Message Class
    0      0      No message class.
    0      1      Class 1 user defined.
    1      0      Class 2 user defined.
    1      1      Class 3 default meaning: TE specific (see 3GPP TS 27.005 [8])
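The CBS table can be decoded in the same way as the SMS one. The Python sketch below is one way to do it. The dictionary keys and labels are hypothetical, and reserved codings fall back to the defaults of codepoint 0000 1111 (GSM 7 bit default alphabet, language unspecified), as the rule at the start of this clause requires. Groups 1101 and 1110 are only tagged, because their contents are defined in other documents.

```python
# Bits 3..0 of coding group 0000 (index = value of bits 3..0); None = unspecified.
_GROUP_0000_LANGS = (
    "German", "English", "Italian", "French", "Spanish", "Dutch", "Swedish",
    "Danish", "Portuguese", "Finnish", "Norwegian", "Greek", "Turkish",
    "Hungarian", "Polish", None,
)
# Bits 3..0 of coding group 0010; values 0101..1111 are reserved.
_GROUP_0010_LANGS = {0b0000: "Czech", 0b0001: "Hebrew", 0b0010: "Arabic",
                     0b0011: "Russian", 0b0100: "Icelandic"}
_CHARSETS = {0b00: "GSM7", 0b01: "8BIT", 0b10: "UCS2"}


def decode_cbs_dcs(dcs: int) -> dict:
    """Decode a CBS Data Coding Scheme octet per the table of clause 5."""
    if not 0 <= dcs <= 0xFF:
        raise ValueError("CBS DCS must be a single octet")
    group, low = dcs >> 4, dcs & 0x0F
    # Defaults match codepoint 0000 1111: GSM 7 bit, language unspecified.
    out = {"kind": "text", "alphabet": "GSM7", "language": None,
           "language_prefix": False, "compressed": False,
           "message_class": None, "udh": False}
    if group == 0b0000:
        out["language"] = _GROUP_0000_LANGS[low]
    elif group == 0b0001 and low in (0b0000, 0b0001):
        # Language is carried as an ISO 639 prefix inside the message itself.
        out["language_prefix"] = True
        out["alphabet"] = "UCS2" if low == 0b0001 else "GSM7"
    elif group == 0b0010:
        out["language"] = _GROUP_0010_LANGS.get(low)
    elif 0b0100 <= group <= 0b0111:  # 01xx general data coding indication
        out["compressed"] = bool(dcs & 0x20)
        if dcs & 0x10:
            out["message_class"] = dcs & 0x03
        out["alphabet"] = _CHARSETS.get((dcs >> 2) & 0x03, "GSM7")
    elif group == 0b1001:  # message with UDH structure
        out["udh"] = True
        out["message_class"] = dcs & 0x03
        out["alphabet"] = _CHARSETS.get((dcs >> 2) & 0x03, "GSM7")
    elif group == 0b1101:
        out.update(kind="i1_protocol", alphabet=None)  # 3GPP TS 24.294
    elif group == 0b1110:
        out.update(kind="wap", alphabet=None)          # defined by the WAP Forum
    elif group == 0b1111:
        out["alphabet"] = "8BIT" if dcs & 0x04 else "GSM7"
        cls = dcs & 0x03
        out["message_class"] = None if cls == 0 else cls  # 00 = no message class
    # 0011, 1000, 1010..1100 and reserved values in 0001 keep the defaults.
    return out
```

For example, `decode_cbs_dcs(0x11)` flags a UCS2 message whose language is given by a two-character prefix. In this coding, the receiver must strip the prefix before decoding the UCS2 text.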

These codings may also be used for USSD and MMI/display purposes. The message length specified in this subclause applies only to GSM; it is not applicable to UTRAN or E-UTRAN. See 3GPP TS 24.090 [11] for specific coding values applicable to USSD for MS originated and MS terminated USSD messages. USSD messages using the default alphabet are coded with the GSM 7-bit default alphabet given in clause 6.2.1. The message can then consist of up to 182 user characters.

Cell Broadcast messages using the default alphabet are coded with the GSM 7-bit default alphabet given in clause 6.2.1. The message then consists of 93 user characters. If the GSM 7 bit default alphabet extension mechanism is used, the number of displayable characters is reduced by one for every instance where the GSM 7 bit default alphabet extension table is used.

Cell Broadcast messages using 8-bit data have user-defined coding, and will be 82 octets in length.

UCS2 character set indicates that the message is coded in UCS2 [10]. The General notes specified in clause 6.1.1 override any contrary specification in UCS2. For example, even in UCS2 a <CR> character will cause the MS to return to the beginning of the current line and overwrite any existing text with the characters which follow the <CR>. Cell Broadcast messages encoded in UCS2 consist of 41 characters.

When a CBS message received by the MS is message class 0 and the MS has the capability of displaying CBS messages, the MS shall display the message immediately. The message shall not be automatically stored in the (U)SIM or ME. The ME may make provision through MMI for the user to selectively prevent the message from being displayed immediately. If the ME is incapable of displaying CBS messages, or if the immediate display of the message has been disabled through MMI, then the ME shall treat the CBS message as though there was no message class, i.e. it will ignore bits 0 and 1 in the TP-DCS, but may store the message either on the ME or on the (U)SIM.

Class 1 and Class 2 messages may be routed by the ME to user-defined destinations, but the user may override any default meaning and select their own routing.

Class 3 messages will normally be selected for transfer to a TE, in cases where an ME supports an SMS/CBS interface to a TE and the TE requests "TE-specific" cell broadcast messages (see 3GPP TS 27.005 [8]). The user may be able to override the default meaning and select their own routing.
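The character counts quoted for CBS and USSD follow from the octet budgets. A 7-bit character occupies 7 bits, so n octets hold floor(8n/7) characters, and a UCS2 character occupies two octets. The Python sketch below reproduces the 160, 182, 93, 90, 70, 41 and 40 figures. The 160-octet USSD budget is an assumption, taken from the USSD string length limit rather than from this clause. The constant and function names are hypothetical.

```python
def septet_capacity(octets: int) -> int:
    """Whole 7-bit characters that fit in the given number of octets."""
    return octets * 8 // 7


def ucs2_capacity(octets: int) -> int:
    """UCS2 characters (two octets each) that fit in the given octets."""
    return octets // 2


SMS_TPUD_OCTETS = 140   # SMS TP-UD, clause 4
CBS_PAGE_OCTETS = 82    # CB message payload, this clause
USSD_OCTETS = 160       # assumption: USSD string length limit

# GSM 7 bit language prefix: two letters plus CR = 3 septets.
CBS_GSM7_PREFIX_SEPTETS = 3
# UCS2 language prefix: 2 septets (14 bits) + 2 zero bits = 2 octets.
CBS_UCS2_PREFIX_OCTETS = 2

capacities = {
    "SMS, GSM 7 bit": septet_capacity(SMS_TPUD_OCTETS),
    "SMS, UCS2": ucs2_capacity(SMS_TPUD_OCTETS),
    "USSD, GSM 7 bit": septet_capacity(USSD_OCTETS),
    "CBS, GSM 7 bit": septet_capacity(CBS_PAGE_OCTETS),
    "CBS, GSM 7 bit with language prefix":
        septet_capacity(CBS_PAGE_OCTETS) - CBS_GSM7_PREFIX_SEPTETS,
    "CBS, UCS2": ucs2_capacity(CBS_PAGE_OCTETS),
    "CBS, UCS2 with language prefix":
        ucs2_capacity(CBS_PAGE_OCTETS - CBS_UCS2_PREFIX_OCTETS),
}

if __name__ == "__main__":
    for name, chars in capacities.items():
        print(f"{name}: {chars} characters")
```

The floor division matters for CBS. The 82 octets provide 656 bits, which holds 93 whole septets with 5 bits left over.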
Messages with a User Data Header structure are encoded as described for SMS in 3GPP TS 23.040 [4], subclauses 3.10 and 9.2.3.24. The use of Cell Broadcast DCS values for messages with a User Data Header structure implies that the 82-octet CB payload has a User Data Header structure. The CBS message information field will contain the IEs as described in 3GPP TS 23.040. The concatenation IEs will not be used, as CB concatenation will in that case rely on the existing CB mechanism. Note that IEs that cannot be split, and that are too large to fit in one CB segment, cannot be transmitted using this mechanism. Also, some IEs as defined for SMS are not applicable for CB:

VALUE (hex)   MEANING
00            Concatenated short messages, 8-bit reference number
01            Special SMS Message Indication
06            SMSC Control Parameters
08            Concatenated short message, 16-bit reference number
20            RFC 822 E-Mail Header
23            Enhanced Voice Mail Information
70-7F         (U)SIM Toolkit Security Headers
80-89         SME to SME specific use
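A sender could drop the IEs listed above before building a CB page. The sketch below is illustrative only. The IEI set comes from the table above. The function names are hypothetical, as is the IEI/length/data walk, which assumes the UDH layout of 3GPP TS 23.040.

```python
# IEI values (hex) listed above as not applicable to Cell Broadcast.
_NOT_FOR_CB = {0x00, 0x01, 0x06, 0x08, 0x20, 0x23}
_NOT_FOR_CB |= set(range(0x70, 0x80))   # (U)SIM Toolkit Security Headers
_NOT_FOR_CB |= set(range(0x80, 0x8A))   # SME to SME specific use (80-89)


def iei_allowed_in_cb(iei: int) -> bool:
    return iei not in _NOT_FOR_CB


def cb_applicable_ies(udh: bytes) -> list[tuple[int, bytes]]:
    """Split a UDH body (without its length octet) into (IEI, data)
    pairs, assuming one IEI octet and one length octet per IE, and
    keep only the IEs applicable to CB."""
    ies = []
    i = 0
    while i + 2 <= len(udh):
        iei, length = udh[i], udh[i + 1]
        data = udh[i + 2 : i + 2 + length]
        if len(data) < length:
            raise ValueError("truncated IE")
        if iei_allowed_in_cb(iei):
            ies.append((iei, bytes(data)))
        i += 2 + length
    return ies
```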


6 Individual parameters
6.1 General principles
6.1.1 General notes
Except where otherwise indicated, the following shall apply to all character sets:
1: The characters marked "1)" are not used but are displayed as a space.
2: The characters of this set, when displayed, should approximate to the appearance of the relevant characters specified in ISO 1073 [16] and the relevant national standards.
3: Control characters:

Code   Meaning
LF     Line feed: Any characters following LF which are to be displayed shall be presented as the next line of the message, commencing with the first character position.
CR     Carriage return: Any characters following CR which are to be displayed shall be presented as the current line of the message, commencing with the first character position.
SP     Space character.

4: The display of characters within a message is achieved by taking each character in turn and placing it in the next available space from left to right and top to bottom.
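The control-character rules above can be modelled as a small line buffer. This includes the overwrite behaviour of <CR> described for UCS2 Cell Broadcast. The following is a hypothetical sketch of display semantics only. It assumes characters have already been decoded to Python strings, with LF as "\n" and CR as "\r".

```python
def render(text: str) -> list[str]:
    """Lay out decoded text following clause 6.1.1: LF starts a new line
    at the first position; CR returns to the first position of the
    current line, so following characters overwrite what is there."""
    lines: list[list[str]] = [[]]
    col = 0
    for ch in text:
        if ch == "\n":
            lines.append([])
            col = 0
        elif ch == "\r":
            col = 0
        else:
            line = lines[-1]
            if col < len(line):
                line[col] = ch      # overwrite existing text after CR
            else:
                line.append(ch)
            col += 1
    return ["".join(line) for line in lines]
```

With this model, "HELLO\rJE" displays as "JELLO". Two consecutive CRs behave exactly like one CR.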

6.1.2 Character packing


6.1.2.1 SMS Packing

6.1.2.1.1 Packing of 7-bit characters

If a character number is noted in the following way:

b7 b6 b5 b4 b3 b2 b1
a  b  c  d  e  f  g

the packing of the 7-bit characters in octets is done by completing the octets with zeros on the left. For example, packing:

one character in one octet:
bits number:  7   6   5   4   3   2   1   0
octet 1:      0   1a  1b  1c  1d  1e  1f  1g

two characters in two octets:
bits number:  7   6   5   4   3   2   1   0
octet 1:      2g  1a  1b  1c  1d  1e  1f  1g
octet 2:      0   0   2a  2b  2c  2d  2e  2f

three characters in three octets:
bits number:  7   6   5   4   3   2   1   0
octet 1:      2g  1a  1b  1c  1d  1e  1f  1g
octet 2:      3f  3g  2a  2b  2c  2d  2e  2f
octet 3:      0   0   0   3a  3b  3c  3d  3e

seven characters in seven octets:
bits number:  7   6   5   4   3   2   1   0
octet 1:      2g  1a  1b  1c  1d  1e  1f  1g
octet 2:      3f  3g  2a  2b  2c  2d  2e  2f
octet 3:      4e  4f  4g  3a  3b  3c  3d  3e
octet 4:      5d  5e  5f  5g  4a  4b  4c  4d
octet 5:      6c  6d  6e  6f  6g  5a  5b  5c
octet 6:      7b  7c  7d  7e  7f  7g  6a  6b
octet 7:      0   0   0   0   0   0   0   7a

eight characters in seven octets:
bits number:  7   6   5   4   3   2   1   0
octet 1:      2g  1a  1b  1c  1d  1e  1f  1g
octet 2:      3f  3g  2a  2b  2c  2d  2e  2f
octet 3:      4e  4f  4g  3a  3b  3c  3d  3e
octet 4:      5d  5e  5f  5g  4a  4b  4c  4d
octet 5:      6c  6d  6e  6f  6g  5a  5b  5c
octet 6:      7b  7c  7d  7e  7f  7g  6a  6b
octet 7:      8a  8b  8c  8d  8e  8f  8g  7a

The bit number zero is always transmitted first. Therefore, in 140 octets, it is possible to pack (140x8)/7 = 160 characters.
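The packing rule above amounts to treating the message as one little-endian bit stream, in which character n occupies bits 7(n-1) to 7n-1. Here is a minimal Python sketch under that reading. The function names are hypothetical.

```python
def pack_septets(septets: list[int]) -> bytes:
    """Pack 7-bit values LSB-first (clause 6.1.2.1): bit 0 of character 1
    goes to bit 0 of octet 1; unused high-order bits of the final octet
    are left as zero."""
    out = bytearray()
    acc = 0      # bit accumulator
    nbits = 0    # number of valid bits currently in acc
    for s in septets:
        if not 0 <= s <= 0x7F:
            raise ValueError(f"not a 7-bit value: {s}")
        acc |= s << nbits
        nbits += 7
        while nbits >= 8:
            out.append(acc & 0xFF)
            acc >>= 8
            nbits -= 8
    if nbits:
        out.append(acc)  # remaining bits, zero-filled on the left
    return bytes(out)


def unpack_septets(data: bytes, count: int) -> list[int]:
    """Inverse of pack_septets. The character count must come from
    elsewhere (for SMS, the user data length), because trailing zero
    bits are otherwise ambiguous."""
    value = int.from_bytes(data, "little")
    return [(value >> (7 * i)) & 0x7F for i in range(count)]
```

For example, `pack_septets([ord(c) for c in "hellohello"])` yields the octets E8 32 9B FD 46 97 D9 EC 37. This works because the GSM 7 bit codes for a to z equal their ASCII values.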

6.1.2.2 CBS Packing

6.1.2.2.1 Packing of 7-bit characters

If a character number is noted in the following way:

b7 b6 b5 b4 b3 b2 b1
a  b  c  d  e  f  g

the packing of the 7-bit characters in octets is done as follows:

bit number:   7   6   5   4   3   2   1   0
octet 1:      2g  1a  1b  1c  1d  1e  1f  1g
octet 2:      3f  3g  2a  2b  2c  2d  2e  2f
octet 3:      4e  4f  4g  3a  3b  3c  3d  3e
octet 4:      5d  5e  5f  5g  4a  4b  4c  4d
octet 5:      6c  6d  6e  6f  6g  5a  5b  5c
octet 6:      7b  7c  7d  7e  7f  7g  6a  6b
octet 7:      8a  8b  8c  8d  8e  8f  8g  7a
octet 8:      10g 9a  9b  9c  9d  9e  9f  9g
.
.
octet 81:     93d 93e 93f 93g 92a 92b 92c 92d
octet 82:     0   0   0   0   0   93a 93b 93c

The bit number zero is always transmitted first. Therefore, in 82 octets, it is possible to pack (82x8)/7 = 93.7, that is 93 characters. The 5 remaining bits are set to zero, as shown above.
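For CBS, the same LSB-first packing applies, but a page always occupies 82 octets. The sketch below makes one assumption: a short page is filled with <CR> up to 93 characters. This assumption is based on the CBS pad character listed for the GSM 7 bit default alphabet in clause 6.2.1. The 5 leftover bits stay zero. All names are hypothetical.

```python
CR = 0x0D                                       # CBS/USSD pad character (clause 6.2.1)
CBS_PAGE_OCTETS = 82
CBS_PAGE_SEPTETS = (CBS_PAGE_OCTETS * 8) // 7   # 93


def pack_cbs_page(septets: list[int]) -> bytes:
    """Pad a page to 93 characters with <CR> (assumed, see above) and
    pack it LSB-first into exactly 82 octets."""
    if len(septets) > CBS_PAGE_SEPTETS:
        raise ValueError("more than 93 characters in one CBS page")
    padded = list(septets) + [CR] * (CBS_PAGE_SEPTETS - len(septets))
    value = 0
    for i, s in enumerate(padded):
        if not 0 <= s <= 0x7F:
            raise ValueError(f"not a 7-bit value: {s}")
        value |= s << (7 * i)   # character i+1 starts at bit 7*i
    # 93 * 7 = 651 bits, so the top 5 bits of octet 82 remain zero.
    return value.to_bytes(CBS_PAGE_OCTETS, "little")
```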

6.1.2.3 USSD packing

6.1.2.3.1 Packing of 7 bit characters

If a character number is noted in the following way:

b7 b6 b5 b4 b3 b2 b1
a  b  c  d  e  f  g

the packing of the 7-bit characters in octets is done by completing the octets with zeros on the left. For example, packing:

one character in one octet:
bits number:  7   6   5   4   3   2   1   0
octet 1:      0   1a  1b  1c  1d  1e  1f  1g

two characters in two octets:
bits number:  7   6   5   4   3   2   1   0
octet 1:      2g  1a  1b  1c  1d  1e  1f  1g
octet 2:      0   0   2a  2b  2c  2d  2e  2f

three characters in three octets:
bits number:  7   6   5   4   3   2   1   0
octet 1:      2g  1a  1b  1c  1d  1e  1f  1g
octet 2:      3f  3g  2a  2b  2c  2d  2e  2f
octet 3:      0   0   0   3a  3b  3c  3d  3e

six characters in six octets:
bits number:  7   6   5   4   3   2   1   0
octet 1:      2g  1a  1b  1c  1d  1e  1f  1g
octet 2:      3f  3g  2a  2b  2c  2d  2e  2f
octet 3:      4e  4f  4g  3a  3b  3c  3d  3e
octet 4:      5d  5e  5f  5g  4a  4b  4c  4d
octet 5:      6c  6d  6e  6f  6g  5a  5b  5c
octet 6:      0   0   0   0   0   0   6a  6b

seven characters in seven octets:
bits number:  7   6   5   4   3   2   1   0
octet 1:      2g  1a  1b  1c  1d  1e  1f  1g
octet 2:      3f  3g  2a  2b  2c  2d  2e  2f
octet 3:      4e  4f  4g  3a  3b  3c  3d  3e
octet 4:      5d  5e  5f  5g  4a  4b  4c  4d
octet 5:      6c  6d  6e  6f  6g  5a  5b  5c
octet 6:      7b  7c  7d  7e  7f  7g  6a  6b
octet 7:      0   0   0   1   1   0   1   7a

The bit number zero is always transmitted first.

eight characters in seven octets:
bits number:  7   6   5   4   3   2   1   0
octet 1:      2g  1a  1b  1c  1d  1e  1f  1g
octet 2:      3f  3g  2a  2b  2c  2d  2e  2f
octet 3:      4e  4f  4g  3a  3b  3c  3d  3e
octet 4:      5d  5e  5f  5g  4a  4b  4c  4d
octet 5:      6c  6d  6e  6f  6g  5a  5b  5c
octet 6:      7b  7c  7d  7e  7f  7g  6a  6b
octet 7:      8a  8b  8c  8d  8e  8f  8g  7a

nine characters in eight octets:
bits number:  7   6   5   4   3   2   1   0
octet 1:      2g  1a  1b  1c  1d  1e  1f  1g
octet 2:      3f  3g  2a  2b  2c  2d  2e  2f
octet 3:      4e  4f  4g  3a  3b  3c  3d  3e
octet 4:      5d  5e  5f  5g  4a  4b  4c  4d
octet 5:      6c  6d  6e  6f  6g  5a  5b  5c
octet 6:      7b  7c  7d  7e  7f  7g  6a  6b
octet 7:      8a  8b  8c  8d  8e  8f  8g  7a
octet 8:      0   9a  9b  9c  9d  9e  9f  9g

fifteen characters in fourteen octets:
bits number:  7   6   5   4   3   2   1   0
octet 1:      2g  1a  1b  1c  1d  1e  1f  1g
octet 2:      3f  3g  2a  2b  2c  2d  2e  2f
octet 3:      4e  4f  4g  3a  3b  3c  3d  3e
octet 4:      5d  5e  5f  5g  4a  4b  4c  4d
octet 5:      6c  6d  6e  6f  6g  5a  5b  5c
octet 6:      7b  7c  7d  7e  7f  7g  6a  6b
octet 7:      8a  8b  8c  8d  8e  8f  8g  7a
octet 8:      10g 9a  9b  9c  9d  9e  9f  9g
octet 9:      11f 11g 10a 10b 10c 10d 10e 10f
octet 10:     12e 12f 12g 11a 11b 11c 11d 11e
octet 11:     13d 13e 13f 13g 12a 12b 12c 12d
octet 12:     14c 14d 14e 14f 14g 13a 13b 13c
octet 13:     15b 15c 15d 15e 15f 15g 14a 14b
octet 14:     0   0   0   1   1   0   1   15a

sixteen characters in fourteen octets:
bits number:  7   6   5   4   3   2   1   0
octet 1:      2g  1a  1b  1c  1d  1e  1f  1g
octet 2:      3f  3g  2a  2b  2c  2d  2e  2f
octet 3:      4e  4f  4g  3a  3b  3c  3d  3e
octet 4:      5d  5e  5f  5g  4a  4b  4c  4d
octet 5:      6c  6d  6e  6f  6g  5a  5b  5c
octet 6:      7b  7c  7d  7e  7f  7g  6a  6b
octet 7:      8a  8b  8c  8d  8e  8f  8g  7a
octet 8:      10g 9a  9b  9c  9d  9e  9f  9g
octet 9:      11f 11g 10a 10b 10c 10d 10e 10f
octet 10:     12e 12f 12g 11a 11b 11c 11d 11e
octet 11:     13d 13e 13f 13g 12a 12b 12c 12d
octet 12:     14c 14d 14e 14f 14g 13a 13b 13c
octet 13:     15b 15c 15d 15e 15f 15g 14a 14b
octet 14:     16a 16b 16c 16d 16e 16f 16g 15a

The bit number zero is always transmitted first. Therefore, in 160 octets, it is possible to pack (160x8)/7 = 182.8, that is 182 characters. The remaining 6 bits are set to zero as stated above.

Packing of 7 bit characters in USSD strings is done in the same way as for SMS (clause 6.1.2.1). The character stream is bit padded to the octet boundary with binary zeroes as shown above.

If the total number of characters to be sent equals (8n-1), where n = 1, 2, 3 etc., then there are 7 spare bits at the end of the message. The receiving entity could confuse 7 binary zero pad bits with the @ character. To avoid this, the carriage return or <CR> character (defined in clause 6.1.1) shall be used for padding in this situation, just as for Cell Broadcast.

If <CR> is intended to be the last character and the message (including the wanted <CR>) ends on an octet boundary, then another <CR> must be added together with a padding bit 0. The receiving entity will perform the carriage return function twice. This will not result in misoperation, because the definition of <CR> in clause 6.1.1 is identical to the definition of <CR><CR>.

The receiving entity shall remove the final <CR> character where the message ends on an octet boundary with <CR> as the last character.
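The padding rules above can be sketched as a pair of encode/decode helpers over septet values. This is an illustrative reading of the text, not a reference implementation, and the helper names are hypothetical. The padding rule implies that the receiver knows only the number of octets, so `ussd_decode` derives the character count from the octet count.

```python
CR = 0x0D


def pack_septets(septets: list[int]) -> bytes:
    value = 0
    for i, s in enumerate(septets):
        if not 0 <= s <= 0x7F:
            raise ValueError(f"not a 7-bit value: {s}")
        value |= s << (7 * i)                  # LSB-first, as for SMS
    nbytes = (7 * len(septets) + 7) // 8       # round up to whole octets
    return value.to_bytes(nbytes, "little")    # spare high bits are zero


def unpack_septets(data: bytes, count: int) -> list[int]:
    value = int.from_bytes(data, "little")
    return [(value >> (7 * i)) & 0x7F for i in range(count)]


def ussd_encode(septets: list[int]) -> bytes:
    s = list(septets)
    if len(s) % 8 == 7:
        s.append(CR)   # 7 spare bits would otherwise decode as '@' (0x00)
    elif s and len(s) % 8 == 0 and s[-1] == CR:
        s.append(CR)   # keep a wanted final <CR> from being stripped
    return pack_septets(s)


def ussd_decode(data: bytes) -> list[int]:
    count = len(data) * 8 // 7
    s = unpack_septets(data, count)
    if count * 7 == len(data) * 8 and s and s[-1] == CR:
        s.pop()        # final <CR> on an octet boundary is padding
    return s
```

A wanted final <CR> that falls on an octet boundary comes back doubled. According to the text above, this is harmless.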


6.2 Character sets and coding


This clause lists the character sets and codings to be supported by SMS, CBS and USSD. Implementation of the GSM 7 bit default alphabet is mandatory; support of other character sets is optional. Note that the GSM 7 bit default alphabet offers only limited support for Latin and non-Latin languages. It is therefore essential to introduce the UCS2 character set in mobile stations, SCs and systems handling SMS, CBS and USSD messages.

6.2.1 GSM 7 bit Default Alphabet


Bits per character: 7
CBS/USSD pad character: CR
Character table:

b4b3b2b1 \ b7b6b5   0    1    2    3    4    5    6    7
0000 (0)            @    Δ    SP   0    ¡    P    ¿    p
0001 (1)            £    _    !    1    A    Q    a    q
0010 (2)            $    Φ    "    2    B    R    b    r
0011 (3)            ¥    Γ    #    3    C    S    c    s
0100 (4)            è    Λ    ¤    4    D    T    d    t
0101 (5)            é    Ω    %    5    E    U    e    u
0110 (6)            ù    Π    &    6    F    V    f    v
0111 (7)            ì    Ψ    '    7    G    W    g    w
1000 (8)            ò    Σ    (    8    H    X    h    x
1001 (9)            Ç    Θ    )    9    I    Y    i    y
1010 (10)           LF   Ξ    *    :    J    Z    j    z
1011 (11)           Ø    1)   +    ;    K    Ä    k    ä
1100 (12)           ø    Æ    ,    <    L    Ö    l    ö
1101 (13)           CR   æ    -    =    M    Ñ    m    ñ
1110 (14)           Å    ß    .    >    N    Ü    n    ü
1111 (15)           å    É    /    ?    O    §    o    à


NOTE 1): This code is an escape to an extension of this table (either to the GSM 7 bit default alphabet extension table, see subclause 6.2.1.1, or a National Language Single Shift Table, see subclause 6.2.1.2.2). A receiving entity which does not understand the meaning of this escape mechanism shall display it as a space character.
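The default alphabet table can be stored as a 128-entry lookup string indexed by the 7-bit code. The sketch below decodes septets the way a receiving entity that does not understand the escape mechanism might. It displays 0x1B as a space, as NOTE 1 requires, and assumes the following code is then decoded as an ordinary character. Names are hypothetical, and LF and CR are mapped to Python's "\n" and "\r".

```python
# GSM 7 bit default alphabet: each string line holds one column of the
# table in clause 6.2.1 (b7 b6 b5 fixed, b4 b3 b2 b1 = 0..15).
# Index 0x1B is the escape code, held here only as a placeholder.
GSM7_DEFAULT = (
    "@£$¥èéùìòÇ\nØø\rÅå"
    "Δ_ΦΓΛΩΠΨΣΘΞ\x1bÆæßÉ"
    " !\"#¤%&'()*+,-./"
    "0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNO"
    "PQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmno"
    "pqrstuvwxyzäöñüà"
)
ESC = 0x1B
assert len(GSM7_DEFAULT) == 128


def decode_without_extension(septets: list[int]) -> str:
    """Decode as a receiver that does not understand the escape
    mechanism: 0x1B is shown as a space (NOTE 1), and every other code,
    including one that follows 0x1B, is looked up in the default table."""
    return "".join(" " if s == ESC else GSM7_DEFAULT[s] for s in septets)
```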

6.2.1.1 GSM 7 bit default alphabet extension table

The table below is reserved for symbols of international significance (e.g. currency symbols). It also contains a mechanism to permit escape (NOTE 1) to additional tables for symbols of international significance in the event that the table below becomes fully populated.


Positions defined in the table (7-bit code b7 to b1, in hexadecimal); all other positions are empty:

Code (hex)   Character
0A           3)
14           ^
1B           1)
28           {
29           }
2F           \
3C           [
3D           ~
3E           ]
40           |
65           €

In the event that an MS receives a code where a symbol is not represented in the above table, the MS shall display one of two characters. Normally this is the character shown in the main GSM 7 bit default alphabet table in subclause 6.2.1. Where the locking shift mechanism as defined in subclause 6.2.1.2.3 is used, it is the character from the National Language Locking Shift Table.

NOTE 1): This code is reserved for the extension to another extension table. On receipt of this code, a receiving entity shall display a space until another extension table is defined. It is not intended that this extension mechanism should be used as an alternative to UCS2 to enhance the 7 bit default alphabet character repertoire for national specific character sets.
NOTE 2): Void.
NOTE 3): This code is defined as a Page Break character and may be used, for example, in compressed CBS messages. Any mobile station which does not understand the GSM 7 bit default alphabet table extension mechanism will treat this character as Line Feed.
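An encoder can build on the same lookup string by emitting <ESC><code> for characters from the extension table. A decoder can apply the fallback rule above for codes the extension table does not define. The following is a sketch with hypothetical names. It treats <ESC><ESC> as a single displayed space (NOTE 1). It does not model the Page Break at 0x0A, so that code falls back to LF from the default table.

```python
GSM7_DEFAULT = (
    "@£$¥èéùìòÇ\nØø\rÅå"
    "Δ_ΦΓΛΩΠΨΣΘΞ\x1bÆæßÉ"
    " !\"#¤%&'()*+,-./"
    "0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNO"
    "PQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmno"
    "pqrstuvwxyzäöñüà"
)
ESC = 0x1B

# Printable characters defined in the extension table (clause 6.2.1.1).
GSM7_EXTENSION = {
    0x14: "^", 0x28: "{", 0x29: "}", 0x2F: "\\",
    0x3C: "[", 0x3D: "~", 0x3E: "]", 0x40: "|", 0x65: "€",
}
_BASIC = {ch: i for i, ch in enumerate(GSM7_DEFAULT) if i != ESC}
_EXT = {ch: code for code, ch in GSM7_EXTENSION.items()}


def encode_gsm7(text: str) -> list[int]:
    """Text to septets; extension characters become <ESC><code>.
    Raises ValueError when a character is in neither table."""
    out: list[int] = []
    for ch in text:
        if ch in _BASIC:
            out.append(_BASIC[ch])
        elif ch in _EXT:
            out += [ESC, _EXT[ch]]
        else:
            raise ValueError(f"{ch!r} is not in the GSM 7 bit default alphabet")
    return out


def decode_gsm7(septets: list[int]) -> str:
    out = []
    i = 0
    while i < len(septets):
        s = septets[i]
        if s == ESC and i + 1 < len(septets):
            nxt = septets[i + 1]
            if nxt == ESC:
                out.append(" ")   # NOTE 1: reserved, displayed as a space
            else:
                # Codes not defined in the extension table fall back to
                # the character in the main default alphabet table.
                out.append(GSM7_EXTENSION.get(nxt, GSM7_DEFAULT[nxt]))
            i += 2
        else:
            out.append(" " if s == ESC else GSM7_DEFAULT[s])
            i += 1
    return "".join(out)
```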


6.2.1.2 National Language Identifier


6.2.1.2.1 Introduction

The national language tables are used for adding the special characters of certain languages that cannot be expressed using the GSM 7 bit default alphabet. The principle is to use the National Language Identifier to indicate to a receiving entity that the message has been encoded using a national language table. Both single shift and locking shift mechanisms are defined.

The single shift mechanism, as defined in subclause 6.2.1.2.2, applies to a single character. It replaces the GSM 7 bit default alphabet extension table defined in subclause 6.2.1.1 with a National Language Single Shift Table (see subclause A.2).

The locking shift mechanism, as defined in subclause 6.2.1.2.3, applies throughout the message, or the current segment in the case of a concatenated message. It replaces the GSM 7 bit default alphabet defined in subclause 6.2.1 with a National Language Locking Shift Table (see subclause A.3) that defines the whole character set needed for the language.

Where several languages requiring different national language tables are used, it is recommended to encode the message in UCS2. However, it is possible to use both single shift and locking shift with the corresponding tables in a single message. Implementations based on older versions of the specification (so-called "legacy implementations") will use the fallback mechanisms defined in those earlier versions for handling unknown characters.

6.2.1.2.2 Single shift mechanism

In the case where single shift is not combined with locking shift, single shift means the following. The receiving entity shall decode all characters in the message (or the current segment in the case of a concatenated message) using the GSM 7 bit default alphabet, unless the escape mechanism is used, i.e. <escape><character>, as defined in subclause 6.2.1. The case where single shift and locking shift (which may be for the same or different languages) are combined is described in subclause 6.2.1.2.3. If the escape mechanism is used, the receiving entity shall decode the subsequent character using the National Language Single Shift Table for the language indicated in table 6.2.1.2.4.1. This table is used instead of the GSM 7 bit default alphabet extension table in subclause 6.2.1.1. Each time a sending entity needs to send a character from the National Language Single Shift Table, the sending entity shall encode it as <escape><character>. Here <character> is encoded using the indicated National Language Single Shift Table.

6.2.1.2.3 Locking shift mechanism

Locking shift means that the receiving entity shall decode all characters in the message (or the current segment in the case of a concatenated message) using the National Language Locking Shift Table, unless the escape mechanism is used, i.e. <escape><character>, as defined in subclause 6.2.1. If the escape mechanism is used and no National Language Single Shift Table is indicated (see subclause 6.2.1.2.4), the receiving entity shall decode the message (or the current segment in the case of a concatenated message) using the GSM 7 bit default alphabet extension table as defined in subclause 6.2.1.1. If the escape mechanism is used and a National Language Single Shift Table is indicated (see subclause 6.2.1.2.4), the receiving entity shall decode the message (or the current segment in the case of a concatenated message) using the National Language Single Shift Table as defined in subclause 6.2.1.2.2.
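The two shift mechanisms combine as a choice between two lookup tables. The main table is the default alphabet, or a National Language Locking Shift Table when locking shift is indicated. The escape table is the default extension table, or a National Language Single Shift Table when single shift is indicated. The sketch below is hypothetical and works on caller-supplied tables; it does not contain the Annex A tables. Escape codes missing from the escape table fall back to the main table, mirroring the rule in clause 6.2.1.1.

```python
from typing import Mapping, Optional, Sequence

ESC = 0x1B


def choose_tables(
    default_table: Sequence[str],
    default_extension: Mapping[int, str],
    locking_shift: Optional[Sequence[str]] = None,
    single_shift: Optional[Mapping[int, str]] = None,
) -> tuple[Sequence[str], Mapping[int, str]]:
    """Locking shift replaces the main table for the whole message (or
    segment); single shift replaces only the table used after <ESC>."""
    main = locking_shift if locking_shift is not None else default_table
    escaped = single_shift if single_shift is not None else default_extension
    return main, escaped


def decode_shifted(
    septets: Sequence[int],
    main: Sequence[str],
    escaped: Mapping[int, str],
) -> str:
    out = []
    i = 0
    while i < len(septets):
        code = septets[i]
        if code == ESC and i + 1 < len(septets):
            nxt = septets[i + 1]
            # Undefined escape codes fall back to the main table
            # (default alphabet or locking shift table).
            out.append(escaped.get(nxt, main[nxt]))
            i += 2
        else:
            out.append(" " if code == ESC else main[code])
            i += 1
    return "".join(out)
```

If `locking_shift` and `single_shift` are left as None, `choose_tables` returns the default alphabet and default extension table, which is the plain GSM 7 bit case.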

6.2.1.2.4 National Language Identifier

A National Language Single Shift IE and a National Language Locking Shift IE can be included in the TP User Data Header, as defined in 3GPP TS 23.040 [4]. The receiving entity shall decode using single shift or locking shift as applicable for the language indicated in the National Language Identifier within these IEs.


The National Language Identifier octet is encoded as shown in table 6.2.1.2.4.1.

Table 6.2.1.2.4.1

Language code (b7..b0)   Language     National Language Single Shift Table   National Language Locking Shift Table
00000000                 Reserved     n/a                                    n/a
00000001                 Turkish      Subclause A.2.1                        Subclause A.3.1
00000010                 Spanish      Subclause A.2.2                        Not defined; fallback to GSM 7 bit default alphabet (see subclause 6.2.1)
00000011                 Portuguese   Subclause A.2.3                        Subclause A.3.3
00000100                 Bengali      Subclause A.2.4                        Subclause A.3.4
00000101                 Gujarati     Subclause A.2.5                        Subclause A.3.5
00000110                 Hindi        Subclause A.2.6                        Subclause A.3.6
00000111                 Kannada      Subclause A.2.7                        Subclause A.3.7
00001000                 Malayalam    Subclause A.2.8                        Subclause A.3.8
00001001                 Oriya        Subclause A.2.9                        Subclause A.3.9
00001010                 Punjabi      Subclause A.2.10                       Subclause A.3.10
00001011                 Tamil        Subclause A.2.11                       Subclause A.3.11
00001100                 Telugu       Subclause A.2.12                       Subclause A.3.12
00001101                 Urdu         Subclause A.2.13                       Subclause A.3.13
00001110 to 11111111     Reserved     n/a                                    n/a
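Table 6.2.1.2.4.1 maps naturally onto a dictionary keyed by the identifier octet. Below is a Python transcription sketch. The function names are hypothetical, and reserved values simply have no entry.

```python
from typing import Optional, Tuple

# Identifier octet (b7..b0, shown in hex) -> (language, single shift
# table, locking shift table), transcribed from table 6.2.1.2.4.1.
# None means no locking shift table is defined.
NATIONAL_LANGUAGES = {
    0x01: ("Turkish", "A.2.1", "A.3.1"),
    0x02: ("Spanish", "A.2.2", None),
    0x03: ("Portuguese", "A.2.3", "A.3.3"),
    0x04: ("Bengali", "A.2.4", "A.3.4"),
    0x05: ("Gujarati", "A.2.5", "A.3.5"),
    0x06: ("Hindi", "A.2.6", "A.3.6"),
    0x07: ("Kannada", "A.2.7", "A.3.7"),
    0x08: ("Malayalam", "A.2.8", "A.3.8"),
    0x09: ("Oriya", "A.2.9", "A.3.9"),
    0x0A: ("Punjabi", "A.2.10", "A.3.10"),
    0x0B: ("Tamil", "A.2.11", "A.3.11"),
    0x0C: ("Telugu", "A.2.12", "A.3.12"),
    0x0D: ("Urdu", "A.2.13", "A.3.13"),
}


def lookup_language(code: int) -> Optional[Tuple[str, str, Optional[str]]]:
    """Return the table entry, or None for reserved values
    (00000000 and 00001110 to 11111111)."""
    if not 0 <= code <= 0xFF:
        raise ValueError("identifier must fit in one octet")
    return NATIONAL_LANGUAGES.get(code)


def locking_shift_source(code: int) -> Optional[str]:
    """Subclause whose table applies under locking shift; Spanish has no
    locking shift table and falls back to subclause 6.2.1."""
    entry = lookup_language(code)
    if entry is None:
        return None
    return entry[2] if entry[2] is not None else "6.2.1"
```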

6.2.1.2.5 Processing of national language characters

When supporting a specific national language, the sending entity shall support the encoding of messages using the corresponding National Language Identifier defined in subclause 6.2.1.2.4. The receiving entity should be able to decode messages using the National Language Identifiers defined in subclause 6.2.1.2.4 for the languages that are supported by that entity. A received message may contain a National Language Identifier that indicates a reserved value or a value not supported by the receiving entity. In that case, the receiving entity shall ignore the IE (see 3GPP TS 23.040 [4]) in which the National Language Identifier was indicated. The receiving entity shall be capable of processing both single shift and locking shift within the same message. It is an implementation option for the sending entity whether to use the single shift mechanism, the locking shift mechanism or both.

NOTE 1: A message using the locking shift mechanism cannot make use of characters from the GSM 7 bit default alphabet table unless such characters are replicated in the National Language Locking Shift Table. When locking shift is combined with single shift, the characters may instead be replicated in the National Language Single Shift Table.
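The "ignore the IE" rule can be sketched as follows. The IEI values used are 0x24 (National Language Single Shift) and 0x25 (National Language Locking Shift). They are taken from 3GPP TS 23.040 as understood here and should be checked against that specification. The function name and the `supported` set are hypothetical.

```python
from typing import Iterable, Optional, Tuple

IEI_SINGLE_SHIFT = 0x24    # assumed, per 3GPP TS 23.040
IEI_LOCKING_SHIFT = 0x25   # assumed, per 3GPP TS 23.040


def shift_languages(
    ies: Iterable[Tuple[int, bytes]], supported: set[int]
) -> Tuple[Optional[int], Optional[int]]:
    """Return (single_shift_language, locking_shift_language) to apply.
    `supported` must hold only defined identifiers, so reserved values
    and unsupported languages both cause the IE to be ignored, leaving
    that mechanism on the default tables."""
    single = locking = None
    for iei, data in ies:
        if iei not in (IEI_SINGLE_SHIFT, IEI_LOCKING_SHIFT) or len(data) != 1:
            continue
        lang = data[0]
        if lang not in supported:
            continue                   # reserved or unsupported: ignore the IE
        if iei == IEI_SINGLE_SHIFT:
            single = lang
        else:
            locking = lang
    return single, locking
```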


NOTE 2: Encoding of a message using the national locking shift mechanism is not intended to be implemented until a formal request is issued by the relevant national regulatory body. This is because a receiving entity not supporting the relevant locking shift decoding will present different characters from the ones intended by the sending entity.

NOTE 3: An SMS message using a locking shift table for a language may not be displayed properly when the terminal does not support the locking shift table for that language. When the network is aware of the list of locking shift tables supported by the UE, the network can deliver the SMS messages using an appropriate encoding.

6.2.2 8 bit data


8 bit data is user defined.
Padding: CR in the case of an 8 bit character set; otherwise user defined.
Character table: User specific.

6.2.3 UCS2
Bits per character: 16
CBS/USSD pad character: CR
Character table: ISO/IEC 10646 [10]
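Below is a minimal UCS2 sketch. It assumes big-endian (most significant octet first) ordering for UCS2 user data. It relies on Python's built-in "utf-16-be" codec, which coincides with UCS2 for characters in the Basic Multilingual Plane. Names are hypothetical.

```python
CBS_UCS2_CHARS = 82 // 2   # 41 UCS2 characters fit in one 82-octet CBS page


def encode_ucs2(text: str) -> bytes:
    """Two octets per character, most significant octet first (assumed).
    Characters outside the Basic Multilingual Plane have no UCS2 code."""
    for ch in text:
        if ord(ch) > 0xFFFF:
            raise ValueError(f"U+{ord(ch):04X} is outside UCS2")
    return text.encode("utf-16-be")


def decode_ucs2(data: bytes) -> str:
    # utf-16-be would also combine surrogate pairs, which UCS2 does not
    # define; this sketch does not reject them.
    if len(data) % 2:
        raise ValueError("UCS2 data must contain an even number of octets")
    return data.decode("utf-16-be")
```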


Annex A (normative): National Language Tables

A.1 Introduction

This annex contains character tables for specific languages whose characters are not wholly or partially present within the GSM 7 bit default alphabet.


A.2 National Language Single Shift Tables


A.2.1 Turkish National Language Single Shift Table

[Table: Turkish National Language Single Shift Table; columns b7 b6 b5 = 0 to 7, rows b4 b3 b2 b1 = 0 to 15]

NOTE 1): This code is reserved for the extension to another extension table. On receipt of this code, a receiving entity shall display a space until another extension table is defined.
NOTE 2): Void.
NOTE 3): This code is defined as a Page Break character and may be used, for example, in compressed CBS messages. Any mobile station which does not understand the GSM 7 bit default alphabet table extension mechanism will treat this character as Line Feed.
NOTE 4): This code represents a control character and therefore must not be used for language-specific characters.


A.2.2 Spanish National Language Single Shift Table


NOTE: This table also includes the character "" used in Catalan.

[Table: Spanish National Language Single Shift Table; columns b7 b6 b5 = 0 to 7, rows b4 b3 b2 b1 = 0 to 15]

NOTE 1): This code is reserved for the extension to another extension table. On receipt of this code, a receiving entity shall display a space until another extension table is defined.
NOTE 2): Void.
NOTE 3): This code is defined as a Page Break character and may be used, for example, in compressed CBS messages. Any mobile station which does not understand the GSM 7 bit default alphabet table extension mechanism will treat this character as Line Feed.
NOTE 4): This code represents a control character and therefore must not be used for language-specific characters.


A.2.3 Portuguese National Language Single Shift Table

[Table: Portuguese National Language Single Shift Table; columns b7 b6 b5 = 0 to 7, rows b4 b3 b2 b1 = 0 to 15]

NOTE 1): This code is reserved for the extension to another extension table. On receipt of this code, a receiving entity shall display a space until another extension table is defined.
NOTE 2): Void.
NOTE 3): This code is defined as a Page Break character and may be used, for example, in compressed CBS messages. Any mobile station which does not understand the GSM 7 bit default alphabet table extension mechanism will treat this character as Line Feed.
NOTE 4): This code represents a control character and therefore must not be used for language-specific characters.

A.2.4 Bengali National Language Single Shift Table


NOTE: In the table below, the Bengali characters are represented using Unicode.


[Table: Bengali National Language Single Shift Table; columns b7 b6 b5 = 0 to 7, rows b4 b3 b2 b1 = 0 to 15]

NOTE 1): This code is reserved for the extension to another extension table. On receipt of this code, a receiving entity shall display a space until another extension table is defined.
NOTE 2): Void.
NOTE 3): This code is defined as a Page Break character and may be used, for example, in compressed CBS messages. Any mobile station which does not understand the GSM 7 bit default alphabet table extension mechanism will treat this character as Line Feed.
NOTE 4): This code represents a control character and therefore must not be used for language-specific characters.


A.2.5 Gujarati National Language Single Shift Table


NOTE: In the table below, the Gujarati characters are represented using Unicode.

[Table: Gujarati National Language Single Shift Table; columns b7 b6 b5 = 0 to 7, rows b4 b3 b2 b1 = 0 to 15]

NOTE 1): This code is reserved for the extension to another extension table. On receipt of this code, a receiving entity shall display a space until another extension table is defined.
NOTE 2): Void.
NOTE 3): This code is defined as a Page Break character and may be used, for example, in compressed CBS messages. Any mobile station which does not understand the GSM 7 bit default alphabet table extension mechanism will treat this character as Line Feed.
NOTE 4): This code represents a control character and therefore must not be used for language-specific characters.


A.2.6 Hindi National Language Single Shift Table


NOTE: In the table below, the Hindi characters are represented using Unicode.

[Table: Hindi National Language Single Shift Table; columns b7 b6 b5 = 0 to 7, rows b4 b3 b2 b1 = 0 to 15]

NOTE 1): This code is reserved for the extension to another extension table. On receipt of this code, a receiving entity shall display a space until another extension table is defined.
NOTE 2): Void.
NOTE 3): This code is defined as a Page Break character and may be used, for example, in compressed CBS messages. Any mobile station which does not understand the GSM 7 bit default alphabet table extension mechanism will treat this character as Line Feed.
NOTE 4): This code represents a control character and therefore must not be used for language-specific characters.


A.2.7 Kannada National Language Single Shift Table


NOTE: In the table below, the Kannada characters are represented using Unicode.

[Table: Kannada National Language Single Shift Table; columns b7 b6 b5 = 0 to 7, rows b4 b3 b2 b1 = 0 to 15]

NOTE 1): This code is reserved for the extension to another extension table. On receipt of this code, a receiving entity shall display a space until another extension table is defined.
NOTE 2): Void.
NOTE 3): This code is defined as a Page Break character and may be used, for example, in compressed CBS messages. Any mobile station which does not understand the GSM 7 bit default alphabet table extension mechanism will treat this character as Line Feed.
NOTE 4): This code represents a control character and therefore must not be used for language-specific characters.


A.2.8 Malayalam National Language Single Shift Table


NOTE: In the table below, the Malayalam characters are represented using Unicode.

[Table: Malayalam National Language Single Shift Table; columns b7 b6 b5 = 0 to 7, rows b4 b3 b2 b1 = 0 to 15]

NOTE 1): This code is reserved for the extension to another extension table. On receipt of this code, a receiving entity shall display a space until another extension table is defined.
NOTE 2): Void.
NOTE 3): This code is defined as a Page Break character and may be used, for example, in compressed CBS messages. Any mobile station which does not understand the GSM 7 bit default alphabet table extension mechanism will treat this character as Line Feed.
NOTE 4): This code represents a control character and therefore must not be used for language-specific characters.


A.2.9 Oriya National Language Single Shift Table


NOTE: In the table below, the Oriya characters are represented using Unicode.

[Table: Oriya National Language Single Shift Table; columns b7 b6 b5 = 0 to 7, rows b4 b3 b2 b1 = 0 to 15]

NOTE 1): This code is reserved for the extension to another extension table. On receipt of this code, a receiving entity shall display a space until another extension table is defined.
NOTE 2): Void.
NOTE 3): This code is defined as a Page Break character and may be used, for example, in compressed CBS messages. Any mobile station which does not understand the GSM 7 bit default alphabet table extension mechanism will treat this character as Line Feed.
NOTE 4): This code represents a control character and therefore must not be used for language-specific characters.


A.2.10 Punjabi National Language Single Shift Table


NOTE: In the table below, the Punjabi characters are represented using Unicode.

[Table: Punjabi National Language Single Shift Table; columns b7 b6 b5 = 0 to 7, rows b4 b3 b2 b1 = 0 to 15]

NOTE 1): This code is reserved for the extension to another extension table. On receipt of this code, a receiving entity shall display a space until another extension table is defined.
NOTE 2): Void.
NOTE 3): This code is defined as a Page Break character and may be used, for example, in compressed CBS messages. Any mobile station which does not understand the GSM 7 bit default alphabet table extension mechanism will treat this character as Line Feed.
NOTE 4): This code represents a control character and therefore must not be used for language-specific characters.


A.2.11 Tamil National Language Single Shift Table


NOTE: In the table below, the Tamil characters are represented using Unicode.

[Table: Tamil National Language Single Shift Table; columns b7 b6 b5 = 0 to 7, rows b4 b3 b2 b1 = 0 to 15]

NOTE 1): This code is reserved for the extension to another extension table. On receipt of this code, a receiving entity shall display a space until another extension table is defined.
NOTE 2): Void.
NOTE 3): This code is defined as a Page Break character and may be used, for example, in compressed CBS messages. Any mobile station which does not understand the GSM 7 bit default alphabet table extension mechanism will treat this character as Line Feed.
NOTE 4): This code represents a control character and therefore must not be used for language-specific characters.


A.2.12 Telugu National Language Single Shift Table


NOTE: In the table below, the Telugu characters are represented using Unicode.

[Table: Telugu National Language Single Shift Table; columns b7 b6 b5 = 0 to 7, rows b4 b3 b2 b1 = 0 to 15]

NOTE 1): This code is reserved for the extension to another extension table. On receipt of this code, a receiving entity shall display a space until another extension table is defined.
NOTE 2): Void.
NOTE 3): This code is defined as a Page Break character and may be used, for example, in compressed CBS messages. Any mobile station which does not understand the GSM 7 bit default alphabet table extension mechanism will treat this character as Line Feed.
NOTE 4): This code represents a control character and therefore must not be used for language-specific characters.


A.2.13 Urdu National Language Single Shift Table


NOTE: In the table below, the Urdu characters are represented using Unicode.

[Table: Urdu National Language Single Shift Table; columns b7 b6 b5 = 0 to 7, rows b4 b3 b2 b1 = 0 to 15]

NOTE 1): This code is reserved for the extension to another extension table. On receipt of this code, a receiving entity shall display a space until another extension table is defined.
NOTE 2): Void.
NOTE 3): This code is defined as a Page Break character and may be used, for example, in compressed CBS messages. Any mobile station which does not understand the GSM 7 bit default alphabet table extension mechanism will treat this character as Line Feed.
NOTE 4): This code represents a control character and therefore must not be used for language-specific characters.


A.3 National Language Locking Shift Tables


A.3.1 Turkish National Language Locking Shift Table

[Table: Turkish National Language Locking Shift Table; columns b7 b6 b5 = 0 to 7, rows b4 b3 b2 b1 = 0 to 15. The character assignments are not legible in this copy.]

NOTE 1): This code is an escape to an extension of this table (either to the GSM 7 bit default alphabet extension table, see subclause 6.2.1.1, or a National Language Single Shift Table, see subclause 6.2.1.2.2). A receiving entity which does not understand the meaning of this escape mechanism shall display it as a space character.

A.3.2 Void

A.3.3 Portuguese National Language Locking Shift Table

[Table: Portuguese National Language Locking Shift Table; columns b7 b6 b5 = 0 to 7, rows b4 b3 b2 b1 = 0 to 15. The character assignments are not legible in this copy.]

NOTE 1): This code is an escape to an extension of this table (either to the GSM 7 bit default alphabet extension table, see subclause 6.2.1.1, or a National Language Single Shift Table, see subclause 6.2.1.2.2). A receiving entity which does not understand the meaning of this escape mechanism shall display it as a space character.

A.3.4 Bengali National Language Locking Shift Table


NOTE: In the table below, the Bengali characters are represented using Unicode.

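[Table: Bengali National Language Locking Shift Table; columns b7 b6 b5 = 0 to 7, rows b4 b3 b2 b1 = 0 to 15. The character assignments are not legible in this copy.]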

NOTE 1): This code is an escape to an extension of this table (either to the GSM 7 bit default alphabet extension table, see subclause 6.2.1.1, or a National Language Single Shift Table, see subclause 6.2.1.2.2). A receiving entity which does not understand the meaning of this escape mechanism shall display it as a space character.

A.3.5 Gujarati National Language Locking Shift Table


NOTE: In the table below, the Gujarati characters are represented using Unicode.

NOTE 1): This code is an escape to an extension of this table (either to the GSM 7 bit default alphabet extension table, see subclause 6.2.1.1, or a National Language Single Shift Table, see subclause 6.2.1.2.2). A receiving entity which does not understand the meaning of this escape mechanism shall display it as a space character.

A.3.6 Hindi National Language Locking Shift Table


NOTE: In the table below, the Hindi characters are represented using Unicode.

[Table: Hindi National Language Locking Shift Table; columns b7 b6 b5 = 0 to 7, rows b4 b3 b2 b1 = 0 to 15. The character assignments are not legible in this copy.]

NOTE 1): This code is an escape to an extension of this table (either to the GSM 7 bit default alphabet extension table, see subclause 6.2.1.1, or a National Language Single Shift Table, see subclause 6.2.1.2.2). A receiving entity which does not understand the meaning of this escape mechanism shall display it as a space character.

A.3.7 Kannada National Language Locking Shift Table


NOTE: In the table below, the Kannada characters are represented using Unicode.

NOTE 1): This code is an escape to an extension of this table (either to the GSM 7 bit default alphabet extension table, see subclause 6.2.1.1, or a National Language Single Shift Table, see subclause 6.2.1.2.2). A receiving entity which does not understand the meaning of this escape mechanism shall display it as a space character.

A.3.8 Malayalam National Language Locking Shift Table


NOTE: In the table below, the Malayalam characters are represented using Unicode.

NOTE 1): This code is an escape to an extension of this table (either to the GSM 7 bit default alphabet extension table, see subclause 6.2.1.1, or a National Language Single Shift Table, see subclause 6.2.1.2.2). A receiving entity which does not understand the meaning of this escape mechanism shall display it as a space character.

A.3.9 Oriya National Language Locking Shift Table


NOTE: In the table below, the Oriya characters are represented using Unicode.

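[Table: Oriya National Language Locking Shift Table; columns b7 b6 b5 = 0 to 7, rows b4 b3 b2 b1 = 0 to 15. The character assignments are not legible in this copy.]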

NOTE 1): This code is an escape to an extension of this table (either to the GSM 7 bit default alphabet extension table, see subclause 6.2.1.1, or a National Language Single Shift Table, see subclause 6.2.1.2.2). A receiving entity which does not understand the meaning of this escape mechanism shall display it as a space character.

A.3.10 Punjabi National Language Locking Shift Table


NOTE: In the table below, the Punjabi characters are represented using Unicode.

NOTE 1): This code is an escape to an extension of this table (either to the GSM 7 bit default alphabet extension table, see subclause 6.2.1.1, or a National Language Single Shift Table, see subclause 6.2.1.2.2). A receiving entity which does not understand the meaning of this escape mechanism shall display it as a space character.

A.3.11 Tamil National Language Locking Shift Table


NOTE: In the table below, the Tamil characters are represented using Unicode.

NOTE 1): This code is an escape to an extension of this table (either to the GSM 7 bit default alphabet extension table, see subclause 6.2.1.1, or a National Language Single Shift Table, see subclause 6.2.1.2.2). A receiving entity which does not understand the meaning of this escape mechanism shall display it as a space character.

A.3.12 Telugu National Language Locking Shift Table


NOTE: In the table below, the Telugu characters are represented using Unicode.

NOTE 1): This code is an escape to an extension of this table (either to the GSM 7 bit default alphabet extension table, see subclause 6.2.1.1, or a National Language Single Shift Table, see subclause 6.2.1.2.2). A receiving entity which does not understand the meaning of this escape mechanism shall display it as a space character.

A.3.13 Urdu National Language Locking Shift Table


NOTE: In the table below, the Urdu characters are represented using Unicode.

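[Table: Urdu National Language Locking Shift Table; columns b7 b6 b5 = 0 to 7, rows b4 b3 b2 b1 = 0 to 15. The character assignments are not legible in this copy.]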

NOTE 1): This code is an escape to an extension of this table (either to the GSM 7 bit default alphabet extension table, see subclause 6.2.1.1, or a National Language Single Shift Table, see subclause 6.2.1.2.2). A receiving entity which does not understand the meaning of this escape mechanism shall display it as a space character.

Annex B (informative): Guidelines for creating language tables

B.1 Introduction

This annex provides guidelines for creating language tables. It is recommended that the characters and their positions in the table be checked by people fluent in the appropriate language and, preferably, endorsed by an appropriate responsible body. It is also recommended that character positions be selected carefully, so that receiving entities that do not support the specific table display symbols (glyphs) as similar as possible to the intended ones.

B.2 Template for Single Shift Language Tables

The format and structure of the table below shall be used to document the language-specific character codes used in the National Language selection mechanism. It is recommended that a National Language Single Shift Table include the characters of the GSM 7 bit default alphabet extension table (as defined in subclause 6.2.1.1) in the same character positions. This ensures that these characters remain available when the single shift mechanism is used.

Language (NOTE: The actual country and table content will be annotated when the country is known.)

[Template grid: columns b7 b6 b5 = 0 to 7, rows b4 b3 b2 b1 = 0 to 15. Cells are left blank for the language-specific characters, except for the positions carrying NOTE 1) and NOTE 3).]

NOTE 1): This code is reserved for the extension to another extension table. On receipt of this code, a receiving entity shall display a space until another extension table is defined.
NOTE 2): Void.
NOTE 3): This code is defined as a Page Break character and may be used, for example, in compressed CBS messages. Any mobile station which does not understand the GSM 7 bit default alphabet table extension mechanism will treat this character as Line Feed.
NOTE 4): This code represents a control character and therefore must not be used for language-specific characters.
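The B.2 recommendation that a single shift table keep the default extension characters in their original positions can be checked mechanically. The following Python sketch assumes the nine printable characters of the GSM 7 bit default alphabet extension table at the positions commonly given for subclause 6.2.1.1 (that table is not reproduced in this excerpt, so verify it there); the function name is hypothetical.

```python
from typing import Dict

# ASSUMPTION: printable characters of the GSM 7 bit default alphabet
# extension table (subclause 6.2.1.1), keyed by septet value. Control and
# reserved positions (0x0A, 0x0D, 0x1B) are deliberately left out.
DEFAULT_EXTENSION: Dict[int, str] = {
    0x14: "^", 0x28: "{", 0x29: "}", 0x2F: "\\", 0x3C: "[",
    0x3D: "~", 0x3E: "]", 0x40: "|", 0x65: "\u20ac",  # euro sign
}


def missing_extension_chars(single_shift: Dict[int, str]) -> Dict[int, str]:
    """Return the default-extension characters that a candidate National
    Language Single Shift Table does not carry at the same position."""
    return {
        pos: ch
        for pos, ch in DEFAULT_EXTENSION.items()
        if single_shift.get(pos) != ch
    }
```

An empty result means the candidate table follows the recommendation; any entry returned names a position where a sender would lose access to that character when the single shift mechanism is in use.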

B.3 Template for Locking Shift Language Tables

The format and structure of the table below shall be used to document the Language specific character codes used in the National Language selection mechanism.

Language (NOTE: The actual country and table content will be annotated when the country is known.)

[Template grid: columns b7 b6 b5 = 0 to 7, rows b4 b3 b2 b1 = 0 to 15. Cells are left blank for the language-specific characters, except for LF, CR, SP and the escape position marked NOTE 1).]

NOTE 1): This code is an escape to an extension of this table (either to the GSM 7 bit default alphabet extension table, see subclause 6.2.1.1, or a National Language Single Shift Table, see subclause 6.2.1.2.2). A receiving entity which does not understand the meaning of this escape mechanism shall display it as a space character.

Annex C (informative): Example for locking shift and single shift mechanisms

C.1 Introduction

This annex gives an overview of how the national language extension mechanism of the GSM 7 bit default alphabet works. It shows how a message indicating the Turkish National Language Identifier is decoded; the same principles apply to other languages.

C.2 Example of single shift

This example outlines the behaviour of both supporting and non-supporting receiving entities when the received message indicates the Turkish National Language Single Shift Table. In this example, the locking shift mechanism is not used in parallel.

A non-supporting receiving entity will ignore the National Language Single Shift IE and decode the message contents using the GSM 7 bit default alphabet table defined in subclause 6.2.1, including any escape characters to the GSM 7 bit default alphabet extension table specified in subclause 6.2.1.1. For example, the Turkish word "Türkçe" will be displayed as "Türkce".

A receiving entity that supports the Turkish National Language Single Shift Table will detect a National Language Single Shift IE in the TP User Data Header. This IE tells the receiving entity that the single shift mechanism is used. The supporting receiving entity reads the language code, in this example coded as '0000 0001', and therefore uses the Turkish National Language Single Shift Table defined in subclause A.2.1 instead of the GSM 7 bit default alphabet extension table defined in subclause 6.2.1.1. Each character other than <escape> is decoded using the GSM 7 bit default alphabet table. When an <escape> is met, the Turkish language-specific table is used to decode the one character that follows it. This process is repeated until the end of the received message, or until the end of the current segment of a concatenated message.

The language selection at the start of a message takes 4 octets, which correspond to five 7 bit characters; this reduces the maximum number of characters in a single message to 155. Beyond that, the number of characters that fit in the message depends on how many characters from the National Language Single Shift Table are used: every such character needs an additional character to indicate the escape to the National Language Single Shift Table, so the available 155-character capacity of a single message is reduced accordingly. The same reduction applies to characters from the GSM 7 bit default alphabet extension table (see subclause 6.2.1.1) when the National Language Single Shift IE is not used.
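The decoding loop described in this example can be sketched in Python as follows. This is an illustration, not a reference decoder: the septets are assumed to be already unpacked, the function and constant names are hypothetical, and the Turkish single shift table is reduced to one entry (ç at 0x63) assumed here for illustration; consult subclause A.2.1 for the normative table. The fallback for codes missing from the extension table (show the main-table character at the same position) reflects our reading of subclause 6.2.1.1 and is what produces "Türkce" on a non-supporting receiver.

```python
from typing import Dict, Optional, Sequence

# GSM 7 bit default alphabet (subclause 6.2.1), indexed by septet value.
# Position 0x1B holds the <escape> control code.
DEFAULT_ALPHABET = (
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞ\x1bÆæßÉ"
    " !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmnopqrstuvwxyzäöñüà"
)
assert len(DEFAULT_ALPHABET) == 128

ESC = 0x1B

# Default extension table (subclause 6.2.1.1); "\f" stands in for page break.
DEFAULT_EXTENSION: Dict[int, str] = {
    0x0A: "\f", 0x14: "^", 0x28: "{", 0x29: "}", 0x2F: "\\",
    0x3C: "[", 0x3D: "~", 0x3E: "]", 0x40: "|", 0x65: "€",
}

# HYPOTHETICAL one-entry excerpt of the Turkish single shift table (A.2.1).
TURKISH_SINGLE_SHIFT_EXCERPT: Dict[int, str] = {0x63: "ç"}


def decode(septets: Sequence[int],
           single_shift: Optional[Dict[int, str]] = None,
           main_table: str = DEFAULT_ALPHABET) -> str:
    """Decode unpacked septets. `single_shift` replaces the default
    extension table when the receiver supports the indicated language."""
    extension = single_shift if single_shift is not None else DEFAULT_EXTENSION
    out = []
    i = 0
    while i < len(septets):
        code = septets[i]
        if code != ESC:
            out.append(main_table[code])  # ordinary character
            i += 1
            continue
        if i + 1 == len(septets):
            out.append(" ")  # dangling <escape> at the end: show a space
            break
        nxt = septets[i + 1]
        if nxt == ESC:
            out.append(" ")  # reserved for a further extension table
        else:
            # Codes missing from the extension table fall back to the
            # main-table character at the same position.
            out.append(extension.get(nxt, main_table[nxt]))
        i += 2  # <escape> and the escaped character are consumed together
    return "".join(out)
```

With the septets for "Türkçe" (`[0x54, 0x7E, 0x72, 0x6B, 0x1B, 0x63, 0x65]`), passing `TURKISH_SINGLE_SHIFT_EXCERPT` models the supporting receiver and yields "Türkçe", while omitting it models the non-supporting receiver and yields "Türkce". The `main_table` parameter is where a 128-entry locking shift table would be supplied for the combined case described in C.3.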

C.3 Example of locking shift

This example outlines the behaviour of both supporting and non-supporting receiving entities when the received message indicates the Turkish National Language Locking Shift Table. A non-supporting receiving entity will ignore the National Language Locking Shift IE and decode the message contents using the GSM 7 bit default alphabet defined in subclause 6.2.1, including any escape characters to the GSM 7 bit default alphabet extension table specified in subclause 6.2.1.1.

A receiving entity that supports the scheme will detect a National Language Locking Shift IE in the TP User Data Header. This IE tells the receiving entity that the locking shift mechanism is used. If no National Language Single Shift IE is indicated in addition to the National Language Locking Shift IE, the whole message is decoded using the Turkish National Language Locking Shift Table defined in subclause A.3.1.
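How a receiver might find the two IEs in the TP User Data Header can be sketched in Python as below. The IEI values 0x24 (single shift) and 0x25 (locking shift) are taken from 3GPP TS 23.040, not from this document, and should be checked there; the function name is hypothetical. The language code 0x01 ('0000 0001') identifies Turkish, as in the examples of this annex.

```python
from typing import Optional, Tuple

# ASSUMPTION: IEI values from 3GPP TS 23.040, not defined in this document.
NL_SINGLE_SHIFT_IEI = 0x24
NL_LOCKING_SHIFT_IEI = 0x25


def national_language_ids(udh: bytes) -> Tuple[Optional[int], Optional[int]]:
    """Return (locking_shift_id, single_shift_id) found in a TP User Data
    Header. udh[0] is the header length (UDHL); each information element
    that follows is encoded as IEI, IE length, IE data."""
    locking: Optional[int] = None
    single: Optional[int] = None
    end = 1 + udh[0]
    i = 1
    while i + 1 < end:
        iei, length = udh[i], udh[i + 1]
        if length == 1 and i + 2 < end:
            if iei == NL_LOCKING_SHIFT_IEI:
                locking = udh[i + 2]
            elif iei == NL_SINGLE_SHIFT_IEI:
                single = udh[i + 2]
        i += 2 + length  # skip to the next IE, whatever its type
    return locking, single
```

A header of `bytes([0x03, 0x25, 0x01, 0x01])` yields `(1, None)`: Turkish locking shift only, so the whole message is decoded with the locking shift table. Adding a single shift IE, as in `bytes([0x06, 0x25, 0x01, 0x01, 0x24, 0x01, 0x01])`, yields `(1, 1)`. Other IEs, such as concatenation, are skipped by their length.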

If, in addition to the National Language Locking Shift IE (which may be for Turkish or another language), a National Language Single Shift IE for the Turkish language is indicated, then <escape> makes an exception to the use of the National Language Locking Shift Table. In that case, the character following <escape> is decoded using the Turkish National Language Single Shift Table, after which decoding resumes with the National Language Locking Shift Table (for Turkish or another language) until the next <escape> or the end of the message is reached.

The language selection at the start of a message takes 4 octets, which correspond to five 7 bit characters; this reduces the maximum number of characters in a single message to 155. If the National Language Single Shift IE is also included, the header grows by a further 3 octets to 7 octets in total, which correspond to eight 7 bit characters; this reduces the maximum number of characters in a single message to 152. Beyond that, if the single shift mechanism is used in addition to the locking shift mechanism, the number of characters that fit in the message depends on how many characters from the National Language Single Shift Table are used. Every character from the National Language Single Shift Table uses an additional character (the <escape>), so the available 152-character capacity of a single message is reduced accordingly. The same reduction applies to characters from the GSM 7 bit default alphabet extension table (see subclause 6.2.1.1) when the National Language Single Shift IE is not used.
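The 155 and 152 figures follow from septet packing, as the Python sketch below shows. It assumes the 140-octet user data limit and the fill-bit padding of the header to a septet boundary from 3GPP TS 23.040; neither is stated in this document, and the function names are hypothetical.

```python
# ASSUMPTION: 140-octet TP-UD limit and septet-boundary fill bits (TS 23.040).
USER_DATA_OCTETS = 140
MAX_SEPTETS = USER_DATA_OCTETS * 8 // 7  # 160 septets without a header


def header_septets(udh_octets: int) -> int:
    """Septets consumed by a UDH of `udh_octets` octets (including UDHL),
    rounded up because fill bits pad the header to a septet boundary."""
    return (udh_octets * 8 + 6) // 7


def max_characters(udh_octets: int, shifted_chars: int = 0) -> int:
    """Characters that fit in one message when `shifted_chars` of them are
    reached through <escape> and therefore cost two septets each."""
    return MAX_SEPTETS - header_septets(udh_octets) - shifted_chars
```

A 4-octet header (UDHL plus one 3-octet IE) gives `header_septets(4) == 5` and `max_characters(4) == 155`; a 7-octet header (UDHL plus two IEs) gives exactly 8 septets and `max_characters(7) == 152`. Ten single shift characters in the latter case leave room for `max_characters(7, 10) == 142` characters.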

Annex D (informative): Document change history


TSG# | TDoc | VERS | NEW_VERS | CR | REV | Rel | CAT | WORK ITEM | SUBJECT
T#4 | - | - | 3.0.0 | - | - | - | - | - | Creation of 3GPP TS 23.038 v1.0.0 out of GSM 03.38 v7.1.0
T#4 | TP-99124 | 3.0.0 | 3.1.0 | 001 | - | R99 | A | MExE | Data Coding Scheme for WAP over USSD and CB
T#5 | TP-99177 | 3.1.0 | 3.2.0 | 002 | - | R99 | B | TEI | Language codes for Hebrew, Arabic and Russian
T#6 | TP-99237 | 3.2.0 | 3.3.0 | 003 | - | R99 | F | TEI | Adaptations for UMTS
T#8 | TP-000074 | 3.3.0 | 4.0.0 | 004 | - | Rel4 | B | TEI | Automatic removal of read SMS
T#10 | TP-000195 | 4.0.0 | 4.1.0 | 005 | - | Rel4 | B | TEI | Data coding scheme value for the Icelandic language
T#11 | TP-010029 | 4.1.0 | 4.2.0 | 006 | - | Rel4 | C | UICC1CPHS | Message Waiting Indication Status storage on the USIM
T#13 | TP-010194 | 4.2.0 | 4.3.0 | 007 | - | Rel4 | F | TEI4 | Support to UCS2 and editorial corrections
T#14 | TP-010280 | 4.3.0 | 4.4.0 | 008 | - | Rel4 | F | TEI4 | Deletion of GSM 01.04 reference
T#15 | TP-020015 | 4.4.0 | 5.0.0 | 009 | - | Rel5 | F | TEI5 | User Data Header support over CBS
T#21 | TP-030173 | 5.0.0 | 6.0.0 | 010 | - | Rel6 | C | TEI6 | Additional Indications in SMS DCS
T#25 | TP-040205 | 6.0.0 | 6.1.0 | 013 | - | Rel6 | F | TEI6 | Message Waiting Indication how to handle Multiple Subscriber Profiles
T#25 | TP-040171 | 6.0.0 | 6.1.0 | 014 | - | Rel6 | F | TEI6 | Enhanced Voice Mail Information not applicable for CBS
CT#31 | CP-060126 | 6.1.0 | 7.0.0 | 015r1 | - | Rel-7 | F | TEI7 | CBS Reference removal
CT-39 | CP-080223 | 7.0.0 | 8.0.0 | 0017 | 5 | Rel-8 | B | TEI8 | SMS default alphabet. Generic solution for all languages
CT-39 | CP-080138 | 7.0.0 | 8.0.0 | 0019 | - | Rel-8 | B | TEI8 | SMS-addition of turkish national language locking shift table
CT-40 | CP-080361 | 8.0.0 | 8.1.0 | 0020 | - | Rel-8 | F | TEI8 | Corrections to single shift language tables for Turkish and Spanish
CT-40 | CP-080361 | 8.0.0 | 8.1.0 | 0021 | 3 | Rel-8 | F | TEI8 | Addition of national language tables for Portuguese
CT-41 | CP-080536 | 8.1.0 | 8.2.0 | 0229 | 1 | Rel-8 | F | TEI8 | Clarification of Locking / Single shift IEs for different languages in a single SM
CT-45 | CP-090682 | 8.2.0 | 9.0.0 | 0231 | 3 | Rel-9 | B | TEI9 | Addition of language tables for India
CT-45 | CP-090682 | 8.2.0 | 9.0.0 | 0232 | - | Rel-9 | F | TEI9 | CBS Message Class 0
CT-46 | CP-090912 | 9.0.0 | 9.1.0 | 0235 | 3 | Rel-9 | F | ETWS | Define the use of data coding scheme for the ETWS warning message
CT-46 | CP-090925 | 9.0.0 | 9.1.0 | 0236 | - | Rel-9 | B | IMS_SCC-ICS_I1 | Uniquely identify the I1 protocol in USSD
- | - | 9.1.0 | 9.1.1 | - | - | Rel-9 | - | - | Correction of a typo error in the change history table (wrong version number)
CT-51 | - | 9.1.1 | 10.0.0 | - | - | Rel-10 | - | - | Upgrade to Rel-10
