
Arwi: Case Study of Arabic, Syriac, and Diacritical Unicode Characters

This document presents a case study of the Arwi script, which is used to write the Tamil language using Arabic script. Arwi was widely used by Muslims in Tamil Nadu, India and Sri Lanka to write religious texts and letters. It combined the Arabic abjad with additional diacritical marks and characters to write Tamil from right to left. While Arwi usage has declined due to lack of printing support, some religious texts remain important to communities. The document outlines the history and components of the Tamil and Arabic scripts, compares their alphabets, and discusses Unicode character coverage for representing Arwi.



Conference Paper · January 2008


1 author:

Seyed Buhari
King Abdulaziz University

All content following this page was uploaded by Seyed Buhari on 04 January 2015.



Arwi: Case study of Arabic, Syriac and
Diacritical Unicode characters

M.I. Seyed Mohamed Buhari


Department of Computer Science,
Faculty of Science,
Universiti Brunei Darussalam,
Brunei Darussalam
Email: [email protected]
[email protected]

Arwi – An Introduction

- Also called Arabu-Tamil or Arabic-Tamil script
- Used by the Muslims of Tamil Nadu (in India) and Sri Lanka
- Used to write their religious texts as well as letters
- Writes the Tamil language in an Arabic-style script
- Comparable to Malay written in Jawi script

Arwi (also called Arabu-Tamil or Arabic-Tamil) script was widely used by the Muslims of Tamil Nadu (in India) and Sri Lanka to write their religious texts as well as letters. Arwi writes the Tamil language using the Arabic script. This is similar to Malay, which is written both in the Latin alphabet and in Jawi script. Tamil is normally written in the Tamil script, which runs left to right; Arwi instead uses the Arabic script, which runs right to left, with certain additional diacritics and characters.

Tamil Script
- Historical information: 100 BC
  - http://www.xs4all.nl/~wjsn/tekst/taalschriften.htm#QXQ
- Based on the Brahmi script
  - Like Devanagari, Malayalam, Telugu, etc.
- Written left to right
- Uses 65 characters and variants
  - Also uses combinations of them
- Uses HTML decimal codes 2944 to 3071 (&#2944; to &#3071;)
  - Unicode U+0B80–U+0BFF
- Ref: http://www.xs4all.nl/~wjsn/tamil.htm

The Tamil language is spoken in Tamil Nadu, Malaysia, Singapore, etc.
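
The two numbering schemes on this slide name the same code points in different bases: U+0B80 is hexadecimal, and 2944 is its decimal value, which is what an HTML decimal character reference such as &#2944; uses. A minimal JavaScript sketch of the conversion follows; the function names are ours, chosen for illustration.

```javascript
// "U+0B80" style: hexadecimal, upper case, padded to at least four digits.
function toUPlus(cp) {
  return "U+" + cp.toString(16).toUpperCase().padStart(4, "0");
}

// HTML decimal character reference, e.g. &#2944;
function toHtmlDecimal(cp) {
  return "&#" + cp.toString(10) + ";";
}

// HTML hexadecimal character reference, e.g. &#xB80;
function toHtmlHex(cp) {
  return "&#x" + cp.toString(16).toUpperCase() + ";";
}

// The first and last code points of the Tamil block.
for (const cp of [0x0b80, 0x0bff]) {
  console.log(toUPlus(cp), toHtmlDecimal(cp), toHtmlHex(cp));
}
// U+0B80 &#2944; &#xB80;
// U+0BFF &#3071; &#xBFF;
```

Either reference form can be placed directly in an HTML page; the decimal form is the one quoted on the slide.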

Arabic Script

- Used for Arabic, a language of the Semitic family
  - Its recorded history goes back thousands of years
- Written right to left
  - Like Persian and Urdu
- Contains 28 characters and 6 vowels
- Character representation:
  - Initial, middle and final forms
- Diacritical marks: Damma (/u/), Fatha (/a/), Kasrah (/i/)
- Has influenced many other languages

Arabic is an official language in most countries of the Middle East and North Africa.

Arwi Script

- Outcome of cultural relations between Arabs and the Tamil-speaking Muslims of Tamil Nadu
- Spread to Sri Lanka, Malaysia, Thailand, etc.
- Based on the Arabic script
  - With additional characters (13 in number)
- Written right to left
- Used to write a variety of Islamic books
  - Belief, law, Sufism, medicine, etc.
- http://en.wikipedia.org/wiki/Arwi_language

Arwi Script
- Arwi script: a sample page from the book titled "Sumthu Subyan"

Arwi Script

- Achievements of this script:
  - It has preserved important information, both religious and social
- People still use certain Arwi words that were borrowed from Arabic
  - A few examples:
    - Umma (from Arabic Ummun) is used for Amma (mother)
    - Raahat, Kithaab, Mowth, etc.

Arwi script has helped the Muslim community learn to write Arabic text faster; Arabic is the language of the Holy Quran.

Status of Arwi Script
- Arwi is still used in certain Islamic schools (madrasas) in Tamil Nadu
- Some famous books are preserved in libraries
- The lack of printing facilities has limited further use of the script
  - A few books written in Arwi have been translated into Tamil script
  - This shows the importance of those texts to the public
- To the author's knowledge, no Arwi font exists

Status of Arwi Script - Wikipedia

Ref.: http://en.wikipedia.org/wiki/Arwi_language
Famous books by great scholars such as Imaam Shaafi (Radiallahu Anhu - May Allah be pleased with him) and Imaam Abu Hanifa (May Allah be pleased with him) have been translated into Arabic-Tamil. The article also indicates that the decline of Arwi caused a steady decline in the education of women in the latter half of the 20th century. The characters that the article lists as work in progress are handled in our work.

Related Works

- Wikipedia indicates that Arwi was taught in Indonesia, Thailand, Malaysia, Myanmar and Pakistan
- Tschacher [1] notes the use of Arwi for writing poems
- Shuayb Alim [2] indicates that Arwi was used by Malaysians in their daily life
- A few authors mention the use of Arwi in Sri Lanka

1. Tschacher, T., "How to die before dying? Sharia and Sufism in a 19th century Arabic-Tamil Poem", Panel 38, 18th European Conference on Modern South Asian Studies, Lund, Sweden, 6–9 July 2004. http://www.sasnet.lu.se/panelabstracts/38.html
2. Shuayb Alim, "Arabic, Arwi and Persian in Sarandib and Tamil Nadu", Madras, 1993.

Related Works

- Reasons for the decline of Arwi usage, as quoted in [2,3]:
  - Lack of printing facilities
  - Use of Urdu as the teaching medium in many Muslim schools in Tamil Nadu
- Nuhman [4] describes the debate on making Tamil one of the official languages of Sri Lanka. Multiple Arabic characters are present for one Tamil character

3. http://www.armu.com/armu/works/archives/12dec1998/amc1.html [Accessed on: 22nd April 2008]
4. Nuhman, M.A., "Sri Lankan Muslims: Ethnic Identity within Cultural Diversity", International Centre for Ethnic Studies, Colombo, Sri Lanka, 2007.

Nuhman [4] describes the debate on the bill to make Sinhala the only official language. In that debate, some speakers pointed out that Arwi, the script used by Muslims, is used to write the Tamil language; they used this to argue for making Tamil one of the official languages. Nuhman [4] also discusses how Arwi is understood by people who know Arabic and by those who know Tamil: people who know Arabic can read Arwi but cannot understand it, while those who know Tamil but not Arabic cannot read it, yet can understand it if someone reads it aloud to them. The author also describes the use of Arabic script for languages such as Malayalam and Bengali.

Related Works

- Nuhman [4] indicates:
  - Arabic: 28 characters and 6 vowels
  - Tamil: 30 letters (12 vowels and 18 consonants); Tamil also has 216 syllabic symbols (12 × 18) apart from the basic vowels and consonants
  - The joining of two drastically different languages was handled
- Mohan [5] notes the presence of many literary works in Arwi

Nuhman [4] indicates that there were 200 published and around 2000 unpublished literary works written in Arwi. Thus, two drastically different languages were combined to form the Arwi writing system instead of developing a brand new script. The author concludes that, with some modifications, any script could be used to write any other language, except for languages such as Chinese that use ideographs.

5. Mohan V., "Muslims of Sri Lanka", Aalekh Publishers, Jaipur, India, 1985.

Arwi Books

- Religious rules
  - Maani (The Treasure) by Maapillai Lebbe Alim, alias Seyed Mohamed Ibnu Ahamed Lebbe (May Allah be pleased with him) (1816–1898) from Kayalpattinam, Tamil Nadu, India
  - Sumthu Subyaan
- Poems:
  - Adabumalai (about morale and discipline)
  - Thakkasuruth (about the rules for prayer)

Note: Shamu Sihabudeen Appa wrote many poems in Arwi, including Adabumalai and Thakkasuruth. Maapillai Lebbe Alim wrote many books in Arwi.

Tamil and Arwi Alphabets – A Comparison

Arwi Script – Available Unicode Equivalents

Arwi Script – Available Unicode Equivalents
- Kaf (U+0643) with a dot below is also needed
- Number representation in Arabic-Indic digits: U+0661–U+0669
- Number representation in Extended Arabic-Indic digits: U+06F1–U+06F9

Arwi does not follow the Arabic numerals exactly. There are slight differences for the numerals 4, 5 and 6. Sometimes even the numeral 7 is written slightly differently (something like the English character 'L').
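
Both digit sets are laid out contiguously, so digit d sits at the set's zero plus d; the zeros themselves (U+0660 and U+06F0) are not listed on the slide. The minimal sketch below assumes ASCII digit input, and the function and constant names are illustrative. It only maps to the two standard digit sets; whether the Arwi-specific shapes of 4, 5, 6 or 7 appear still depends on the font.

```javascript
// Zero digits of the two sets; digit d is at zero + d in both blocks.
const ARABIC_INDIC_ZERO = 0x0660;          // U+0660 ARABIC-INDIC DIGIT ZERO
const EXTENDED_ARABIC_INDIC_ZERO = 0x06f0; // U+06F0 EXTENDED ARABIC-INDIC DIGIT ZERO

// Replace every ASCII digit 0-9 with the corresponding digit of the chosen set.
function convertDigits(text, zero) {
  return text.replace(/[0-9]/g, (d) => String.fromCodePoint(zero + Number(d)));
}

// Show code points rather than glyphs, since glyphs depend on installed fonts.
function codePoints(text) {
  return Array.from(text, (c) =>
    "U+" + c.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  ).join(" ");
}

console.log(codePoints(convertDigits("2008", ARABIC_INDIC_ZERO)));
// U+0662 U+0660 U+0660 U+0668
console.log(codePoints(convertDigits("456", EXTENDED_ARABIC_INDIC_ZERO)));
// U+06F4 U+06F5 U+06F6
```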

Arwi Script – Available Unicode Equivalents

- Note that we have used Unicode characters from:
  - Arabic (U+0600–U+06FF)
  - Syriac (U+0700–U+074F)
  - Combining Diacritical Marks (U+0300–U+036F)
  - We also looked at Arabic Presentation Forms-A and Forms-B, but none of their characters were used:
    - Arabic Presentation Forms-A (U+FB50–U+FDFF)
    - Arabic Presentation Forms-B (U+FE70–U+FEFF)
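
When checking text typed into the Arwi webpage, it can help to see which of these blocks each character comes from. The sketch below assumes the block boundaries listed above plus two that we added: Tamil (U+0B80–U+0BFF, from the Tamil Script slide) and Arabic Supplement (U+0750–U+077F), the block that contains U+0767, one of the characters listed later as still needing glyphs. The names BLOCKS, blockOf and describe are illustrative.

```javascript
// Unicode blocks used in this work (inclusive ranges), plus Tamil and
// Arabic Supplement, which we added for convenience.
const BLOCKS = [
  { name: "Combining Diacritical Marks", start: 0x0300, end: 0x036f },
  { name: "Arabic", start: 0x0600, end: 0x06ff },
  { name: "Syriac", start: 0x0700, end: 0x074f },
  { name: "Arabic Supplement", start: 0x0750, end: 0x077f },
  { name: "Tamil", start: 0x0b80, end: 0x0bff },
  { name: "Arabic Presentation Forms-A", start: 0xfb50, end: 0xfdff },
  { name: "Arabic Presentation Forms-B", start: 0xfe70, end: 0xfeff },
];

// Return the block name for a code point, or "Other" if it is in none of them.
function blockOf(cp) {
  const block = BLOCKS.find((b) => cp >= b.start && cp <= b.end);
  return block ? block.name : "Other";
}

// List each character of a string with its code point and block.
function describe(text) {
  return Array.from(text, (ch) => {
    const cp = ch.codePointAt(0);
    const hex = cp.toString(16).toUpperCase().padStart(4, "0");
    return `U+${hex} ${blockOf(cp)}`;
  });
}

// Kaf, Syriac three dots below, combining ogonek: all mentioned in this paper.
console.log(describe("\u0643\u0746\u0328").join("\n"));
// U+0643 Arabic
// U+0746 Syriac
// U+0328 Combining Diacritical Marks
```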

Font Development

- Issues to be considered:
  - The cursive nature, joining, diacritical characters and forms of the characters
  - Catering for all kinds of operating systems and editing software
    - Rendering issues of different editing tools
  - Developing a keymap that is closely related to the Arabic keymap

Font Development

- Two approaches:
  - Development of a web page where people can type Tamil script directly, or type in English and have the characters converted to Arwi script
    - Constrained by the fonts available on the user's PC
  - Development of a new font
    - The font has to be installed, and it has to cater for different operating systems
    - Users have to learn to type using the new Arwi keymap

JavaScript based Arwi Typing Webpage

- Install the necessary fonts
  - Windows: install the files for complex script and right-to-left languages
  - Linux: BiDi (bidirectional) or multilingual support is generally present by default
- Features
  - Uses JavaScript (client side)
  - Works on Windows and Linux
  - Users who know Tamil typing can type Tamil directly

http://en.wikipedia.org/wiki/Help:Multilingual_support_(Indic)
First, to enable typing in Arabic or Arwi, the option "Install files for complex script and right-to-left languages (including Thai)" must be enabled on the user's PC. This is done using Regional and Language Settings in Control Panel.

JavaScript based Arwi Typing Webpage

(Screenshots: Virtual Keypad; Expected Webpage)

This software is written in JavaScript and thus does not require any server-side support: the whole code can be copied and run on any machine. It runs on both Windows and Linux machines. The software lets users mix Tamil and Arwi scripts, even though that is not the normal way of writing in Arwi script. When users type in Arwi, the text alignment becomes right-to-left.
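
One way to make the alignment follow the script being typed is to look at the first strongly directional character, similar in spirit to HTML's dir="auto". The sketch below is a simplified stand-in, assuming that every code point in the Arabic, Syriac and Arabic Supplement blocks counts as right-to-left (the Unicode Bidirectional Algorithm actually gives digits and combining marks weaker classes). The name detectDirection is ours, not the webpage's.

```javascript
// Simplification: every code point in these blocks is treated as RTL.
const RTL_RANGES = [
  [0x0600, 0x06ff], // Arabic
  [0x0700, 0x074f], // Syriac
  [0x0750, 0x077f], // Arabic Supplement
];
// Only the left-to-right scripts discussed in this paper are listed.
const LTR_RANGES = [
  [0x0041, 0x005a], // Latin capital letters
  [0x0061, 0x007a], // Latin small letters
  [0x0b80, 0x0bff], // Tamil
];

const inRanges = (cp, ranges) => ranges.some(([lo, hi]) => cp >= lo && cp <= hi);

// Return "rtl" or "ltr" based on the first character found in either list;
// spaces, digits and punctuation are skipped.
function detectDirection(text) {
  for (const ch of text) {
    const cp = ch.codePointAt(0);
    if (inRanges(cp, RTL_RANGES)) return "rtl";
    if (inRanges(cp, LTR_RANGES)) return "ltr";
  }
  return "ltr"; // default when no directional character is found
}

console.log(detectDirection("\u0B95"));   // ltr (Tamil KA)
console.log(detectDirection("  \u0628")); // rtl (Arabic BEH)
```

In a page, the result could be assigned to the text area's dir property, for example textarea.dir = detectDirection(textarea.value), each time the input changes.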

JavaScript based Arwi Typing Webpage
- The Tamil keymap varies depending on:
  - Whether non-Unicode fonts such as Bamini or Sarukesi are used
  - Whether Unicode fonts such as Latha are used
- The JavaScript based webpage supports both keymap options, selected with a radio button
- Users who do not know Tamil typing can use the virtual keypad provided
- Arwi can also be typed using the virtual keypad, so no Arwi keymap setup is needed on the PC

JavaScript based Arwi Typing Webpage
- The writing direction changes from left-to-right to right-to-left once the user switches to Arwi typing
- Issues faced:
  - Windows Vista: the webpage works as expected in Internet Explorer (version 7.0.6000.16386) and Firefox (version 2.0.0.13)
  - Windows XP:
    - Internet Explorer (6.0.2900.2180.xpsp_sp2_rtm.040803-2158)
    - Problems displaying U+0656 (Arabic Subscript Alef), U+0657 (Arabic Inverted Damma), U+065C (Arabic Vowel Sign Dot Below) and U+0328 (Combining Ogonek)

JavaScript based Arwi Typing Webpage
- Windows XP:
  - Internet Explorer upgraded to version 7.0.5730.13:
    - The same problems persist
  - Firefox 2.0.0.13:
    - U+0656, U+0657 and U+0328 did not appear properly on the virtual keypad
    - U+0746 (Syriac Three Dots Below) and U+0734 (Syriac Zqapha Below) did not join with the previous character and appeared separately
    - General diacritical marks belonging to the Arabic script (such as Fathah, Damma, etc.) needed a larger font size to appear properly on the virtual keypad

After finding that a few characters did not appear properly on Windows XP, we checked whether those Unicode characters were present in Windows XP and compared the result with Windows Vista. This was done using Charmap in Advanced view: we selected "Unicode Subrange" in the "Group by" option and then selected Combining Diacritical Marks and Arabic to verify the presence of the characters. We concluded that a few characters were not present in Windows XP and thus could not be displayed.

JavaScript based Arwi Typing Webpage
- Syriac characters were joined by prefixing them with U+0640 (Arabic Tatweel), U+200D (Zero Width Joiner) and U+070F (Syriac Abbreviation Mark):
document.write('<input type="button" style="font-size: 30px; font-weight: bold"' +
  ' name="\u0746" value=" \u0746 "' +
  ' onclick="AppendCharacter(\'\u0640\u200d\u070f\u0746\')">');
(Screenshots: Internet Explorer – Win XP; Firefox 2.0.0.16 – Win XP)

Internet Explorer (version 6.0.2900.2180.xpsp_sp2_rtm.040803-2158) does not properly display a few characters, such as U+0656 (Arabic Subscript Alef), U+0657 (Arabic Inverted Damma), U+065C (Arabic Vowel Sign Dot Below) and U+0328 (Combining Ogonek, part of Combining Diacritical Marks). Even after upgrading Internet Explorer to version 7.0.5730.13, the same problems persist.
We downloaded the Arial font from the Internet (Arial32.exe) and used it with Windows XP. After this, Unicode character U+0328 worked, but it still did not appear properly on the display.
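
The slides do not show the webpage's AppendCharacter() function, so the following is only a hypothetical sketch of what a virtual-keypad handler might do: insert a key's sequence at the caret, adding the Syriac joining prefix only for browsers that need it. The names keySequence and insertAtCaret are ours; the prefix is the U+0640, U+200D, U+070F sequence used above.

```javascript
// U+0640 ARABIC TATWEEL, U+200D ZERO WIDTH JOINER, U+070F SYRIAC ABBREVIATION MARK
const SYRIAC_JOIN_PREFIX = "\u0640\u200D\u070F";

// In these tests, older browsers (e.g. Firefox 2) needed the prefix for Syriac
// marks to attach to the previous letter; Firefox 3 did not.
function keySequence(mark, needsPrefix) {
  return needsPrefix ? SYRIAC_JOIN_PREFIX + mark : mark;
}

// Insert a sequence at the caret; return the new text and caret position.
// Positions are UTF-16 code units, the same units a textarea's
// selectionStart uses.
function insertAtCaret(text, caret, sequence) {
  return {
    text: text.slice(0, caret) + sequence + text.slice(caret),
    caret: caret + sequence.length,
  };
}

let state = { text: "\u0628", caret: 1 }; // BEH
state = insertAtCaret(state.text, state.caret, keySequence("\u0746", false));
console.log(state.text.length, state.caret); // 2 2
```

Keeping the prefix behind a flag makes it easy to drop once the browsers in use render the Syriac marks correctly, so the extra invisible characters do not end up in the saved text.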

JavaScript based Arwi Typing Webpage
- In Mozilla Firefox 3, the Arabic HTML appears better, except for the U+0657 character. Also, Syriac characters no longer need the \u0640\u200d\u070f prefix to join properly

(Screenshot: Firefox 3.0.1 – Win XP)

JavaScript based Arwi Typing Webpage
- SuSe 10.2 (Firefox 2.0.0.6):
  - Appearance issues
    - Characters did not appear properly on the virtual keypad, but worked fine while typing
  - Characters that did not work
    - U+0657 and U+0328 appeared as separate characters and did not join with the previous character
    - Unicode characters such as U+06E9, U+065C, U+0734 and U+0746 did not appear properly on the virtual keypad

JavaScript based Arwi Typing Webpage
- In SuSe 10.2 with Firefox 3.0.1, the webpage worked fine. There is no need for U+0640, U+200D and U+070F
- Further analysis:
  - Windows Vista's fonts cover more Unicode characters than those of Windows XP
    - This can be verified using Charmap
  - Syriac characters and a few other combining diacritical marks have problems appearing both on the virtual keypad and in the text

(Screenshot: Firefox 3.0.1 – OpenSuSe 10.2)

JavaScript based Arwi Typing Webpage
- In Safari version 3.1.2 (525.21) on Windows XP:
  - All the characters seem to work fine, both on the virtual keypad display and when used

Tamil Keyboard - Comparison
(Keyboard layouts: Tamil – Latha – Unicode; Tamil – Bamini or Sarukesi – non-Unicode)

Differences between the keypads for Unicode and non-Unicode Tamil fonts have been a concern for those who wish to type in Tamil. Those who learnt Tamil typing on a typewriter find it difficult to move on to Unicode-based Tamil fonts. Some software can convert text from one font to another; after converting, make sure that the text renders correctly.

Arabic Keyboard - Proposed Arwi
(Keyboard layouts: Arabic Keyboard; Arwi Keyboard)

We have designed the Arwi keypad to be similar to the Arabic one, to make it easier for those who already know Arabic typing. A user who does not know how to type in Tamil or Arwi can use the keypad provided in the software.

Font Rendering – Editor Tools
- Rendering of Arabic, Tamil and Arwi characters varies between Notepad, WordPad, Microsoft Word, OpenOffice.org tools, etc.
- The table shows characters typed with the webpage, then copied and pasted into different editors

Font Rendering – Editor Tools
- Editor tool issues:
  - The authors of [6] describe rendering problems for Unicode characters in various Indian languages (including Tamil). They also note that the Zero Width Non-Joiner (ZWNJ) is permitted by WordPad but not by Notepad.
  - The Zero Width Joiner (ZWJ) is rendered properly by WordPad but not by Notepad
    - U+200D (keeps characters together and makes them join)
  - WordPad 6.0 seems to render Arwi better than Notepad 6.0, Microsoft Word 2003 and OpenOffice.org Writer 2.1

6. http://acharya.iitm.ac.in/multi_sys/unicode/render/ren_07.php [Accessed on: 23rd April 2008]

Font Rendering – Issues

- Characters combined with format characters such as the zero width joiner and zero width non-joiner can cause serious headaches for text-processing applications if the displayed text was composed using these codes [7]
- Zero width glyphs are very important for Indian language fonts [7]

7. http://acharya.iitm.ac.in
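
The headache is easy to reproduce: a zero width joiner is invisible, yet it is a separate code point, so two strings that render identically can compare as different and differ in length. The minimal sketch below shows one workaround, assuming the joiners carry no meaning for the comparison at hand; that is not always true, since in Indic text they control how conjuncts are displayed. stripJoiners is an illustrative name.

```javascript
const ZWJ = "\u200D"; // U+200D ZERO WIDTH JOINER
// Matches U+200C ZERO WIDTH NON-JOINER and U+200D ZERO WIDTH JOINER.
const JOINERS = /[\u200C\u200D]/g;

// Remove both joiners, e.g. before searching or comparing text.
function stripJoiners(text) {
  return text.replace(JOINERS, "");
}

const typed = "\u0628" + ZWJ + "\u0627"; // BEH + ZWJ + ALEF
const plain = "\u0628\u0627";            // BEH + ALEF; renders the same

console.log(typed === plain);                             // false
console.log(typed.length, plain.length);                  // 3 2
console.log(stripJoiners(typed) === stripJoiners(plain)); // true
```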

Font Rendering – Issues

- Example showing how a Tamil (Unicode) website appears in different browsers [8]
(Screenshots: Internet Explorer 7; Mozilla Firefox 2.0.0.16; Mozilla Firefox 3.0.1)

8. http://zeyarath.blogspot.com

Font Development - fontforge

- Operating system used: SuSe 10.2
- FontForge (2008032 – 2 Mar 2008) software is used to develop the Arwi font
- The DejaVuSans font was used as the base font
- After the appropriate changes, the Arwi font was generated as a TTF font and tested on Linux and Windows with various editor tools

Users can make HTML pages using the Arwi font by including the tag <font face="arwi">

fontforge – Characters Added
- We have added the new characters needed for the Arwi font to the DejaVuSans Unicode font
- Each character needs its medial (medi), initial (init) and final (fina) forms

The DejaVuSans.ttf font in the /usr/share/fonts/truetype folder was selected as the base font. Installing the fontforge software on SuSe was straightforward with rpm (rpm -ivh fontforge-i386.rpm). When the DejaVuSans font is opened in fontforge, each character is shown as two cells: the top cell shows the character, and the bottom cell shows its drawing (glyph).
To add new glyphs to the font, we add slots, draw each glyph, and then link the glyph to its base character. This is done as follows: Encoding → Add Encoding Slots (indicate the number of glyphs you want to add).
Select each of the newly added slots and do the following:
Element → Glyph Info: for a glyph (Unicode Name: uni06A1.init; Unicode Value: -1); for a base character (Unicode Name: uni06A1; Unicode Value: U+06A1).
Then click 'Set From Name' (in Glyph Info) and click 'Ok'.
For each glyph created, click File → Generate Fonts → Save as TrueType. Note: user rights have to be considered when saving the fonts into the respective folders.
Make sure that the glyph is added to the substitutions of the main (base) Unicode font.
After doing the above, to see the effect in OpenOffice.org Writer, close OpenOffice.org Writer and reopen it. Also, make sure the keymap entry is removed and added again (if the keymap needs to change).
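
The glyph names follow a regular pattern (uniXXXX for the base character, uniXXXX.init and so on for its forms), so the full list of names to create can be generated. The sketch below takes its code points from the "Characters that needs to be added" list later in this paper. One adjustment is our own assumption, based on the joining types in the Unicode file ArabicShaping.txt: U+068A and U+068D (DAL variants) and U+0693 and U+0694 (REH variants) are right-joining, so they only need a final form, not initial or medial ones. The function names are illustrative.

```javascript
// Right-joining letters (DAL/REH families) only take isolated and final forms.
const RIGHT_JOINING = new Set([0x068a, 0x068d, 0x0693, 0x0694]);

// Base glyph name in the "uniXXXX" style used on this slide.
function baseGlyphName(cp) {
  return "uni" + cp.toString(16).toUpperCase().padStart(4, "0");
}

// Names of the contextual-form glyphs a character needs.
function formGlyphNames(cp) {
  const forms = RIGHT_JOINING.has(cp) ? ["fina"] : ["init", "medi", "fina"];
  return forms.map((form) => `${baseGlyphName(cp)}.${form}`);
}

const needed = [0x0686, 0x068a, 0x068d, 0x0693, 0x0694, 0x06a3,
                0x06b9, 0x06ba, 0x06a0, 0x06fb, 0x0767];

for (const cp of needed) {
  console.log(baseGlyphName(cp), "->", formGlyphNames(cp).join(", "));
}
// First line printed: uni0686 -> uni0686.init, uni0686.medi, uni0686.fina
```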

fontforge – Characters Added

(Figure: example of initial, middle and final forms)

The name of the font can be changed using Font Info under the Element menu in fontforge. To generate the font, use the Generate Fonts option under the File menu and select the TTF type. We noted that fontforge has to be closed before the font becomes available for typing in any editor.
For each character, we have substitutions for the initial, middle and final forms. Each substitution glyph must be linked with the original base character. This is done using Element → Glyph Info → Substitutions: select the appropriate substitution ('init', 'medi' or 'fina') and link it to the newly created glyph.
'medi': Medial forms in Arabic lookup 8 subtable
'fina': Terminal forms in Arabic lookup 9 subtable
'init': Initial forms in Arabic lookup 10 subtable
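
Which of these forms is displayed is decided by the rendering engine from each letter's joining type, as listed in the Unicode file ArabicShaping.txt. The sketch below shows a simplified version of that decision. It assumes only a handful of hard-coded letters and ignores ligatures such as lam-alef; the table and function names are ours. Diacritics such as fatha are transparent, so they should not break the join between neighbouring letters. Join-causing characters such as U+0640 (tatweel) and U+200D (ZWJ) put the following letter into a joined form, which may be why that prefix helped the Syriac marks on the typing webpage.

```javascript
// Simplified joining types (a small subset of ArabicShaping.txt):
// D = dual-joining, R = right-joining, C = join-causing,
// T = transparent (combining marks), U = non-joining (default).
const JOINING_TYPE = new Map([
  [0x0627, "R"], // ALEF
  [0x062f, "R"], // DAL
  [0x0628, "D"], // BEH
  [0x0644, "D"], // LAM
  [0x0645, "D"], // MEEM
  [0x06a1, "D"], // DOTLESS FEH
  [0x0640, "C"], // TATWEEL
  [0x200d, "C"], // ZERO WIDTH JOINER
  [0x064e, "T"], // FATHA
  [0x064f, "T"], // DAMMA
  [0x0650, "T"], // KASRA
]);

const joiningType = (cp) => JOINING_TYPE.get(cp) ?? "U";

// Return 'isol', 'init', 'medi' or 'fina' for each letter in logical order,
// or null for characters that take no contextual form (marks, joiners, others).
function contextualForms(text) {
  const types = Array.from(text, (ch) => joiningType(ch.codePointAt(0)));

  // Joining type of the nearest non-transparent neighbour (step -1 or +1).
  const neighbour = (i, step) => {
    for (let j = i + step; j >= 0 && j < types.length; j += step) {
      if (types[j] !== "T") return types[j];
    }
    return "U";
  };

  return types.map((t, i) => {
    if (t !== "D" && t !== "R") return null;
    const prev = neighbour(i, -1);
    const next = neighbour(i, +1);
    // Joins the preceding letter if that letter can join forward.
    const joinPrev = prev === "D" || prev === "C";
    // Only dual-joining letters can join the following letter.
    const joinNext = t === "D" && (next === "D" || next === "R" || next === "C");
    if (joinPrev && joinNext) return "medi";
    if (joinPrev) return "fina";
    if (joinNext) return "init";
    return "isol";
  });
}

console.log(JSON.stringify(contextualForms("\u0628\u0645\u0628"))); // ["init","medi","fina"]
console.log(JSON.stringify(contextualForms("\u0628\u064E\u0628"))); // ["init",null,"fina"]
```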

Keyboard Layout Setup - Windows

- To type in Arwi, a user needs to add the Arwi language to the language bar
- Software used: Keyboard Layout Manager
- No Arwi layout exists
- Arabic (Yemen) was used for testing

From a glyph, we can see which base character it is linked to using Element → Show Dependent → Substitutions.

Keyboard Layout Setup - Windows

- Select the appropriate Unicode character according to the required Arwi keyboard setup

Using the Keyboard Layout Manager software, we provide users with a keyboard layout file named ArabuTamil.klm2000. To install the ArabuTamil keypad on a Windows machine, users need the Keyboard Layout Manager software. Install and open the software. Then click New under Keyboards, type "ArabuTamil" as the layout name, and select any language that you are not using or not planning to use (we selected Arabic (Yemen)). Then click Create. Once the option is created, select the ArabuTamil option and click Edit. Click Import and select the supplied ArabuTamil.klm2000 file. Then click Open followed by OK twice. Finally, confirm the changes. The ArabuTamil option (shown as Arabic (Yemen)) will then be present in your language bar. You can now type ArabuTamil in any editor by selecting ArabuTamil in the language bar.

Keyboard Layout – Windows - Issues

- Diacritic alignment was a problem when characters from Syriac were used with Arabic and Arabic Supplement characters
- Syriac Qushshaya (U+0741, dot above) and Syriac Rukkakha (U+0742, dot below) can be used only with specific Syriac letters [9]
- Alignment differs between Opera, Mozilla Firefox and Internet Explorer

9. The Unicode Standard - Chapter 8

Keyboard Layout – Linux

- Check for language support
  - locale -a
- To make a locale the default, add one of these lines to your profile (~/.profile)
  - export LANG=en_US.UTF-8
  - export LANG=ar_SA.UTF-8
  - To put the change into effect, press Ctrl+Alt+Backspace (and log in again)

To verify whether Unicode is enabled on the machine, the locale command was used: locale -a lists the locales supported by the machine. To change the locale settings for your account, open the ~/.profile file in your home folder and add the line
export LANG=en_US.UTF-8
If you wish to make Arabic the default, you could set LANG to something like ar_SA.UTF-8, which stands for Arabic (Saudi Arabia) with UTF-8 (Unicode) encoding. Setting LANG to ar_SA.UTF-8 seems to make the Firefox browser switch to Arabic in SuSe 10.2. For the change in the profile to take effect, you need to log in again. This can be done by pressing Ctrl+Alt+Backspace.

Keyboard Layout – vim - Linux

- Vim keymaps are present in:
  - /usr/share/vim/current/keymap
  - Keymaps exist for Unicode and non-Unicode encodings
  - Arabic Unicode: arabic_utf-8.vim
- Creating the Arwi keymap
  - Arwi is related to Arabic, so a copy of the Arabic Unicode keymap was made
  - The new file is named arwi_utf-8.vim
  - This file maps keyboard characters to the hexadecimal and decimal representations of Unicode characters

The keymaps for vim are present in the /usr/share/vim/current/keymap folder. For each language, you can find keymaps for both Unicode and non-Unicode encodings. The Unicode keymap for Arabic is arabic_utf-8.vim.
To write our own keymap, we copied arabic_utf-8.vim to arwi_utf-8.vim. This file links the characters of the keyboard to the hexadecimal and decimal representations of Unicode characters. You can add a comment to indicate what each character means.

Keyboard Layout – vim - Linux
let b:keymap_name = "arwi"
loadkeymap
q <char-0x0636> " (1590) - DAD
w <char-0x0635> " (1589) - SAD
- The character 'q' (lowercase) is mapped to Arabic Dad, which is represented in hexadecimal and decimal as 0x0636 and 1590 respectively
- Once the keymap is ready, you can use it in vim: press Esc, then enter :set keymap=arwi
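
Writing these lines by hand is error-prone because each one repeats the code point in two bases. A small generator can produce both from a single table. This is a sketch that assumes the same line format as the example above; keymapLine and keymapFile are illustrative names. Only the two mappings shown on the slide are included, and the rest of the Arwi layout would be added to the table.

```javascript
// Key -> code point table; extend with the rest of the Arwi layout.
const mappings = [
  { key: "q", cp: 0x0636, name: "DAD" },
  { key: "w", cp: 0x0635, name: "SAD" },
];

// One keymap line: <key> <char-0xXXXX> " (decimal) - NAME
function keymapLine({ key, cp, name }) {
  const hex = cp.toString(16).padStart(4, "0");
  return `${key} <char-0x${hex}> " (${cp}) - ${name}`;
}

// Complete keymap file: header, loadkeymap, then one line per mapping.
function keymapFile(keymapName, entries) {
  const lines = [`let b:keymap_name = "${keymapName}"`, "loadkeymap"];
  return [...lines, ...entries.map(keymapLine)].join("\n") + "\n";
}

process.stdout.write(keymapFile("arwi", mappings));
// let b:keymap_name = "arwi"
// loadkeymap
// q <char-0x0636> " (1590) - DAD
// w <char-0x0635> " (1589) - SAD
```

Running it with Node and redirecting the output (for example node gen-keymap.js > arwi_utf-8.vim, where gen-keymap.js is a hypothetical file name) would produce the keymap file.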

Keyboard Layout – Input Locale - Linux

- Input locales:
  - Personal Settings (Configure Desktop) → Regional & Accessibility → Keyboard Layout
  - Select the language from Available Layout and click Add
- Creating a new Arwi keymap
  - Go to the xkb folder in /usr/share/X11, /etc/X11 or /usr/X11R6/lib/X11
  - Add an entry (anywhere) under the ! layout section of the base.lst file, which is present in the /usr/share/X11/xkb/rules folder

Note: We have used the /usr/share/X11/xkb folder.

If you want to enable the language bar in the task bar of the desktop, do the following:
1. Click Personal Settings (Configure Desktop).
2. Select Regional & Accessibility.
3. Select Keyboard Layout.
4. Select the language that you want on the language bar from Available Layout and click Add >>.
With these steps, you can click on the language bar and type in different languages. However, this works only for languages whose keymap is already available. For this work, we need to design a new keymap for the Arwi script.

Keyboard Layout – Input Locale - Linux

- Entries of the base.lst file are listed alphabetically in Keyboard Layout (Regional & Accessibility)
- Add the entry for the Arwi script in the base.xml file, which is present in the /usr/share/X11/xkb/rules folder: copy the Arabic entry and change the <name>, <shortDescription> and <description> tags

To design a new keymap, go to the /usr/share/X11/xkb, /etc/X11/xkb or /usr/X11R6/lib/X11/xkb folder; different Linux variants keep xkb in different folders. Then add an entry under the ! layout section of the base.lst file, which is present in the /usr/share/X11/xkb/rules folder. The next step is to add the entry for the Arwi script in the base.xml file in the same /usr/share/X11/xkb/rules folder.

Keyboard Layout – Input Locale - Linux
- The keymap needs to be added to the /usr/share/X11/xkb/symbols folder
- Copy the existing ara (Arabic) file to arwi (Arwi)
- AE stands for the 1234 row of the keyboard
- AD stands for the QWERT row of the keyboard
- AC stands for the ASDF row of the keyboard
- AB stands for the ZXCV row of the keyboard
  - AB01 stands for the 'z' key
  - AB02 stands for the 'x' key, and so on
- TLDE stands for the '~' key

Then we make the actual keymap available in the /usr/share/X11/xkb/symbols folder. After the reserved keyword "key", a name identifies each key by its keyboard row. AE stands for the row with the numbers 1, 2, 3, and so on: AE01 is the first key (the number 1 key), AE02 is the second key (the number 2 key), and so on. AD stands for the row starting QWERT; AD01 is the "q" key. AC stands for the row starting ASDF; AC01 is the "a" key. AB stands for the row starting ZXCV; AB01 is the "z" key. In the keymap, the lowercase (unshifted) and uppercase (shifted) symbols are separated by a comma.
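
The row-and-position naming above can be computed from a key on a US QWERTY keyboard. The sketch below assumes the US layout for the row strings and writes symbols as Unicode keysyms of the form U followed by four hex digits (for example U0636), which XKB accepts; xkbKeyName and keyLine are illustrative names. The example pairs 'q' with Dad, as in the vim keymap; pairing Shift+q with Fatha (U+064E) is purely hypothetical.

```javascript
// US QWERTY rows, left to right; position n in a row is key <ROWnn>.
const ROWS = {
  AE: "1234567890-=",
  AD: "qwertyuiop[]",
  AC: "asdfghjkl;'",
  AB: "zxcvbnm,./",
};

// Return the XKB key name for a character typed without Shift, or null.
function xkbKeyName(ch) {
  if (ch === "`" || ch === "~") return "TLDE";
  for (const [row, keys] of Object.entries(ROWS)) {
    const i = keys.indexOf(ch.toLowerCase());
    if (i >= 0) return row + String(i + 1).padStart(2, "0");
  }
  return null;
}

// One symbols-file line: unshifted and shifted symbols separated by a comma.
function keyLine(ch, lower, upper) {
  const name = xkbKeyName(ch);
  if (!name) throw new Error(`no XKB key name for ${JSON.stringify(ch)}`);
  const sym = (cp) => "U" + cp.toString(16).toUpperCase().padStart(4, "0");
  return `    key <${name}> { [ ${sym(lower)}, ${sym(upper)} ] };`;
}

console.log(keyLine("q", 0x0636, 0x064e));
//     key <AD01> { [ U0636, U064E ] };
```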

Keyboard Layout – Linux
- After changing the keymap file (in /usr/share/X11/xkb/symbols):
  - Remove the keymap from the layout
  - Add it again from Available Layout (Regional & Accessibility) and click Apply

Whenever we change the keymap file (in the /usr/share/X11/xkb/symbols folder), we need to remove the keymap from the layout, add it again from Available Layout, and click Apply. Only then do the changes take effect.

Keyboard Layout – Linux
- The keymap can be enabled temporarily with either of these commands (example: the arwi keymap; quoting the name, as in 'arwi', is optional):
  - setxkbmap -symbols arwi
  - setxkbmap -layout arwi

Issues Faced – Rendering Differences

- OpenType-capable text rendering engines:
  - OpenOffice.org and SIL's XeTeX use IBM's International Components for Unicode (ICU)
  - Qt4 has its own engine based on HarfBuzz
  - GTK+ uses Pango, which uses HarfBuzz internally
  - The new TeX engine LuaTeX has its own OpenType engine
  - Microsoft has its Uniscribe engine
  - Adobe has a hybrid Apple Advanced Typography (AAT)-OpenType engine used on Mac OS X
  - Conclusion: one engine is shared between GTK and Qt4; another is used in OpenOffice.org; Microsoft has its own

Issues Faced

- New diacritical characters were then added
- Problems were noted in joining the diacritical character to the previous character
- Testing was done using:
  - SuSe 10.2: did not display properly
    - OpenOffice.org 2.0.4 build 2.0.4.7
    - Opcion Font Viewer 1.1.1
  - Ubuntu 7.04: did not display properly
  - Ubuntu 8.04:
    - OpenOffice.org 2.4.0, GTK+ editor, gedit, kate, Opcion Font Viewer: worked fine
    - Qt editor: diacritics not placed properly

Issues Faced
- A simple test was done by copying an existing diacritical character to a different position and testing it with different tools
(Screenshots: PDF by XeTeX & ConTeXt; Specimen Font Previewer)

Issues Faced
- Testing on Ubuntu 8.04

Acknowledgements: The author wishes to acknowledge the help offered by Mr. Mohammed Khaled [www.eglug.org].

Characters that need to be added
- Make sure that the initial, final and middle glyphs of these characters are present:
  - U+0686, U+068A, U+068D, U+0693, U+0694, U+06A3, U+06B9, U+06BA, U+06A0, U+06FB, U+0767
- Diacritical marks that should be verified:
  - U+0653, U+0656, U+0657, U+0670, U+0746
  - U+0734 needs to be checked for whether it is just the mirror of U+0657
- Characters that need to be added:
  - U+0643 WITH A DOT BELOW
  - U+0635 WITH A DOT BELOW
  - Number 7: similar to the English character "L"

Further References
http://www.arabeyes.org/download/download/3rd/arabic.xkb
http://www.vim.org/htmldoc/arabic.html
http://countrystudies.us/sri-lanka/38.htm [Accessed on: 22nd April 2008]
http://www.armu.com/armu/works/archives/12dec1998/amc1.html [Accessed on: 22nd April 2008]
Tschacher, Arwi (Arabic-Tamil) – An Introduction [Accessed on: 23rd April 2008], http://web.archive.org/web/20040822180630/www.fas.nus.edu.sg/journal/kolam/vols/kolam5&6/1AOldLit/Arwi.htm
http://www.klm32.com/ [Accessed on: 22nd April 2008]
http://acharya.iitm.ac.in/multi_sys/unicode/render/ren_07.php [Accessed on: 23rd April 2008]

Conclusion

- This work has tried to address the lack of a font for the Arwi script
- There is a need to work closely with researchers in the Unicode area to make Arwi part of the Unicode character set

Arwi: Case study of Arabic, Syriac and Diacritical Unicode characters

Thank You
Questions are most welcome

Acknowledgements: The author wishes to acknowledge the help offered by Dr. Jaidi, Mr. Mohammed Khaled, Dr. Hussain Miya, Mr. Shah Nawas, Ms. Daphne, Ms. Rosyzie, Mr. Hong, Mr. Arif, Ms. Rosnah and Mr. Ashraf.
