
SiyaBasScript - Mozilla Firefox and Google Chrome Extension For Converting Non-Unicode Sinhala Text To Unicode

The Sinhala language has been used in computer technology since the late 1980s, but no standard character representation system was put in place, which resulted in proprietary character representation systems and fonts. The Unicode standard, which has the explicit aim of transcending the limitations of traditional character encodings, was introduced to Sinhala in 1998. Still, some major Sinhala news sites have not adopted the standard and misuse styling hacks to display Sinhala text in various other font faces. This causes many compatibility issues when the pages are viewed in different browsers and operating systems. The SiyaBasScript extension solves the problem by converting that text to Sinhala Unicode. It will help to increase the amount of Sinhala Unicode content on the World Wide Web, making it far easier to represent, search, sort and process knowledge. For download instructions visit http://galpotha.wordpress.com

Uploaded by

keheliya
Copyright
© Attribution Non-Commercial (BY-NC)
SiyaBasScript - Mozilla Firefox and Google Chrome Extension for converting web sites with non-Unicode Sinhala fonts to Unicode

Keheliya Bandara Gallaba, Department of Computer Science and Engineering, University of Moratuwa, Sri Lanka

Manuscript received February 21, 2010. This extension for Google Chrome and Mozilla Firefox was created as per the guidelines of the CS3200 - Programming Project module of the BSc (Computer Science and Engineering) degree course of the University of Moratuwa.

G.M. Keheliya Bandara Gallaba is an undergraduate student at the University of Moratuwa, Sri Lanka, currently doing his internship at WSO2 Inc. (Phone: +94-715518881; fax: +94-112412236; e-mail: keheliya.gallaba@gmail.com).

Abstract - The Sinhala language has been used in computer technology since the late 1980s, but no standard character representation system was put in place, which resulted in proprietary character representation systems and fonts. The Unicode standard, which has the explicit aim of transcending the limitations of traditional character encodings, was introduced to Sinhala in 1998. Still, some major Sinhala news sites have not adopted the standard and misuse styling hacks to display Sinhala text in various other font faces. This causes many compatibility issues when the pages are viewed in different browsers and operating systems. The SiyaBasScript extension solves the problem by converting that text to Sinhala Unicode. It will help to increase the amount of Sinhala Unicode content on the World Wide Web, making it far easier to represent, search, sort and process knowledge.

Index Terms - Internet, Languages, Localization

I. INTRODUCTION

With the introduction of microcomputers in the early eighties, Sri Lanka too embarked on the use of computers with local language input and output. The University of Colombo developed a Sinhala screen output for television displays and went on to provide election result displays in the three languages Sinhala, Tamil and English within a few years. Software like 'DOS Word Perfect', 'Super 77', 'Wadan Tharuwa' and 'Sarasavi' was introduced later to enable Sinhala word processing and printing. However, no standard character representation system was put in place, which resulted in proprietary character representation systems and fonts.

Later, the requirement for a standard code was identified, and in 1985 the Computer and Information Technology Council of Sri Lanka (CINTEC), soon after its inception, took steps to establish a committee for the use of Sinhala and Tamil in Computer Technology. This committee quite correctly took steps to meet the immediate need to agree on an acceptable Sinhala alphabet and an alphabetical order. It joined with a committee appointed by the Natural Resources, Energy and Science Authority of Sri Lanka (NARESA) to form the Committee on Adaptation of National Languages in IT (CANLIT), which agreed on a unique Sinhala alphabet and alphabetical order.

The CINTEC Internet Committee agreed that one of the major impediments to the development and use of the Internet in Sri Lanka, especially in rural areas, is the lack of local language content. The Committee agreed that the availability of a high quality, free, and standards-conformant Sinhala font would enable content providers to create Sinhala language content. As a first measure, the Internet Committee decided that a Committee on Unicode Compatible Sinhala Fonts should be formed. This Committee would define the basic minimum requirements for Unicode compatible Sinhala fonts; define the essential features which should be present in a Sinhala character set, character combinations and their input; and address the requirements for a standard Sinhala keyboard, keyboard stroke sequences, and issues relating to the glyphs and keyboard drivers. In 1998 the SLS1134/Unicode standard for Sinhala was released by CINTEC for the first time.

With the establishment of ICTA in 2003, the responsibilities of the Fonts Committee were assigned to ICTA, which set up a Language Requirements Committee to take the Sinhala Unicode initiative forward.

[Figure: The Unicode range for Sinhala is U+0D80–U+0DFF. Grey areas indicate non-assigned code points.]

Currently Sinhala Unicode does not come built in with Windows XP, unlike Tamil and Hindi. However, all versions of Windows Vista come with Sinhala Unicode support by default, and do not require external fonts to be installed to read Sinhalese script.

For OS X, Sinhala font and keyboard support can be found at http://web.nickshanks.com/typography/ and at http://www.xenotypetech.com/osxSinhala.html

For Linux, the scim input method selector allows Sinhala script to be used in applications like terminals or web browsers. If you are using Fedora 7 or later then it already has the required input methods, which can be installed using

Applications->Add/Remove Software menu item. For other GNU/Linux distributions such as Debian or Ubuntu, follow the instructions at the Sinhala GNU/Linux site for complete Sinhala Unicode support.

II. THE NEED FOR SIYABASSCRIPT

A. Initial Survey

Although Unicode is considered the standard for creating and viewing Sinhala language content, some web sites, including some well-known news sites, still create content in non-Unicode encodings and misuse methods intended for styling web pages to force the browser to render Sinhala text.

For example:

The online edition of Lankadeepa (http://www.lankadeepa.lk/) uses the following snippet of code for displaying Sinhala text using a non-Unicode font.

    @font-face {
        font-family: Wijeya;
        font-style: normal;
        font-weight: 700;
        src: url(../../../02\WIJEYA1.prf);
    }

Mirisa.org (http://mirisa.org/) uses the following snippet of code for displaying Sinhala text using a non-Unicode font.

    @font-face {
        font-family: IsiMalithi;
        font-style: normal;
        font-weight: normal;
        src: url(ISIWMAL0.eot);
    }

The Lanka-E-News site (http://lankaenews.com/) uses the following snippet of code for displaying Sinhala text using a non-Unicode font.

    <style type="text/css">
    @font-face {
        font-family: sandaru-n;
        font-style: normal;
        font-weight: normal;
        src: url(SANDARU0.eot);
    }
    </style>

Further research found that more news sites, including the online edition of Lakbima (http://www.lakbima.lk/), Rivira (http://www.rivira.lk/), LankaScreen (http://www.lankascreen.com/), GossipLanka (http://gossiplanka.blogspot.com/) and HotHotLanka (http://hothotlanka.blogspot.com/), use similar styling techniques to display Sinhala text without using standard Unicode encoding.

B. Drawbacks of using styling techniques to display localized content

B.1. Font Proliferation

The purpose of <font face>, as stated in the W3C Recommendation, is to control the style of a web page by naming a set of one or more fonts, in one or more sizes, designed with stylistic unity, each comprising a coordinated set of glyphs. The Recommendation does not address the problems created when <font face> is misused in multilingual documents.

The point is that if <font face> is used to specify a font for a different script, it is in fact lying to the browser about the identity of the characters that are supposedly identified by the underlying codes in the computer.

There are a number of problems with this approach. The most evident is that bad things happen if the user looking at the page does not have exactly the font that has been specified: he will see the text in his browser's default font, which will not be Sinhala and will not have glyphs to display Sinhala characters, whereas he may have a perfectly good standard Sinhala Unicode font on his system, which could have been used if the developer had coded the text properly.

The characters (actually glyphs) in a font are numbered, the set of glyph-number associations forming what is known as the coding of the font. But there are a large number of these codings, even for a given language or script. If simplistic font mapping is used to encode text (which is what <font face> does), you are at the mercy of the particular coding of the font you chose.

Since users have to install all the fonts specified by different sites, this proliferation creates useless redundancy without addressing the styling it was intended for. The Webspace also becomes fragmented, with mutually incomprehensible parts.

B.2. Incompatibility Issues

Although a user can get around the problem of missing font files by installing them on the computer or by relying on embedded online font files, this leads to many incompatibility issues across browsers and operating systems.

The ability to embed fonts on web pages was originally implemented by Microsoft in Internet Explorer 4.0. The catch was that these font files needed to be in a custom form of the OpenType format, with an EOT file extension. The other catch is that embedding EOT files only works in Internet Explorer.

Embedded OpenType is a proprietary standard supported exclusively by Internet Explorer. It was submitted to the W3C in 2007 as part of CSS3, was rejected, and was resubmitted as a standalone submission on March 18, 2008. The W3C team comment on the submission states that the "W3C plans to submit a proposal to the W3C members for a working group whose goal is to try and develop EOT into a W3C Recommendation."

More recently, CSS 3 added a specification for embedding fonts on web pages in a more open, standardized way. Browsers that support the full CSS 3 specification can render web pages which embed a TrueType font file.
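As a rough illustration of the survey and of the EOT restriction noted above, a script could scan a page's CSS text for @font-face rules and flag those whose font file is an EOT file. This is only a sketch: the function name `findFontFaces` and the shape of the returned objects are assumptions made for this example, and it inspects only the first `url(...)` of each rule.

```javascript
// Hypothetical helper (not part of SiyaBasScript): list the @font-face
// rules found in a CSS string and note which ones point at an EOT file.
function findFontFaces(cssText) {
  const rules = [];
  const blockPattern = /@font-face\s*\{([^}]*)\}/gi;
  let block;
  while ((block = blockPattern.exec(cssText)) !== null) {
    const body = block[1];
    const familyMatch = /font-family\s*:\s*([^;]+)/i.exec(body);
    // Only the first url(...) in the rule is examined in this sketch.
    const srcMatch = /url\(\s*['"]?([^'")]+)['"]?\s*\)/i.exec(body);
    const url = srcMatch ? srcMatch[1].trim() : null;
    const extension =
      url && url.includes('.') ? url.split('.').pop().toLowerCase() : null;
    rules.push({
      family: familyMatch
        ? familyMatch[1].trim().replace(/^['"]|['"]$/g, '')
        : null,
      url,
      extension,
      isEot: extension === 'eot',
    });
  }
  return rules;
}

// Usage with the Lanka-E-News rule quoted in the survey.
const sampleCss = `@font-face {
  font-family: sandaru-n;
  font-style: normal;
  font-weight: normal;
  src: url(SANDARU0.eot);
}`;
console.log(findFontFaces(sampleCss));
```

A rule flagged with `isEot` can only be rendered by Internet Explorer, which is the situation the surveyed sites leave their readers in.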

New browsers such as Firefox 3.5 therefore support TrueType fonts embedded on pages, whereas Internet Explorer supports only Embedded OpenType (EOT) fonts. Firefox 3.5 won't render EOT, and Internet Explorer won't render TrueType. To get around this problem, multiple types of fonts must be embedded on a page at the same time.

    Browser             Lowest version   Support of
    Internet Explorer   4.0              Embedded OpenType fonts only
    Firefox (Gecko)     3.5 (1.9.1)      TrueType and OpenType fonts only
                        3.6 (1.9.2)
    Opera               10.0             TrueType and OpenType fonts only
    Safari (WebKit)     3.1 (525)        TrueType and OpenType fonts only

Table: Browser compatibility of different font types with @font-face syntax.

But rather than embedding multiple types of fonts, web developers tend to embed only the EOT file and recommend that the reader use Microsoft Internet Explorer to view Sinhala web pages correctly.

These constraints create many usability issues in real-world scenarios. Web applications should be written for the web, not for particular browsers. Developers should strive for device independence rather than targeting specific OS and browser versions.

B.3. Knowledge representation issues with searching, sorting and text processing

Since non-Unicode fonts do not carry any information about the language, search engines and other software cannot recognize text created using these fonts as Sinhala content. Such text is therefore not indexed meaningfully and is not considered in the relevant search queries or sorting operations.

In Sinhala script the glyph sometimes changes according to the position of the character within a word (initial, medial, final or isolated). There are also compulsory ligatures, where two or more characters turn into a single glyph, and cases where one character is displayed as two glyphs that straddle the glyph of another character.

Simple non-Unicode fonts used for mapping complex scripts are often rather limited in terms of glyphs and ligatures, and sometimes use ugly tricks like building characters from pieces to render barely passable text. By contrast, an implementation based on a properly coded Unicode character set can fully use a good font not subject to the constraints of font mapping, resulting in better quality rendering. Furthermore, the rendering is independent of the font used, which means that improvements in the latter can be leveraged against old documents without recoding them: they simply display better.

So it is essential to emphasize Unicode Sinhala content creation on the World Wide Web, and to take measures to convert previously added non-Unicode text content to Unicode so that it becomes available to be searched, sorted, indexed and properly represented.

III. SOLUTION BY SIYABASSCRIPT

The SiyaBasScript Mozilla Firefox and Google Chrome extension solves the above problem by recognizing the elements of the above-mentioned web sites that contain non-Unicode text and mapping them to the corresponding Unicode characters, so that the web sites can be viewed in any Unicode-enabled browser running on any operating system without the hassle of installing fonts or expensive proprietary software.

The fundamental idea behind the architecture of this extension came from the Greasemonkey scripting engine, which allows users to run scripts that are written in JavaScript and manipulate the contents of a web page using the Document Object Model interface. These scripts are site-specific and make on-the-fly changes to HTML web page content on the DOMContentLoaded event, which fires immediately after the page is loaded in the browser (a practice also known as augmented browsing). As Greasemonkey scripts are persistent, the changes made to the web pages are executed every time the page is opened, making them effectively permanent for the user running the script.

Greasemonkey scripts can also poll external HTTP resources via a non-domain-restricted XMLHTTP request. They contain optional metadata, which specifies the name of the script, a description, resources relevant to the script, a namespace URL used to differentiate identically named scripts, and URL patterns for which the script is or is not intended to be invoked.

However, Greasemonkey scripts are limited by security restrictions imposed by Mozilla's XPCNativeWrappers. For example, Greasemonkey scripts do not have access to many of Firefox's components, such as the download manager, IO processes or its main toolbars. Additionally, Greasemonkey scripts run per instance of a matching web page, so managing lists of items globally is difficult. Script writers have been using cookies to work around this, and Greasemonkey also offers APIs such as GM_getValue and GM_setValue for the purpose.

To create the SiyaBasScript Firefox add-on, initial development was done in JavaScript, and the Greasemonkey-multi-script-compiler was used to turn the script into a fully fledged extension adhering to the Firefox extension XPToolkit architecture and Extension Component Interactions.
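The Greasemonkey-style approach described in Section III can be sketched as a small user script. This is a minimal illustration under stated assumptions, not SiyaBasScript's actual code: the mapping table holds a few placeholder entries and is not the real coding of Wijeya, IsiMalithi or sandaru-n, and the namespace URL, the `.legacy-sinhala` selector and the function names are hypothetical. It shows the two steps a converter needs: mapping each font-specific code to a Unicode character, and moving a vowel sign that legacy fonts store in visual order (before the consonant) to its logical Unicode position (after the consonant).

```javascript
// ==UserScript==
// @name        Legacy Sinhala to Unicode (sketch)
// @namespace   http://example.org/hypothetical-namespace
// @description Converts font-mapped Sinhala text to Unicode
// @include     http://www.lankadeepa.lk/*
// ==/UserScript==

// Placeholder table for illustration only: NOT the real coding of any
// font named in this paper.
const LEGACY_TO_UNICODE = {
  w: '\u0D85', // SINHALA LETTER AYANNA
  l: '\u0D9A', // SINHALA LETTER ALPAPRAANA KAYANNA
  u: '\u0DB8', // SINHALA LETTER MAYANNA
  f: '\u0DD9', // SINHALA VOWEL SIGN KOMBUVA (drawn left of the consonant)
};

function convertLegacyText(text, table = LEGACY_TO_UNICODE) {
  // Step 1: replace each legacy code with its Unicode character;
  // characters without a table entry are kept unchanged.
  let mapped = '';
  for (const ch of text) {
    mapped += Object.prototype.hasOwnProperty.call(table, ch) ? table[ch] : ch;
  }
  // Step 2: legacy fonts store the kombuva before the consonant (visual
  // order); Unicode stores it after the consonant (logical order).
  return mapped.replace(/(\u0DD9)([\u0D9A-\u0DC6])/g, '$2$1');
}

// Rewrite every text node below the given element.
function convertElement(root) {
  const walker = root.ownerDocument.createTreeWalker(root, NodeFilter.SHOW_TEXT);
  let node;
  while ((node = walker.nextNode())) {
    node.nodeValue = convertLegacyText(node.nodeValue);
  }
}

// Greasemonkey runs the script once the DOM is ready, so the page can be
// processed directly. The selector is a hypothetical marker for elements
// styled with a legacy font.
if (typeof document !== 'undefined') {
  document.querySelectorAll('.legacy-sinhala').forEach(convertElement);
}
```

Keeping the conversion in a pure function separate from the DOM traversal lets the same table-driven logic be reused for each site, with only the table and the selector changing per site.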

[Figure: Detailed component architecture of Firefox]

As of February 2010, Google Chrome has started providing "native support" for Greasemonkey scripts. They are internally converted to extensions, and are managed as such. Chrome ignores @exclude metadata within the scripts, so the scripts are executed for all domains/pages. On the other hand, Chromium honors the @include directives and executes the scripts only for the domains/pages specified.

So although scripts that use GM_setValue or GM_getValue will break, and scripts that use the popular E4X standard will not run, SiyaBasScript was ported successfully to Google Chrome.

IV. CONCLUSION

SiyaBasScript not only allows hassle-free viewing of the above web pages but also allows earlier non-Unicode content to be copied and pasted into Unicode-enabled sites using browsers like Mozilla Firefox and Google Chrome. This allows quoting, referencing and sharing of that content via Unicode-enabled sites, which was nearly impossible while it was available only as non-Unicode content.

This approach removes the need to recommend, or restrict users to, certain browsers such as Internet Explorer for viewing, editing and sharing localized content. It serves the ultimate goal of the World Wide Web as it was intended to be from the beginning: giving humanity seamless and easy access to information.

ACKNOWLEDGMENT

The author, Keheliya Bandara Gallaba, specially thanks the Sinhala Unicode Group, Dr. Shehan Perera, Senior Lecturer at the Department of Computer Science and Engineering at the University of Moratuwa, Sri Lanka, and Mr Chulaka Gunasekara, Lecturer at the Department of Computer Science and Engineering at the University of Moratuwa, Sri Lanka.

REFERENCES
[1] J. B. Disanayaka, "Samakālīna Siṃhala lēkhana vyākaraṇaya", S. Goḍagē saha Sahōdarayō, 1995.
[2] J. B. Disanayaka, "Siṃhala bhāṣāvē nava muhuṇuvara", Saṃskr̥tika Kaṭayutu Depārtamēntuva, 1996.
[3] Wasantha Deshapriya, "Sri Lankan Country Report on Local Language Computing Policy", Re-engineering Government Programme, Information and Communication Technology Agency of Sri Lanka.
[4] Dasun Sameera Weerasingha, "Sinhala Uniketha, Pasubima ha thakshanika Pathikada".
[5] Sinhala Unicode Character Code Chart. (Available: http://www.unicode.org/charts/PDF/U0D80.pdf)
[6] Jukka K. Korpela, "Unicode Explained", O'Reilly, 1st edition, 2006. ISBN 0-596-10121-X.
[7] The Unicode Consortium, "The Unicode Standard", Version 5.0, Fifth Edition, Addison-Wesley Professional, 27 October 2006. ISBN 0-321-48091-0.
[8] Alis Technologies inc., "<FONT FACE> considered harmful", 1996. (Available: http://alis.isoc.org/web_ml/html/fontface.en.html)
[9] Warren Steel, "What's wrong with the FONT element?", 2003. (Available: http://www.mcsr.olemiss.edu/~mudws/font.html)
[10] Mark Pilgrim, "Dive Into Greasemonkey", 2005. (Available: http://diveintogreasemonkey.org)
[11] Cheah Chu Yeow, "Firefox Secrets", SitePoint Pty. Ltd., 2005.
[12] Kenneth C. Feldt, "Programming Firefox", O'Reilly Media, 1005 Gravenstein Highway North, Sebastopol, CA 95472, 2007.
[13] Mark Pilgrim, "Greasemonkey Hacks", O'Reilly Media, 1005 Gravenstein Highway North, Sebastopol, CA 95472, 2007.
[14] Mozilla Developer Center, "@font-face". (Available: https://developer.mozilla.org/en/CSS/@font-face)
[15] SiyaBasScript - Project hosting on Google Code. (Available: http://code.google.com/p/siyabasscript/downloads/list)
