SiyaBasScript - Mozilla Firefox and Google Chrome Extension For Converting Non-Unicode Sinhala Text To Unicode
Applications->Add/Remove Software menu item. For other GNU/Linux distributions such as Debian or Ubuntu, follow the instructions at the Sinhala GNU/Linux site for complete Sinhala Unicode support.

II. THE NEED FOR SIYABASSCRIPT

A. Initial Survey

Although Unicode has been accepted as the standard for creating and viewing Sinhala language content, some web sites, including some well-known news sites, still create content in non-Unicode encodings and misuse methods intended for styling web pages to force the browser to render Sinhala text.

For example:

The online edition of Lankadeepa (http://www.lankadeepa.lk/) uses the following snippet of code to display Sinhala text using a non-Unicode font.

@font-face {
  font-family: Wijeya;
  font-style: normal;
  font-weight: 700;
  src: url(../../../02\WIJEYA1.prf);
}

The Mirisa.org site (http://mirisa.org/) uses the following snippet of code to display Sinhala text using a non-Unicode font.

@font-face {
  font-family: IsiMalithi;
  font-style: normal;
  font-weight: normal;
  src: url(ISIWMAL0.eot);
}

The Lanka-E-News site (http://lankaenews.com/) uses the following snippet of code to display Sinhala text using a non-Unicode font.

<style type="text/css">
  @font-face {
    font-family: sandaru-n;
    font-style: normal;
    font-weight: normal;
    src: url(SANDARU0.eot);
  }
</style>

Further research found that several other news sites, including the online edition of Lakbima (http://www.lakbima.lk/), Rivira (http://www.rivira.lk/), LankaScreen (http://www.lankascreen.com/), GossipLanka (http://gossiplanka.blogspot.com/) and HotHotLanka (http://hothotlanka.blogspot.com/), use similar styling techniques to display Sinhala text without standard Unicode encoding.

B. Drawbacks of Using Styling Techniques to Display Localized Content

B.1. Font Proliferation

The purpose of <font face>, as described in the W3C Recommendation, is to control the style of a web page by specifying a set of one or more fonts, in one or more sizes, designed with stylistic unity, each comprising a coordinated set of glyphs. It does not address the problems it creates when misused in multilingual documents.

The point is that if <font face> is used to select a font that draws Sinhala glyphs at code positions assigned to another script (here, Latin), the page is in fact lying to the browser about the identity of the characters that the underlying codes are supposed to identify.

This approach has a number of problems. The most evident is that bad things happen if the reader does not have exactly the font that has been specified: the text appears in the browser's default font as a string of meaningless Latin characters, with no Sinhala glyphs at all, even though the reader may have a perfectly good standard Sinhala Unicode font installed that could have been used had the developer encoded the text properly.

The characters (more precisely, glyphs) in a font are numbered, and the set of glyph-number associations forms what is known as the coding of the font. There are many such codings, even for a single language or script. If text is encoded through simplistic font mapping (which is what <font face> does), you are at the mercy of the particular coding of the font you chose.

Since users would have to install every font specified by every site, this proliferation creates useless redundancy without serving the styling purpose the mechanism was intended for, and the Web becomes fragmented into mutually incomprehensible parts.

B.2. Incompatibility Issues

Although users can work around missing fonts by installing the font files on their computers, and sites can embed online font files, these approaches lead to many incompatibility issues across browsers and operating systems.

The ability to embed fonts in web pages was originally implemented by Microsoft in Internet Explorer 4.0. The catch was that the font files had to be in a custom form of the OpenType format, with an EOT file extension. The other catch is that embedding EOT files works only in Internet Explorer.

Embedded OpenType is a proprietary standard supported exclusively by Internet Explorer. It was submitted to the W3C in 2007 as part of CSS3, rejected, and resubmitted as a standalone submission on March 18, 2008. The W3C team comment on the submission states that the "W3C plans to submit a proposal to the W3C members for a working group whose goal is to try and develop EOT into a W3C Recommendation."

More recently, CSS3 added a specification for embedding fonts in web pages in a more open, standardized way. Browsers that support the full CSS3 specification can render web pages that embed a TrueType font file.
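The B.1 point that a legacy font "lies" about character identity can be checked mechanically. The sketch below, in JavaScript (the language SiyaBasScript is written in), tests whether a string's code points fall in the Sinhala block U+0D80 to U+0DFF of the Unicode code charts. The function names are illustrative, and the stored string ",xld" is a hypothetical stand-in for what a legacy-font page might contain, not the real coding of Wijeya, IsiMalithi or sandaru-n.

```javascript
// Sinhala block per the Unicode code chart: U+0D80 to U+0DFF.
const SINHALA_FIRST = 0x0d80;
const SINHALA_LAST = 0x0dff;

function isSinhalaCodePoint(cp) {
  return cp >= SINHALA_FIRST && cp <= SINHALA_LAST;
}

// Fraction of non-whitespace characters that are real Sinhala code points.
// Text stored for a legacy font scores 0, however Sinhala it looks on screen.
function sinhalaShare(text) {
  const chars = Array.from(text).filter((ch) => ch.trim() !== "");
  if (chars.length === 0) return 0;
  const sinhala = chars.filter((ch) => isSinhalaCodePoint(ch.codePointAt(0)));
  return sinhala.length / chars.length;
}

const unicodeWord = "\u0DBD\u0D82\u0D9A\u0DCF"; // ලංකා, encoded as Unicode
const legacyStored = ",xld"; // HYPOTHETICAL codes a legacy font might draw as ලංකා

console.log(sinhalaShare(unicodeWord)); // 1
console.log(sinhalaShare(legacyStored)); // 0
// A Unicode search query cannot match the legacy codes:
console.log(legacyStored.includes(unicodeWord)); // false
```

The same check also hints at how a converter might decide which text needs converting, although a real page can mix Latin and Sinhala legitimately, so a threshold on the share would be a heuristic rather than a rule.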
Newer browsers such as Firefox 3.5 support embedding TrueType fonts in pages, whereas Internet Explorer supports only Embedded OpenType (EOT) fonts. Firefox 3.5 won't render EOT files, and Internet Explorer won't render TrueType files. To get around this problem, multiple font formats must be embedded in a page at the same time.

Browser              Lowest version   Support of
Internet Explorer    4.0              Embedded OpenType fonts only
Firefox (Gecko)      3.5 (1.9.1)      TrueType and OpenType fonts only
                     3.6 (1.9.2)
Opera                10.0             TrueType and OpenType fonts only
Safari (WebKit)      3.1 (525)        TrueType and OpenType fonts only
Browser compatibility of different font types with @font-face syntax.

But rather than embedding multiple font formats, web developers tend to embed only the EOT file and recommend that readers use Microsoft Internet Explorer to view Sinhala web pages correctly.

These constraints create many usability issues in real-world scenarios. Above all, web applications should be written for the web, not for particular browsers: developers should strive for device independence rather than targeting specific operating systems and browser versions.

B.3. Knowledge Representation Issues with Searching, Sorting and Text Processing

Since non-Unicode fonts carry no information about the language, search engines and other software cannot recognize text created with these fonts as Sinhala content. Such text is therefore not indexed meaningfully, is not matched by Sinhala search queries, and cannot be sorted correctly.

In Sinhala script, the glyph sometimes changes according to the position of the character within a word (initial, medial, final or isolated). There are also compulsory ligatures, where two or more characters turn into a single glyph, and cases where one character is displayed as two glyphs that straddle the glyph of another character (for example, the vowel sign in කො is drawn as a kombuva before the consonant and an aela-pilla after it).

Simple non-Unicode fonts used for mapping complex scripts are often rather limited in glyphs and ligatures, and sometimes use ugly tricks such as building characters from pieces to render barely passable text. By contrast, an implementation based on a properly coded Unicode character set can make full use of a good font, free of the constraints of font mapping, resulting in better-quality rendering. Furthermore, the encoded text is independent of the font used, so improvements in fonts benefit old documents without recoding them: they simply display better.

It is therefore essential to promote Unicode Sinhala content creation on the World Wide Web and to take measures to convert previously published non-Unicode text content to Unicode, so that it can be searched, sorted, indexed and properly represented.

III. SOLUTION BY SIYABASSCRIPT

The SiyaBasScript Mozilla Firefox and Google Chrome extension solves the above problem by recognizing the elements of the above-mentioned web sites that contain non-Unicode text and mapping that text to the corresponding Unicode characters, so that the sites can be viewed in any Unicode-enabled browser on any operating system without the hassle of installing fonts or buying expensive proprietary software.

The fundamental idea behind the architecture of this extension came from the Greasemonkey scripting engine, which allows users to run scripts written in JavaScript that manipulate the contents of a web page through the Document Object Model (DOM) interface. These scripts are site-specific: Greasemonkey lets users install scripts that make on-the-fly changes to HTML page content on the DOMContentLoaded event, which fires as soon as the browser has loaded and parsed the page's document (a practice also known as augmented browsing). As Greasemonkey scripts are persistent, their changes are applied every time the page is opened, making them effectively permanent for the user running the script.

Greasemonkey scripts can also poll external HTTP resources via a non-domain-restricted XMLHttpRequest. They contain optional metadata, which specifies the name of the script, a description, resources relevant to the script, a namespace URL used to differentiate identically named scripts, and the URL patterns for which the script should or should not be invoked.

However, Greasemonkey scripts are limited by the security restrictions imposed by Mozilla's XPCNativeWrappers. For example, they do not have access to many of Firefox's components, such as the download manager, I/O processes or the main toolbars. Additionally, Greasemonkey scripts run once per instance of a matching web page, which makes managing lists of items globally difficult. Script writers have used cookies to work around this, and Greasemonkey itself offers APIs such as GM_getValue and GM_setValue for the same purpose.

For the SiyaBasScript Firefox add-on, initial development was done in JavaScript, and the Greasemonkey multi-script compiler was then used to turn it into a fully fledged add-on that adheres to the Firefox extension XPToolkit architecture and Extension Component Interactions.
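The recognize-and-map approach of Section III can be sketched as follows, assuming a per-font lookup table and the computed font-family as the signal that an element holds legacy text. Everything named here is hypothetical: the table entries, the kombuva code "f", and the functions convertLegacy and convertPage; the real SiyaBasScript tables are not reproduced in this paper. The sketch does show one genuine difficulty: legacy fonts store the kombuva (ෙ, U+0DD9) before the consonant it visually precedes, while Unicode stores it after, so conversion has to reorder rather than substitute character by character.

```javascript
// HYPOTHETICAL table: every legacy font (Wijeya, IsiMalithi, sandaru-n, ...)
// assigns its own ASCII codes, so a real converter needs one table per font.
const LEGACY_TO_UNICODE = {
  l: "\u0D9A", // ක ka
  u: "\u0DB8", // ම ma
  r: "\u0DBB", // ර ra
  d: "\u0DCF", // ා vowel sign aela-pilla
  a: "\u0DCA", // ් al-lakuna (virama)
};
// HYPOTHETICAL code for the kombuva ෙ (U+0DD9). Legacy text stores it before
// the consonant (visual order); Unicode stores it after (logical order).
const KOMBUVA_KEY = "f";
const KOMBUVA = "\u0DD9";

function convertLegacy(text, table = LEGACY_TO_UNICODE) {
  // Longest codes first, so multi-character codes win over their prefixes.
  const keys = Object.keys(table).sort((a, b) => b.length - a.length);
  let out = "";
  let pendingKombuva = false;
  let i = 0;
  while (i < text.length) {
    if (text[i] === KOMBUVA_KEY) {
      pendingKombuva = true; // hold it until the next mapped code is emitted
      i += 1;
      continue;
    }
    const key = keys.find((k) => text.startsWith(k, i));
    if (key === undefined) {
      if (pendingKombuva) out += KOMBUVA; // nothing to attach to: keep it anyway
      pendingKombuva = false;
      out += text[i]; // unknown code: copy through unchanged
      i += 1;
      continue;
    }
    out += table[key];
    // Simplified: assumes the code after a kombuva is a consonant.
    if (pendingKombuva) out += KOMBUVA;
    pendingKombuva = false;
    i += key.length;
  }
  return pendingKombuva ? out + KOMBUVA : out;
}

// Browser-only: find text whose element is styled with the legacy font,
// convert it, then let the browser fall back to an installed Unicode font.
function convertPage(doc, legacyFontName) {
  const view = doc.defaultView;
  const wanted = legacyFontName.toLowerCase();
  const walker = doc.createTreeWalker(doc.body, NodeFilter.SHOW_TEXT);
  const targets = []; // collect first: restyling changes later computed styles
  for (let node = walker.nextNode(); node !== null; node = walker.nextNode()) {
    const el = node.parentElement;
    if (el && view.getComputedStyle(el).fontFamily.toLowerCase().includes(wanted)) {
      targets.push(node);
    }
  }
  for (const node of targets) {
    node.nodeValue = convertLegacy(node.nodeValue);
    node.parentElement.style.fontFamily = "sans-serif";
  }
}

// Run once the DOM is ready ("Wijeya" is the family in the Lankadeepa CSS).
if (typeof document !== "undefined") {
  const run = () => convertPage(document, "Wijeya");
  if (document.readyState === "loading") {
    document.addEventListener("DOMContentLoaded", run);
  } else {
    run();
  }
}
```

Matching on the computed font-family rather than on site-specific selectors lets one script cover sites that declare the same font in different ways. The readyState check matters because a user script may itself be started at DOMContentLoaded, after which a new listener for that event would never fire.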
(Figure: Detailed component architecture of Firefox)

As of February 2010, Google Chrome provides "native support" for Greasemonkey scripts. They are internally converted to extensions and managed as such. Chrome ignores @exclude metadata within the scripts, so a script also runs on pages it was meant to exclude. It does, however, honor the @include directives and executes scripts only on the domains and pages they specify.
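Because Chrome ignores @exclude, a script that must skip certain pages can repeat its patterns in code and check location.href itself. The sketch below assumes the simplified Greasemonkey glob syntax in which "*" matches any run of characters; the @namespace URL, the excluded forum path and the names INCLUDES, EXCLUDES, globToRegExp and shouldRun are hypothetical.

```javascript
// ==UserScript==
// @name        SiyaBasScript (illustrative sketch)
// @namespace   http://example.org/siyabasscript-sketch
// @description Convert legacy-font Sinhala text to Unicode
// @include     http://www.lankadeepa.lk/*
// @include     http://mirisa.org/*
// @exclude     http://mirisa.org/forum/*
// ==/UserScript==

// Chrome ignores @exclude, so the same patterns are repeated in code.
const INCLUDES = ["http://www.lankadeepa.lk/*", "http://mirisa.org/*"];
const EXCLUDES = ["http://mirisa.org/forum/*"]; // HYPOTHETICAL path

// Simplified Greasemonkey glob: "*" matches any run of characters,
// everything else matches literally.
function globToRegExp(glob) {
  const escaped = glob.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp("^" + escaped.replace(/\*/g, ".*") + "$");
}

function shouldRun(url, includes, excludes) {
  const matchesAny = (patterns) => patterns.some((p) => globToRegExp(p).test(url));
  return matchesAny(includes) && !matchesAny(excludes);
}

if (typeof location !== "undefined" && shouldRun(location.href, INCLUDES, EXCLUDES)) {
  // The conversion entry point would be called here (not shown).
  console.log("SiyaBasScript sketch active on", location.href);
}
```

Keeping the metadata block as well means Firefox's Greasemonkey still filters pages before the script loads, while the in-code check covers hosts that skip @exclude.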
Although scripts that use GM_setValue or GM_getValue break in Chrome, and scripts that use the popular E4X standard do not run, SiyaBasScript was successfully ported to Google Chrome.
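One way to keep a script working where GM_getValue and GM_setValue are unavailable is to fall back to the Web Storage API. This is a sketch under that assumption, not SiyaBasScript's own code; makeStore, memoryStorage and the "siyabas:" key prefix are hypothetical names, and the in-memory backend exists only so the example runs outside a browser.

```javascript
const KEY_PREFIX = "siyabas:"; // HYPOTHETICAL namespace for stored keys

// Tiny Storage-like object so the sketch also runs outside a browser.
function memoryStorage() {
  const map = new Map();
  return {
    getItem: (key) => (map.has(key) ? map.get(key) : null),
    setItem: (key, value) => { map.set(key, String(value)); },
  };
}

// Prefer Greasemonkey's API; otherwise fall back to a Web Storage object
// (the page's localStorage unless another backend is passed in).
function makeStore(backend) {
  if (typeof GM_getValue === "function" && typeof GM_setValue === "function") {
    return {
      get: (key, fallback) => GM_getValue(key, fallback),
      set: (key, value) => GM_setValue(key, value),
    };
  }
  const storage = backend || globalThis.localStorage;
  return {
    get(key, fallback) {
      const raw = storage.getItem(KEY_PREFIX + key);
      // JSON keeps numbers and booleans typed, as GM_getValue would.
      return raw === null ? fallback : JSON.parse(raw);
    },
    set(key, value) {
      storage.setItem(KEY_PREFIX + key, JSON.stringify(value));
    },
  };
}

const store = makeStore(memoryStorage());
store.set("enabled", false);
console.log(store.get("enabled", true)); // false
console.log(store.get("missing", "default")); // default
```

Note that localStorage is scoped per origin, whereas GM_setValue values are shared by every page the script runs on, so this fallback only suits settings that may differ from site to site.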
IV. CONCLUSION
ACKNOWLEDGMENT
The author, Keheliya Bandara Gallaba, especially thanks
and the Sinhala Unicode Group, Dr. Shehan Perera, Senior
Lecturer at the Department of Computer Science and Engineering