Context Sensitive Shape Substitution PDF
Abstract- Urdu is a widely used language in South Asia and is spoken in more than 20 countries. Urdu is traditionally written in the Nastaliq script. Though this script is defined by well-formed rules, these rules have been passed down mainly through generations of calligraphers rather than through books, and they have not been quantitatively examined and published in enough detail. The extremely context-sensitive nature of Nastaliq is generally accepted by its writers without any need to actually test this hypothesis. This paper aims to address both gaps. It first performs a quantitative analysis of Nastaliq and then explains its contextual behavior. This behavior is captured in the form of a context-sensitive grammar. This computational model could serve as a first step towards electronic typography of Nastaliq.
I. INTRODUCTION
Urdu is spoken by more than 60 million speakers in over 20 countries [1]. The Urdu script is derived from the Arabic script. Arabic has many writing styles, including Naskh, Sulus, Riqah and Deevani. Urdu, however, is written in the Nastaliq script, which is a mixture of Naskh and the old, now obsolete Taleeq style. Nastaliq is far more complex than the other styles. First, letters are written using a flat nib (traditionally a bamboo pen), and both the trajectory of the pen and the angle of the nib define the glyph representing a letter. Each letter has precise writing rules, expressed relative to the length of the flat nib. Second, this cursive script is highly context sensitive: the shape of a letter depends on multiple neighboring characters. In addition, it has a complex mark placement and justification mechanism. This paper examines the context-sensitive behavior of this script and presents a context-sensitive grammar explaining it.
A. Urdu Script
The Urdu abjad is a derivative of the Persian alphabet, which is derived from the Arabic script, which is itself derived from the Aramaic script (Encarta 2000, Encyclopedia of Writing and [2]). Urdu has also retained its Perso-Arabic influence in the form of the writing style or typeface. Urdu is written in Nastaliq, a commonly used calligraphic style for Perso-Arabic scripts. Nastaliq is derived from two other styles of Arabic script, ‘Naskh’ and ‘Taleeq’. It was therefore named Naskh-Taleeq, which was gradually shortened to “Nastaliq”.
Fig. 1. Urdu Abjad
II. POSITIONAL AND CONTEXTUAL FORMS
Arabic is a cursive script in which successive letters join together. A letter can therefore have four forms depending on its position in a ligature: isolated, initial, medial and final. Consider Table 1, in which the letter ‘bay’, indicated in gray, has a different shape when it occurs in a) initial, b) medial, c) final and d) isolated position. Since the Urdu script is derived from the Arabic script and Nastaliq is used for writing Urdu, both Urdu and Nastaliq inherit this property.
TABLE 1
POSITIONAL FORMS FOR LETTER BAY
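As an illustration of the four positional forms, the following Python sketch labels each letter of a ligature by its position. It assumes that the letters which cannot join to a following letter (alif, dal, ray and vao) always close a ligature, so they only ever receive the final or isolated label. The function names and the romanized letter names are hypothetical and chosen for this sketch only; it is not the paper's implementation.

```python
from typing import Iterable, List, Tuple

# The four letters the text names as having only two forms. Romanized names
# are used as identifiers; this naming is an assumption of the sketch.
NON_JOINING = frozenset({"alif", "dal", "ray", "vao"})


def split_ligatures(word: Iterable[str], non_joining=NON_JOINING) -> List[List[str]]:
    """Cut a word into ligatures: a non-joining letter always ends one."""
    ligatures, current = [], []
    for letter in word:
        current.append(letter)
        if letter in non_joining:
            ligatures.append(current)
            current = []
    if current:
        ligatures.append(current)
    return ligatures


def positional_forms(ligature: List[str]) -> List[Tuple[str, str]]:
    """Label every letter of one ligature with its positional form."""
    last = len(ligature) - 1
    labelled = []
    for i, letter in enumerate(ligature):
        if last == 0:
            form = "isolated"
        elif i == 0:
            form = "initial"
        elif i == last:
            form = "final"
        else:
            form = "medial"
        labelled.append((letter, form))
    return labelled


# Hypothetical word: bay + bay + alif + bay
word = ["bay", "bay", "alif", "bay"]
for ligature in split_ligatures(word):
    print(positional_forms(ligature))
```

In this example the word breaks after alif, so the first ligature carries an initial bay, a medial bay and a final alif, while the trailing bay stands alone in isolated form.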
Letters ‘alif’, ‘dal’, ‘ray’ and ‘vao’ have only two forms. These letters cannot join with the following letter and therefore do not have initial or medial forms.
Nastaliq is far more complex than this four-shape phenomenon. In addition to the position of a character in a ligature, the character's shape also depends on the other characters of the ligature. Thus Nastaliq is inherently context sensitive. Table 2 shows a sample of this behavior, in which the letter bay, occurring in initial form in all cases, has three different shapes, indicated in gray. This context sensitivity of Nastaliq can be captured by a substitution grammar, which is discussed in detail later in this paper.
TABLE 2
CONTEXTUAL FORMS FOR LETTER BAY IN INITIAL FORM
Urdu, only half of them need to be looked into. Note that only the characters that are used in place of multiple similar shapes are shown. The rest of the characters in the abjad are used without any such similar-shape classification.
TABLE 4
GROUPING OF LETTERS WITH SIMILAR BASE FORM
Similar Base Forms | Letter
Also ن and ی in initial and medial form
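The idea of grouping letters by a shared base form (Table 4) can be sketched as a lookup from each letter to a representative letter. The rows of the table are not available here, so the three classes below are an assumption based on the usual dotless shapes of the Arabic script, not a reproduction of the table. The only entry taken from the text is that ن and ی fall into a class in their initial and medial forms; attaching them to the bay class is again an assumption. The names `BASE_CLASSES` and `base_form` are hypothetical.

```python
# Illustrative classes only (assumed, not copied from Table 4):
# representative letter -> letters sharing its base shape.
BASE_CLASSES = {
    "ب": ["ب", "پ", "ت", "ٹ", "ث"],
    "ج": ["ج", "چ", "ح", "خ"],
    "ر": ["ر", "ڑ", "ز", "ژ"],
}

# Invert the table so each member letter points at its representative.
BASE_OF = {
    member: representative
    for representative, members in BASE_CLASSES.items()
    for member in members
}


def base_form(letter: str, position: str) -> str:
    """Return the representative letter whose base shape `letter` shares."""
    # Per the text, these two letters join a class only in initial and
    # medial position; mapping them to the bay class is an assumption.
    if letter in ("ن", "ی") and position in ("initial", "medial"):
        return "ب"
    # Letters outside every class represent themselves.
    return BASE_OF.get(letter, letter)
```

With such a lookup, shape analysis only has to be carried out once per base form rather than once per letter, which is the saving the grouping aims at.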
بMedi2 بMedi17
ب → بMedi3 / __ ج
ب → بMedi18 / __ ے
ب → بMedi5 / __ رFinal2
ب → بMedi23 / __ بMedi3
ب → بInit6 / __ س
ب → بMedi25 / __ بMedi5
ب → بMedi8 / __ ص
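Reading each rule above as "the letter takes the named shape when it is immediately followed by the given context", the substitution can be sketched in Python as follows. Several contexts are themselves substituted shapes (for example بMedi3), which suggests that the following letter has to be resolved first. The sketch therefore walks the ligature from its last letter back to its first, but this processing order is an assumption, as is the decision to ignore positional forms. The names `RULES` and `substitute` and the tuple encoding are hypothetical; only the seven rules listed above are encoded.

```python
from typing import List, Optional, Tuple

# A glyph is (letter, shape label); the label is None until a rule assigns one.
Glyph = Tuple[str, Optional[str]]

# (letter, new shape, following letter, following shape or None for "any")
RULES = [
    ("ب", "Medi3", "ج", None),
    ("ب", "Medi18", "ے", None),
    ("ب", "Medi5", "ر", "Final2"),
    ("ب", "Medi23", "ب", "Medi3"),
    ("ب", "Init6", "س", None),
    ("ب", "Medi25", "ب", "Medi5"),
    ("ب", "Medi8", "ص", None),
]


def substitute(glyphs: List[Glyph]) -> List[Glyph]:
    """Apply the first matching rule to each letter, last letter first."""
    out = list(glyphs)
    # The final letter has no following context, so start one before it.
    for i in range(len(out) - 2, -1, -1):
        letter, _ = out[i]
        next_letter, next_shape = out[i + 1]
        for target, new_shape, ctx_letter, ctx_shape in RULES:
            if (
                letter == target
                and next_letter == ctx_letter
                and ctx_shape in (None, next_shape)
            ):
                out[i] = (letter, new_shape)
                break
    return out


# Hypothetical ligature: seen + bay + bay + jeem, no shapes assigned yet.
ligature = [("س", None), ("ب", None), ("ب", None), ("ج", None)]
print(substitute(ligature))
```

In this example the bay next to jeem is resolved first and receives Medi3; the bay before it then sees بMedi3 as its context and receives Medi23. A left-to-right pass would miss the second substitution, because the context shape would not yet be known.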