A Computational Grammar of Sinhala

Conference Paper · March 2012

DOI: 10.1007/978-3-642-28604-9_16


Chamila Liyanage 1, Randil Pushpananda 1, Dulip Lakmal Herath 2, and Ruvan Weerasinghe 1
1,2 University of Colombo School of Computing, 35, Reid Avenue, Colombo 00700, Sri Lanka
{cml,rpn,arw}@ucsc.lk, [email protected]

Abstract. A computational grammar for a language is a very useful resource
for carrying out various language processing tasks for that language, such as
grammar checking, machine translation and question answering. As is the
case in most South Indian languages, Sinhala is a highly inflected language
with three gender forms and two number forms, among other grammatical
features. While piecemeal descriptions of Sinhala grammar are reported in the
literature, no comprehensive effort to develop a context-free grammar (CFG)
has so far accounted for any significant coverage of the language. This paper
describes the development of a feature-based CFG for non-trivial sentences in
Sinhala. The resulting grammar covers a significant subset of Sinhala as
described in a well-known grammar book. A parser producing the appropriate
parse tree(s) of input sentences was also developed using the NLTK toolkit.
The grammar also detects, and so rejects, ungrammatical sentences. Two hundred
sample sentences taken from primary grade Sinhala grammar books were used to
test the grammar, which handled about 60% of them correctly.

Keywords: Natural Language Processing, Context Free Grammar, Sinhala Grammar, Computational Grammar.

1 Introduction

Sinhala is the official language of Sri Lanka and is spoken by a majority of
Sri Lankans, nearly 70% of the population [7]. From a historical point of
view, Sinhala is a modern Indo-Aryan language, related to the Vedic
language or Old Sanskrit of India [10]. Modern Sinhala has subsequently been
enriched through its association with Tamil, English, Portuguese, and Dutch [8].
There are two main varieties of Sinhala based on usage, namely the literary and
the spoken, which differ from each other in important ways [5]. In addition,
Sinhala has an alphasyllabary writing system, also called an abugida: a segmental
writing system in which consonant-vowel sequences are written as single units [18].
Natural Language Processing (NLP) is an area of research that explores how
computers can be used to understand and manipulate natural languages [16].
Currently there are many research areas related to NLP in Sinhala, such as Speech
Processing, Machine Translation, Information Retrieval and Text Summarization,
among others. A computational grammar for Sinhala can benefit such efforts.
In this paper we therefore report work carried out in developing a feature-based
context-free grammar for Sinhala using the open source Natural Language Toolkit,
NLTK [3].

A. Gelbukh (Ed.): CICLing 2012, Part I, LNCS 7181, pp. 188–200, 2012.
© Springer-Verlag Berlin Heidelberg 2012

2 Related Work

Very little research has been reported in the literature on efforts to develop a formal
grammar for Sinhala. This section gives a brief survey of grammar development
reported for Indic languages, including Sinhala.
Hettige and Karunananda have implemented a computational model of grammar
for Sinhala [9]. Morphological and syntactic analysis of Sinhala has been considered
in this work, which is modeled using a Finite State Transducer (FST) and a Context-
Free grammar. Developed as part of a Machine Translation system, the parser in this
system handles only simple sentences containing 8 constituents, namely, Attributive
adjunct of Subject, Subject, Attributive adjunct of Object, Object, Attributive adjunct
of Predicate, Attributive adjunct of the complement of predicate, Complement of
predicate and Predicate.
In research carried out for the Kannada language, Sagar et al. modeled noun phrase
and verb phrase agreement in Kannada sentences [17]. They classified noun phrases
into three subcategories (adjective noun, noun and pronoun), but considered only
gender and number as features of the grammar. As in Sinhala, Kannada verbs need to
agree with the subject of the sentence in number and gender. The suffix of the
verb is therefore extracted to check for masculine, feminine and plural verb endings.
They used a context-free grammar (CFG) to write the grammar rules and Python as
the programming language. A Recursive Descent Parser, a simple top-down parser
from NLTK, was used to test the grammar. The work is limited to resolving
noun-verb agreement and indicating whether a sentence is syntactically acceptable or not.
In another study, Sagar et al. describe the process of generating a context-free
grammar for simple Kannada sentences [16]. They checked the sentences with both a
top-down (Recursive Descent) parser and a bottom-up (Shift-Reduce) parser.
According to the authors, two kinds of conflict, Shift-Reduce and Reduce-Reduce,
occurred when the sentences were parsed using the bottom-up parser. The top-down
parser was therefore selected as the more suitable parser for the given sentences.
Mosaddeque and Haque propose a way of producing a context-free grammar for
Bangla [2]. The work reports that only sentences of seven to eight words in length
were used for testing. They took 10 ad hoc sentences from a newspaper article as
the basis for designing the grammar, tagged all the words in the sentences with
their respective part-of-speech (POS) tags, and used NLTK’s Shift-Reduce Parser
to test the grammar. Only one of these ten sentences was parsed successfully.

Naira Khan and Mumit Khan have implemented a Computational Grammar for
Bengali using the Head-Driven Phrase Structure Grammar (HPSG) formalism [14].
The Linguistic Knowledge Building (LKB) system was used to implement this
grammar, which allows the user to build a parser along with a generator. A set of
instructions for using the HPSG formalism to parse the grammar and to generate
grammatical sentences of Bengali is given in this paper.

3 Structure of Sinhala
Sinhala is a free word order language. Its unmarked word order is SOV; variant orders
are also possible with discourse – pragmatic effects. A sentence can have all the
possible orders of the main constituents with proper intonation [11]. Figure 1 shows
all the free word order forms of the English sentence “Father hit the younger brother
with a stick”.

i. තාත්තා | මල්ලීට | කෝටුවකින් | ගැසුවේ ය.

Sinhala is a head-final language, in which the complements and modifiers appear
before their heads [11].

(NP) ගමේ මිනිස්සු
/game: minissu/
Village-GENITIVE people
‘People of the village’

(ADJP) බොහොම ලස්සන
/bohomǝ lassǝnǝ/
Much beautiful
‘Very beautiful’

(VP) සෙමින් කියවයි
/semin kiyǝvayi/
Slowly read-non-past/3rd person singular
‘Reads slowly’

Traditionally, a sentence is divided into two parts: a Noun Phrase (NP) and a Verb
Phrase (VP). In Sinhala grammar, uktha (subject) and akyatha (predicate) are the two
parts of a sentence. Subject and predicate in Sinhala sentences agree in number,
gender and person [12].
Sentence structures of Sinhala have been studied by a number of scholars
[8] [1] [4]. According to Abhayasinghe [1], Sinhala has 25 types of simple
sentence structures. In the present work, however, we cover only the main
sentence structures and a few complex structures. These are described in the
following sections.
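
The subject-predicate agreement described above can be illustrated with a small sketch. It is not the paper's implementation: the `Agr` class, its field names and the value labels (`sg`, `pl`, `masc`, `3`) are hypothetical, and the sketch assumes that a feature left unspecified (`None`) is compatible with any value.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Agr:
    """Agreement features; None means the feature is left unspecified."""
    number: Optional[str] = None   # e.g. "sg" or "pl" (illustrative labels)
    gender: Optional[str] = None   # e.g. "masc", "fem", "neut"
    person: Optional[str] = None   # e.g. "1", "2", "3"


def agrees(subject: Agr, predicate: Agr) -> bool:
    """Return True if no specified feature of the subject clashes with the predicate."""
    for name in ("number", "gender", "person"):
        a, b = getattr(subject, name), getattr(predicate, name)
        if a is not None and b is not None and a != b:
            return False
    return True
```

For instance, `agrees(Agr("sg", "masc", "3"), Agr("sg", None, "3"))` holds, whereas a plural subject with a singular predicate fails the check.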

3.1 Noun Phrase

The Noun Phrase, denoted by NP, can be a common noun (N), pronoun (PrN) or a
proper noun (PropN). In addition to the head noun, the Sinhala noun phrase consists
of adjectival phrases and determiners. Sinhala NP has a very complex grammatical
structure. It can consist of various clause structures, such as adjectival clauses,
relative clauses, and subordinate clauses. Therefore building a computational
grammar, covering all the NP structures is complex. Figure 2 below shows the NP
structure we have covered in the grammar developed in this work.
An adjectival phrase (ADJP) is constructed with adjectives. According to Sinhala
grammar, an adjectival phrase comes before the Noun (N) and after the Determiner
(Det), if there is any determiner in the noun phrase. If the adjective is a qualitative
adjective, then it can be constructed with Degrees (Deg) to intensify its meaning.

Det   Deg   ADJ   N

Fig. 2. Structure of the NP in Sinhala

In the traditional grammar of Sinhala, nama visheshana (adjectives) denote some
quality or attribute of the noun. They can be divided into three classes, namely
qualitative, quantitative and demonstrative [8]. In our grammar, however, we do not
consider the features of the ADJP. Words which denote the degree of an
adjective are added only before qualitative adjectives. For example, the ADJP
‘ඉතා හොඳ’ /it̪a: hond̪ǝ/ (very good) is an adjectival phrase consisting of a degree
word ‘ඉතා’ /it̪a:/ (very) followed by a qualitative adjective ‘හොඳ’ /hond̪ǝ/ (good).
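
As a rough illustration of the NP layout in Fig. 2, the ordering constraints can be written as a pattern over part-of-speech tags. This is only a sketch under stated assumptions: the tag names (`Det`, `Deg`, `Adj`, `N`) mirror the figure, the function name `is_np` is hypothetical, and allowing several adjectives in a row is an assumption rather than something the figure shows.

```python
import re

# Optional determiner, then zero or more (optional degree + adjective) groups,
# then the head noun in final position.
NP_PATTERN = re.compile(r"^(Det )?((Deg )?Adj )*N$")


def is_np(tags):
    """Check whether a sequence of POS tags fits the Det-Deg-ADJ-N layout."""
    return bool(NP_PATTERN.match(" ".join(tags)))
```

Under this pattern `["Det", "Deg", "Adj", "N"]` and `["N"]` are accepted, while `["Adj", "Det", "N"]` and `["Deg", "N"]` are rejected, since a degree word must be followed by an adjective.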

3.2 Verb Phrase

According to generative grammar, a Verb Phrase (VP) is a phrase headed by a verb.
In addition to the verb, it consists of noun phrases and Adverbial Phrases (ADVP).
The verb in Sinhala can be categorized as single verbs, compound verbs and auxiliary
verbs. In this grammar we only consider single verbs. Generally ADVP occurs before
the verb in Sinhala sentences. However according to the features of the adverb, the
position where the adverb occurs is decided. Figure 3 below shows the VP structure in
Sinhala and what we have covered in the grammar. According to the structure, the
verb appears in the final position. ADVPs may appear both before the verb and after
the NP. If an adverb of manner occurs in the ADVP, the adverb can be combined with
degrees to intensify the meaning. For example ‘ඉතා වේගයෙන්’ /it̪a: ve:gǝyen/ (very
fast) is an ADVP which appears as an adverb of manner.

NP ADVP Deg ADV V

Fig. 3. structure of the VP in Sinhala

4 Grammatical Features of Sinhala

In Sinhala, the noun is inflected for number, gender, person, case and
definiteness. The verb is inflected for number, gender, person, tense
and volition. Subject and predicate agree in the features of number, gender and
person.

4.1 Grammatical Features of the NP

The words that are marked for grammatical features linga (gender), vachana
(number), niyatha-aniyatha (definiteness) and vibhakthi (case), are recognized as
nouns in Sinhala [12]. Therefore in developing the grammar, we consider the features
of number, gender, case and definiteness for common nouns; number, gender and case
for proper nouns; and number, gender, case and person for pronouns.
Sinhala being a highly inflected language, its common nouns are inflected for number,
definiteness and case. Sinhala nouns are also divided into animate and inanimate
classes on the basis of their inflection. Animate nouns inflect for number (singular
and plural), definiteness (definite and indefinite) and five cases (nominative,
accusative, dative, genitive, and instrumental). The definiteness distinction applies
only in the singular form of nouns [6].
As enumerated in Table 1, Sinhala nouns, which are inflected for five cases, have five
forms. However, the same form may serve several cases. For example,
form 5 can occur in the instrumental, ablative and auxiliary cases. We have therefore
defined five cases in this grammar specification. All inflections relating to animate
nouns are shown in the table.

Table 1. Examples for inflections of animate common nouns

Form | Case | Singular masculine, def. | Singular masculine, indef. | Singular feminine, def. | Singular feminine, indef. | Plural
1 | Nominative | /min̪isa:/ (the man) | /min̪isek/ (a man) | /kella/ (the girl) | /kellak/, /kellek/ (a girl) | /min̪issu/ (men)
2 | Accusative | /min̪isa:/ (the man) | /min̪iseku/ (a man) | /kella/ (the girl) | /kellakǝ/ (a girl) | /min̪isun/ (men)
3 | Dative | /min̪isa:ʈǝ/ (to the man) | /min̪isekuʈǝ/ (to a man) | /kellaʈǝ/ (to the girl) | /kellekuʈǝ/, /kellakǝʈǝ/ (to a girl) | /min̪isunʈǝ/ (to men)
4 | Genitive | /min̪isa:ge:/ (the man’s) | /min̪isekuge:/ (a man’s) | /kellage:/ (the girl’s) | /kellakǝge:/ (a girl’s) | /min̪isunge:/ (men’s)
5 | Instrumental | /min̪isa:gen̪/ (from the man) | /min̪isekugen̪/ (from a man) | /kellagen̪/ (from the girl) | /kellakǝgen̪/ (from a girl) | /min̪isun̪gen̪/ (from men)
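
The observation that one surface form can serve several cases can be made concrete with a small lookup. This is a sketch, not part of the paper's grammar: the `PARADIGM` dictionary and `analyses` function are hypothetical names, only the masculine singular column of Table 1 is encoded, and the transcriptions are simplified by dropping the dental diacritics.

```python
# Masculine singular forms of "man" from Table 1, keyed by (form number, definiteness).
# F1..F5 follow the table's form numbering; transcriptions are simplified.
PARADIGM = {
    ("F1", "def"): "minisa:",    ("F1", "indef"): "minisek",
    ("F2", "def"): "minisa:",    ("F2", "indef"): "miniseku",
    ("F3", "def"): "minisa:ʈǝ",  ("F3", "indef"): "minisekuʈǝ",
    ("F4", "def"): "minisa:ge:", ("F4", "indef"): "minisekuge:",
    ("F5", "def"): "minisa:gen", ("F5", "indef"): "minisekugen",
}


def analyses(form):
    """Return every (form number, definiteness) pair that a surface form can realize."""
    return sorted(key for key, value in PARADIGM.items() if value == form)
```

Here `analyses("minisa:")` returns both the F1 and the F2 definite reading, which is the kind of ambiguity a parser has to resolve from sentence context.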

Sinhala inanimate nouns are inflected similarly to the animate nouns for number
and definiteness. However they only have four cases – direct, dative, genitive and
instrumental [6]. Forms 3, 4, and 5 in Table 2 are similar to those in Table 1. However
form 1 accounts for the direct case. Table 2 shows all the inflections for an inanimate
noun as covered in the grammar developed.

Table 2. Examples for inflections of inanimate common nouns

Form | Case | Singular definite | Singular indefinite | Plural
1 | Direct | /gasǝ/ (the tree) | /gasak/ (a tree) | /gas/ (trees)
3 | Dative | /gasǝʈǝ/ (to the tree) | /gasǝkǝʈǝ/ (to a tree) | /gaswǝlǝʈǝ/ (to the trees)
4 | Genitive | /gase:/, /gasehi/ (on the tree) | /gasǝkǝ/ (on a tree) | /gaswǝlǝ/ (on the trees)
5 | Instrumental | /gasen̪/, /gasin̪/ (from the tree) | /gasǝkin̪/ (from a tree) | /gaswǝlin̪/ (from trees)

Determiners of Sinhala do not carry any grammatical features and can engage with
any noun without agreement of features. Therefore grammatical features for
determiners were not considered. e.g. ‘ඒ’ /e:/ is a determiner of Sinhala which
combines with any noun without considering grammatical features of number, gender
or case. The noun phrases ‘ඒ ළමයා’ /e: lamǝja:/ (that child) and ‘ඒ ළමයි’ /e: lamaji/
(those children) differ in the number feature, but have the identical determiner ‘ඒ’.

4.2 Grammatical Features of the VP

Number, gender, tense, person and volition are considered as the grammatical features
of the verb phrase (VP) in Sinhala. There are two tenses in Sinhala: past and non-past.
The non-past form can refer either to the present or to the future. The future tense is
expressed using time adverbials. The single form that denotes both present and
future is therefore termed non-past. For example, ‘යයි’ /jaji/ is a verb form in
Sinhala which means ‘goes’ and carries the grammatical features singular, 3rd person,
non-past. Thus the sentence ‘ඔහු පාසල් යයි’ /ohu pa:sal jaji/ (he goes to school) is in
the present tense. If we add the time adverbial ‘හෙට’ /heʈǝ/ (tomorrow) to denote the
future, the sentence is in the future tense: ‘ඔහු හෙට පාසල් යයි’ /ohu heʈǝ pa:sal jaji/
(he will go to school tomorrow). In these two sentences the verb ‘යයි’ /jaji/ is a pure
verb. In addition to the pure form, there is a second form used to denote the non-past
tense, called the krudantha. We can rewrite the above sentence with a krudantha form as
‘ඔහු හෙට පාසල් යන්නේය’ /ohu heʈǝ pa:sal jan̪n̪e:jǝ/. In this sentence ‘යන්නේය’
/jan̪n̪e:jǝ/ is similar to the form ‘යයි’. However, according to Kekulawala [13], the
krudantha non-past form, which uses the -න්නෙ- /-n̪n̪e-/ suffix, is used to denote the
future tense in Sinhala. In modern Sinhala writing, krudantha forms are used more
frequently than the pure forms in both past and non-past tenses. In this grammar, we
treat tense as past and non-past rather than as past, present and future.
Volition (VLT) is another feature of the verb, which in our grammar is either
true or false. For example, ‘යයි’ is a volitive form of the verb “go”, while
its equivalent involitive form is ‘යැවෙයි’ /jæveji/. The other features of number,
gender and person are the same as their equivalents in the NP. Figure 4 gives an
overview of the grammatical features of the Sinhala verb.

Verb
  Tense: Past, Non-past
  Number: Singular, Plural
  Gender: Masculine, Feminine, Neuter
  Person: 1st, 2nd, 3rd
  Volition: True, False

Fig. 4. Grammatical Features of the Verb Phrase
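
The verb feature inventory of Fig. 4 can be written down as a small data structure, which makes the size of the feature space explicit. This is a sketch only: the value labels `nPast`, `sg`, `pl`, `MA`, `FE`, `NE`, `F` and `T` follow the labels that appear in the grammar listing and parse trees, while `past` and `S` (second person), as well as the names `VERB_FEATURES`, `is_valid_bundle` and `ALL_BUNDLES`, are assumptions made for illustration.

```python
from itertools import product

# Feature names and their allowed values, following Fig. 4.
VERB_FEATURES = {
    "TENSE": ("past", "nPast"),   # "past" is an assumed label
    "NUM": ("sg", "pl"),
    "GEN": ("MA", "FE", "NE"),
    "PER": ("F", "S", "T"),       # "S" for 2nd person is an assumed label
    "VLT": (True, False),
}


def is_valid_bundle(bundle):
    """True if every feature in the bundle is known and takes an allowed value."""
    return all(
        name in VERB_FEATURES and value in VERB_FEATURES[name]
        for name, value in bundle.items()
    )


# Every fully specified combination of verb features.
ALL_BUNDLES = [
    dict(zip(VERB_FEATURES, values))
    for values in product(*VERB_FEATURES.values())
]
```

With two tenses, two numbers, three genders, three persons and two volition values there are 2 x 2 x 3 x 3 x 2 = 72 fully specified bundles, although a given verb form will typically leave some features unspecified.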

5 The Sinhala CFG

According to Abhayasinghe (1998) there are 25 types of simple sentence structures in
Sinhala [1]. In this research, we considered the following ten types of sentence
structures in developing a CFG for Sinhala.

After analyzing each of the above sentences, all their constituents were identified.
According to this constituent structure, separate CFG productions were generated for
each type of sentence. After identifying all the grammar rules needed to cover the
phenomena above, they were merged to form and optimize a generic CFG
for Sinhala. In addition, some further complexity was introduced into the grammatical
rules of the Sinhala CFG in order to increase its overall coverage. The
grammatical and lexical productions of the CFG developed are given below.

Grammar Productions

---------------------------- S expansion productions--------------------------------

S -> NP[NUM=?n, GEN=?G, PER=?P, DEF=?TF, CASE=F1] VP[NUM=?n, GEN=?G, PER=?P, CASE=?CS]
S -> NP[NUM=?n, GEN=?G, PER=?P, DEF=?TF, CASE=F3] VP[NUM=?n, GEN=?G, PER=?P, CASE=?CS]
S -> NP[NUM=?n, GEN=?G, PER=?P, DEF=?TF, CASE=F4] VP[NUM=?n, GEN=?G, PER=?P, CASE=?CS]
S -> NP[NUM=?n, GEN=?G, PER=?P, DEF=?TF, CASE=F5] VP[NUM=?n, GEN=?G, PER=?P, CASE=?CS]
----------------------------- NP expansion productions------------------------------
NP[NUM=?n, CASE=?CS, GEN=?G, DEF=?TF] -> N[NUM=?n, CASE=?CS, GEN=?G]
NP[NUM=?n, CASE=?CS, GEN=?G, PER=?P] -> PrN[NUM=?n, CASE=?CS, GEN=?G, PER=?P]
NP[NUM=?n, CASE=?CS] -> PropN[NUM=?n, CASE=?CS]
NP[NUM=?n, CASE=?CS, GEN=?G, DEF=?TF] -> Det N[NUM=?n, CASE=?CS, GEN=?G]
NP[NUM=?n, CASE=?CS, GEN=?G, DEF=?TF] -> ADJP N[NUM=?n, CASE=?CS, GEN=?G]
NP[NUM=?n, CASE=?CS, GEN=?G, DEF=?TF] -> Det ADJP N[NUM=?n, CASE=?CS, GEN=?G]
----------------------------- VP expansion productions ------------------------------
VP[TENSE=?t, NUM=?n, GEN=?G, PER=?P] -> IV[TENSE=?t, NUM=?n, GEN=?G, PER=?P]
VP[TENSE=?t, NUM=?n, GEN=?G, PER=?P] -> TV[TENSE=?t, NUM=?n, GEN=?G, PER=?P]
VP[TENSE=?t, NUM=?n, GEN=?G, PER=?P] -> NP TV[TENSE=?t, NUM=?n, GEN=?G, PER=?P]
VP[TENSE=?t, NUM=?n, GEN=?G, PER=?P] -> NP IV[TENSE=?t, NUM=?n, GEN=?G, PER=?P]
VP[TENSE=?t, NUM=?n, GEN=?G, PER=?P] -> NP NP TV[TENSE=?t, NUM=?n, GEN=?G, PER=?P]
VP[TENSE=?t, NUM=?n, GEN=?G, PER=?P] -> NP NP IV[TENSE=?t, NUM=?n, GEN=?G, PER=?P]
VP[TENSE=?t, NUM=?n, GEN=?G, PER=?P] -> NP NP ADVP TV[TENSE=?t, NUM=?n, GEN=?G, PER=?P]
VP[TENSE=?t, NUM=?n, GEN=?G, PER=?P] -> ADVP IV[TENSE=?t, NUM=?n, GEN=?G, PER=?P]
VP[TENSE=?t, NUM=?n, GEN=?G, PER=?P] -> ADVP TV[TENSE=?t, NUM=?n, GEN=?G, PER=?P]
VP[TENSE=?t, NUM=?n, GEN=?G, PER=?P] -> NP ADVP IV[TENSE=?t, NUM=?n, GEN=?G, PER=?P]
VP[TENSE=?t, NUM=?n, GEN=?G, PER=?P] -> NP ADVP TV[TENSE=?t, NUM=?n, GEN=?G, PER=?P]
------------------------------ADJP expansion productions-----------------------------
ADJP -> Adj
ADJP -> Adj ADJP
----------------------------- ADVP expansion productions-----------------------------
ADVP -> Adv
ADVP -> Adv ADVP
------------------------------Sample Lexical Productions-------------------------------
N[NUM=sg, GEN=MA, CASE=F1, DEF=TRue] -> 'බල්ලා' | 'මිනිසා' | 'ළමයා' | 'තාත්තා' | 'හිඟන්නා' | 'මල්ලී' | 'අයියා'
N[NUM=sg, GEN=MA, CASE=F2, DEF=TRue] -> 'බල්ලා' | 'මිනිසා' | 'ළමයා' | 'කොල්ලා' | 'තාත්තා' | 'සර්පයා'
N[NUM=sg, GEN=MA, CASE=F3, DEF=TRue] -> 'බල්ලාට' | 'මිනිසාට' | 'ළමයාට' | 'පුතාට' | 'සර්පයාට' | 'අයියාට'
N[NUM=sg, GEN=MA, CASE=F1, DEF=False] -> 'බල්ලෙක්' | 'මිනිසෙක්' | 'ළමයෙක්' | 'කොල්ලෙක්' | 'හිඟන්නෙක්' | 'පුතෙක්'
N[NUM=sg, GEN=NE, CASE=F1, DEF=False] -> 'තෑග්ගක්' | 'රුපියලක්' | 'වත්තක්' | 'සින්දුවක්' | 'පොතක්' | 'ගෙඩියක්'
N[NUM=sg, GEN=NE, CASE=F3, DEF=TRue] -> 'ගසට' | 'මලට' | 'අත්තට' | 'බතට' | 'ගෙදරට' | 'පොතට' | 'කෝටුවට' | 'බිමට'
N[NUM=sg, GEN=NE, CASE=F5, DEF=TRue] -> 'ගසෙන්' | 'මලෙන්' | 'ගෙදරින්' | 'පොතින්' | 'කෝටුවෙන්' | 'පොරවෙන්'
N[NUM=sg, GEN=NE, CASE=F1, DEF=TRue] -> 'ගස' | 'මල' | 'අත්ත' | 'අඹ' | 'ගෙදර' | 'පොත' | 'කෝටුව' | 'කළය' | 'සඳ'
N[NUM=pl, GEN=MA, CASE=F1] -> 'බල්ලෝ' | 'මිනිස්සු' | 'ළමයි'
N[NUM=pl, GEN=FE, CASE=F2] -> 'ගැහැණුන්' | 'කෙල්ලන්'
PrN[NUM=sg, CASE=F3, PER=F] -> 'මට'
PrN[NUM=sg, CASE=F5, PER=F] -> 'මගෙන්' | 'මාගෙන්'
PrN[NUM=pl, CASE=F1, PER=T] -> 'ඔවුහු' | 'ඒගොල්ලො'
Det -> 'ඒ' | 'මේ' | 'අර' | 'ඔය' | 'සමහර' | 'ඇතැම්'
Adj -> 'ලස්සන' | 'කැත' | 'මහත' | 'සුදු' | 'කලු' | 'ලොකු' | 'පොඩි' | 'පුංචි' | 'උස'
Adv -> 'පන්සල්' | 'ගෙදර' | 'පාසලට' | 'නගරයට' | 'වේගයෙන්' | 'ලස්සනට' | 'හොඳට' | 'ඉක්මනින්' | 'සෙමෙන්'

The following parse trees were produced using the Recursive Descent
parser from the NLTK toolkit [3].

----------------------------------Sentence 3--------------------------------
(S[]
(NP[CASE='F1', DEF=?TF, GEN='MA', NUM='pl']
(N[CASE='F1', GEN='MA', NUM='pl'] බල්ලෝ))
(VP[GEN=?G, NUM='pl', PER='T', TENSE='pres']
(IV[NUM='pl', PER='T', TENSE='nPast', +VLT] බුරති)))

-----------------------------------Sentence 5--------------------------------
(S[]
(NP[CASE='F1', DEF=?TF, GEN='MA', NUM='sg']
(N[CASE='F1', DEF='TRue', GEN='MA', NUM='sg'] තාත්තා))
(VP[GEN='MA', NUM='sg', PER='T', TENSE='pres']
(NP[CASE='F3', DEF=?TF, GEN='MA', NUM='sg']
(N[CASE='F3', DEF='TRue', GEN='MA', NUM='sg'] පුතාට))
(NP[CASE='F1', DEF=?TF, GEN='NE', NUM='sg']
(N[CASE='F1', -DEF, GEN='NE', NUM='sg'] තෑග්ගක්))
(TV[GEN='MA', NUM='sg', PER='T', TENSE='nPast', +VLT]
දෙයි)))

--------------------------------------Sentence 9--------------------------------
(S[]
(NP[CASE='F3', GEN=?G, NUM='sg', PER='F']
(PrN[CASE='F3', NUM='sg', PER='F'] මට))
(VP[GEN=?G, NUM='sg', PER=?P, TENSE='pres']
(NP[CASE='F1', DEF=?TF, GEN='NE', NUM='sg']
(N[CASE='F1', -DEF, GEN='NE', NUM='sg']
සින්දුවක්))
(TV[NUM='sg', TENSE='nPast', -VLT] ඇසෙයි)))
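
To show how productions of this kind behave in NLTK, here is a deliberately tiny fragment that enforces number agreement only. It is a sketch rather than the authors' grammar: it uses NLTK's `FeatureGrammar` and `FeatureChartParser` (a chart parser, not necessarily the parser used in the paper), it keeps just the `NUM` feature, and the singular verb form `බුරයි` is an assumed addition that does not appear in the listing above.

```python
from nltk.grammar import FeatureGrammar
from nltk.parse import FeatureChartParser

# Surface forms: the nouns and the plural verb follow the listing and Sentence 3;
# the singular verb form is an assumption made for this illustration.
DOG_SG, DOG_PL = "බල්ලා", "බල්ලෝ"
BARK_SG, BARK_PL = "බුරයි", "බුරති"

# The shared variable ?n forces subject and verb to carry the same NUM value.
GRAMMAR_TEXT = f"""
% start S
S -> NP[NUM=?n] VP[NUM=?n]
NP[NUM=?n] -> N[NUM=?n]
VP[NUM=?n] -> IV[NUM=?n]
N[NUM=sg] -> '{DOG_SG}'
N[NUM=pl] -> '{DOG_PL}'
IV[NUM=sg] -> '{BARK_SG}'
IV[NUM=pl] -> '{BARK_PL}'
"""

grammar = FeatureGrammar.fromstring(GRAMMAR_TEXT)
parser = FeatureChartParser(grammar)


def parses(tokens):
    """Return all parse trees for a tokenized sentence (an empty list if rejected)."""
    return list(parser.parse(tokens))
```

Calling `parses([DOG_PL, BARK_PL])` yields a single tree, while `parses([DOG_SG, BARK_PL])` yields none because the `NUM` values cannot be unified, which is how a feature-based CFG rejects ungrammatical input.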

6 Evaluation and Results

In order to test and evaluate the grammar, two hundred sample sentences taken from
primary grade Sinhala grammar books [19] [20] were used. In the test, 118
sentences were parsed correctly by the grammar and 82 were not parsed. Of these 82
sentences, two are structured incorrectly and were therefore rightly rejected
by the grammar. Several sentences were not parsed because of free word order.
For example, in this grammar the ADVP is placed before the verb and after the NP,
so sentences which have an ADVP at the beginning were not parsed.
If an inanimate noun occurs in the subject NP, it need not agree in number with the
predicate VP. For example, the sentence ‘මල පිපෙයි’ /malǝ pipeji/ (the flower
blooms) contains a singular NP and a singular VP, while ‘මල් පිපෙයි’ /mal pipeji/
(flowers bloom) contains a plural NP and a singular VP. In Sinhala, both of these
sentences are correct. However, sentences of the second type, in which number
agreement is not observed, are not covered by this grammar. Sentences
which have compound verbs, auxiliary verbs, present participles, past participles,
verbs in the imperative mood or negated verbs are also not parsed
by this grammar.

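The test results are shown below.

Total number of sentences 200
Correct sentences parsed 118
Correct sentences not parsed 80
Incorrect sentences not parsed 2

According to these results, the accuracy of this grammar is 60%. This figure is
consistent with counting both the 118 correct sentences parsed and the 2 incorrect
sentences rejected as correct outcomes: (118 + 2) / 200 = 60%.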
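
The reported figure can be checked with a few lines of arithmetic. The sketch below assumes one particular reading of "accuracy", namely that a correct sentence that is parsed and an incorrect sentence that is rejected both count as correct outcomes; the variable names are illustrative.

```python
# Counts taken from the results table above.
results = {
    "correct_parsed": 118,
    "correct_not_parsed": 80,
    "incorrect_not_parsed": 2,
}

total = sum(results.values())

# Share of grammatical test sentences that received a parse.
grammatical = results["correct_parsed"] + results["correct_not_parsed"]
coverage = results["correct_parsed"] / grammatical

# Share of all test sentences handled as expected (assumed definition of accuracy).
accuracy = (results["correct_parsed"] + results["incorrect_not_parsed"]) / total
```

Under this reading the accuracy is 120/200, i.e. 60%, while coverage over the 198 grammatical sentences alone is slightly lower at 118/198.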

7 Discussion
Free word order
The grammar developed covers the default Sinhala sentence structure in SOV
order. The first two sentences of Figure 1 are in SOV order, and only they can be
parsed successfully using the grammar developed. The remaining sentence structures
cannot be parsed using the existing grammar. In natural language processing,
dependency grammars are often used to handle the free word-order problem.
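
To get a feel for why free word order is awkward for a fixed-order CFG, the candidate orderings of a four-constituent sentence can simply be enumerated. The constituent labels below are hypothetical stand-ins for the parts of the Figure 1 sentence, and the sketch makes no claim about which of the orderings are natural in Sinhala.

```python
from itertools import permutations

# Hypothetical labels for the four constituents of the Figure 1 sentence.
CONSTITUENTS = ("SUBJ", "OBJ", "INSTR", "VERB")

# Every possible ordering of the constituents.
all_orders = list(permutations(CONSTITUENTS))

# Orderings that keep the verb in final position.
verb_final_orders = [order for order in all_orders if order[-1] == "VERB"]
```

There are 24 orderings in total and 6 that keep the verb last, so a CFG that lists each order as a separate production grows quickly, whereas a dependency representation records the same head-dependent relations regardless of order.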

Word segmentation
In written Sinhala there is no unique method for word segmentation. The linguistics
literature reports collections of rules for segmenting Sinhala words [15]. However,
most users of the language are not aware of these rules and do not follow them closely.
For example, the word-ending particle ‘ය’ is often used
inconsistently. The Sinhala language has two types of verbs, namely shudda kriya
‘pure verbs’ and krudantha kriya ‘participial verbs’. When a participial verb occurs in
sentence-final position, there are two ways to write it: one is to separate the
sentence-ending particle, as in ‘ගියේ ය’ “(he) went”, and the other is to attach it to
the participial verb, as in ‘ගියේය’. It is therefore desirable to have a word
segmentation algorithm that checks whether the text is in a normalized form before
the CFG parser is employed.
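
A normalization step of the kind suggested above might look like the following sketch. It handles only the sentence-final particle ‘ය’, and it assumes that the attached spelling is chosen as the normalized form; the paper does not state which spelling should be canonical, and the function name is hypothetical.

```python
FINAL_PARTICLE = "ය"  # sentence-ending particle discussed above


def attach_final_particle(tokens):
    """Merge a free-standing sentence-final particle into the preceding token.

    Returns a new token list; input without a separate final particle is
    returned unchanged.
    """
    if len(tokens) >= 2 and tokens[-1] == FINAL_PARTICLE:
        return tokens[:-2] + [tokens[-2] + FINAL_PARTICLE]
    return list(tokens)
```

With this step both spellings of the verb reach the parser as a single token, so the lexicon needs only one entry per verb form.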

Non verbal sentences

There are a number of sentence structures in Sinhala which do not contain a verb. These
types of sentences end with adjectives, oblique nominals, locative predicates and
adverbials, among others, and the current grammar does not cover such non-verbal
sentences of Sinhala.

8 Conclusion and Future Work

This paper describes the development of a CFG for a non-trivial subset of Sinhala
using the NLTK toolkit. Ten simple sentence structures were selected and used to
design the grammar. Two hundred simple sentences were used to test the grammar,
and 60% of the sentences were analyzed accurately by the parser. In the future, we
hope to use a morphological analyzer and a word segmentation algorithm to develop a
wider-coverage grammar for Sinhala.

Acknowledgment. We are grateful to all the members of the Language Technology
Research Laboratory of the University of Colombo School of Computing, Sri Lanka,
who helped in various ways to make this work bear fruit.

References
1. Abhayasinghe, A.A.: Sinhala bhashave sarala vakya vibagaya (1998)
2. Mosaddeque, A.B., Haque, N.: Context-Free Grammar for Bangla. BRAC
University, Dhaka
3. Bird, S., Klein, E., Loper, E.: Natural Language Processing with Python: Analyzing Text
with the Natural Language Toolkit. O’Reilly Media (2009)
4. Disanayaka, J.B.: Bashavaka rata samudaya. Lake house investment Co. Ltd., Colombo 2
(1969)
5. Fairbanks, G.H., Gair, J.W., Silva, M.W.S.D.: Colloquial Sinhalese. Cornell University,
New York (1968)
6. Gair, J.W., Karunatilaka, W.S.: Literary Sinhala inflected forms: A Synopsis with a
Translation Guide to Sinhala script. Cornell University, New York
7. Gair, J.W., Karunatilaka, W.S.: Literary Sinhala. Cornell University, New York (1974)
8. Gunasekara, A.M.: A Comprehensive Grammar of the Sinhalese Language. Godage
International Publishers (PVT) Ltd. (2008)
9. Hettige, B., Karunananda, A.S.: Computational Model of Grammar for English to Sinhala
Machine Translation. In: Proceedings of the International Conference on Advances in ICT
for Emerging Regions (2011)
10. Jayawardhane, T.: The surface case system in Sinhala. KALYANI, pp. 264–277.
University of Kelaniya (1996)
11. Kariyakarawana, S.M.: The Syntax of Focus and Wh-Questions in Sinhala. Karunaratne &
Sons Ltd. (1998)
12. Karunatilaka, W.S.: Sinhala bhasha vyakaranaya. M. D. Gunasena & Co. Ltd. (2009)
13. Kekulawala, S.L.: The future tense in Sinhalese – an ‘unorthodox’ point of view. Journal
of the Vidyalankara University of Ceylon (1972)

14. Khan, N., Khan, M.: Developing a Computational Grammar for Bengali Using the HPSG
Formalism. In: Proceedings of the 9th International Conference on Computer and
Information Technology, ICCIT 2006 (2006)
15. Rajapaksha, D.: Sinhala bhashave pada bedima saha virama lakshana bhavithaya (2008)
16. Sagar, B.M., Shobha, G., Kumar, R.: Context Free Grammar (CFG) Analysis for simple
Kannada sentences. In: Proceedings of the International Conference [ACCTA-2010] on
Special Issue of IJCCT, vol. 1(2, 3, 4) (2010)
17. Sagar, B.M., Shobha, G., Kumar, R.: Solving the Noun Phrase and Verb Phrase Agreement
in Kannada Sentences. International Journal of Computer Theory and Engineering 1(3)
(August 2009)
18. Wikipedia (English), http://en.wikipedia.org/wiki/Sinhala_language
19. Dasanayaka, A.E.S.: Kumara rachanaya; Grade 4. M. D. Gunasena & Co. Ltd. (1990)
20. Dasanayaka, A.E.S.: Kumara rachanaya; Grade 5, M. D. Gunasena & Co. Ltd. (2005)

View publication stats

