PP 2 Man
PLATFORM 2
The Official Manual
J. R. Bhaddacak
Version 2.0
Copyright © 2023 J. R. Bhaddacak
Release History
Version Built on Description
2.0 10 Nov 2022 First release, bundled with the program
Preface
learn, and I have already provided some help in several places. Still, users need to spend time playing with the program to gain familiarity.
I divide the manual into five parts. The first part, Essential Starter, is crucial and should be read deliberately. Its main purpose is to help users start the program successfully in various contexts, and to provide initial guidance and troubleshooting.
The second part is all about grammatical tools. It is enough
to just go through the part quickly and come back when neces-
sary. Several chapters are short. The longest one is Chapter
9 (Prosody), which needs an elaborate treatment.
The third part is about the Pāli collection. You will learn how to find a document and open it, how to use the viewer, and how to deal with the overall term list. This part is more substantial than the previous one, even though no single chapter is long, so it should be read carefully.
The fourth part is about advanced search tools. It has one big chapter about Lucene Finder, which needs a careful read because of the complexity of its functions. Another chapter, about Tokenizer, is short, not because the tool is simple, but because it relies on an understanding of the previous chapter. The tool itself is quite complicated and needs to be explored by the users themselves.
The fifth part covers everything else. You will learn various tools, like the program’s text editor, the batch script transformer, and, more importantly, the text Reader. The Reader, together with its companion tool, Sentence Manager, is an innovative tool that can help learners read Pāli texts more conveniently. Furthermore, this set of tools lets you add translations to the texts at sentence level.
The last chapter is a short treatment of regular expressions. Because this search function appears in various places in the program, some guidance is needed. Since the topic itself is big and beyond our main concern, what I can offer is just a survival introduction.
The target readers of this manual are those who want to make use of Pāli Platform to its full capacity, for both learning and research. Some basic knowledge of Pāli is helpful, particularly the terminology used in the field. For the fundamentals of the language, see the books mentioned. Users
3 I usually work 5–6 hours a day, with no weekends. I use only one day a week to connect to the Internet, mostly for updating information and searching for needed materials. Yet, I still work 2–3 hours that day. I chose to make this manual with LaTeX, despite its laborious writing process, because it looks authoritative and is easier and more pleasant to read.
4 jakratep at gmail dot com
Contents
Preface
Contents
List of Tables
List of Figures
I. Essential Starter
3. Dictionaries
4. Letters
5. Declension table
5.1. Pronouns
5.2. Nouns/Adjectives
5.3. Numbers
6. Verbs
7. Conjugation table
8. Roots
9. Prosody
9.1. A survival introduction to Pāli prosody
9.2. Two types of prosodic patterns
9.3. Verse types of mattāvutti
9.4. Verse types of vaññavutti
15. Tokenizer
V. Miscellaneous Tools
Colophon
List of Tables
List of Figures
Part I.
Essential Starter
Begin at the beginning
1
In this starting chapter, I will tell you a short story of the development of the program. At the end, we will learn how to make it run on your computer.
the time. Another reason was that the Swing UI is well documented and easy to learn and use.
1.2. Why I take Pāli seriously?
1.4. How to run the program
Now you are ready to run the program. The principle is simple: the Java Virtual Machine (JVM), version 11 or newer, picks up the executable and runs it. The JVM is part of the Java Development Kit (JDK), but you do not need the whole JDK; the part called the Java Runtime Environment (JRE) is enough to run the program. The process differs from platform to platform, as shown below. The
1.4.1. Windows
Even though my development environment is Linux, I have made the program easy to run on Windows (7 or newer). For Windows, I propose three use contexts:
(1) Using the bundled JRE This is the easiest way. The required JRE is already shipped with the program in the jre folder. The only action you need is double-clicking PaliPlatform2.exe to start the program. That is all. You can also create a desktop shortcut to this exe file.
(3) Using manual method This approach does not use the exe file but uses the bundled JRE, and you have to do it by hand. First, open a console terminal (command prompt)7 at the program’s root, then enter this:
» jre\bin\java -p .;lib -m paliplatform/paliplatform.PaliPlatform
If you have a full JRE installed, you can use java directly, as follows:
» java -p .;lib -m paliplatform/paliplatform.PaliPlatform
1.4.2. GNU/Linux
This is my beloved operating system (OS). Although Linux has fewer users than Windows and macOS, it is a very important OS nowadays. It liberates us. For Linux users, mostly power users, instructions on how to run a Java program may look trivial. So, I will leave out some details.
(2) Using a custom JRE If you do not want to install Java on your system, you can just download an archive version (zip or gz) and unpack it in the program’s directory. Rename the unpacked directory to jrefx. Then run this launcher script:
» ./runfx.sh
7 In Windows 10, you can open a terminal at any place in File Explorer by using the File menu. In Windows 7, I find it more difficult. You have to search for command prompt or cmd and open it, or hit WINDOWS-R and enter cmd, then make your way to the target directory with the cd command.
8 If you already have a higher version of Java installed, installing just openjfx is enough.
1.4.3. macOS
(2) Using a custom JRE If you do not want to mess with the system, you can download an archive package instead and unpack it in the program’s directory. Rename it to jrefx, then type this:
» ./runfx.sh
1.6. When things go wrong
Even though the program has been well tested, many bugs are surely still waiting. They can show up in unexpected use cases. If you find that the program behaves in an unusual way or even crashes, please report the bug.
To see error messages fired by the JVM, you have to run the program with a console, i.e., run the program manually and leave the terminal open. Error messages may look unintelligible, as shown in Figure 1.4, but they do give useful information for tracing the causes of bugs.
When this happens to you, please record the symptom and save the message, then send it to me.10 I will find the causes and fix the bugs.
(2) Azul Zulu JDK Azul Systems also offers a JRE with JavaFX (32- and 64-bit) for many platforms, but few have installers. I used the JRE from this vendor in the program’s bundle. When downloading, select JRE FX.
https://www.azul.com/downloads/?version=java-11-lts&package=jre-fx
(3) Eclipse Temurin JDK This provider does not offer JavaFX, only the JRE. This may suit a competent user who knows how to combine the two components. (It is not that difficult, particularly when you use Linux. See the manual method described above.)
https://adoptium.net/temurin/releases/?version=11
1.7. Download links
(4) Gluon’s JavaFX This is the official site of JavaFX. You rarely need it, unless you want its newest version (really unnecessary for us). When downloading, select the SDK binary package, not jmods.
https://gluonhq.com/products/javafx
Basic operations and settings
2
Now, I assume that the reader can run the program successfully and see the first screen. Unlike Pāli Platform 1, which integrates all working areas in one window, this version uses a multiple-window approach. So, you will see many windows doing various jobs. There are two kinds of window: singleton and multi-instance. The former has only one instance in the program’s lifetime, while the latter can have multiple instances. Let us start with the main window.
2.2. Common tool bar
When you change the theme using the tool bar, it changes only that instance. If you click this button in the main tabs, the whole main window is changed. This change is not persistent. If you want a permanent theme change, use the menu Options>Global theme instead. Other buttons are self-explanatory, but read more about fonts in the font section below. Another common button you can see here and there is the help (Question-Circle) button. When clicked, it shows the help related to that part.
2.4. Pāli input
move them away). Then, the program will use a system font
instead as a fallback, currently set to Sans.
STICKY-NOTE
Note: All user preferences, such as the global theme, bookmarks, and those in Settings, are saved in an external property file named PaliPlatform2.properties. This file may exist in the user’s home directory, in the program’s root, or somewhere else, depending on how the program is invoked. If you mess up the settings, you can return to the original state by just deleting the property file and restarting the program.
2.5. Minor concerns
(2) Context menus Context menus are menus that pop up when you right-click (or left-click in some places) on a certain object. These menus are context-dependent, so you have to test and learn them by yourself. In many cases, context menus provide more complete functions than drag-and-drop does.
(4) Tooltips When you are stuck, try the Question-Circle button first. If the help is not available, try reading tooltips by hovering the mouse over an unknown button. Tooltips can be seen in other places as well, such as in a list or table that has truncated data entries.
Part II.
Grammatical Tools
Dictionaries
3
We will start with grammatical tools, because they are not complex and rather easy to use and understand. All of these tools can be invoked from the Grammar menu (Figure 3.1). Only Dictionaries has its own button (BOOK) in the main tool bar.
Figure 3.2.: Dictionaries window
1. Simple search
2. Wildcard search
Sometimes you have only a vague idea of what the word looks like. You can use wildcard mode by checking Use */?. In this mode you can use ? to stand for any one character and * to stand for any sequence of characters (including none). Figure 3.4 shows the result of searching ‘*mutt?’. Remember that in wildcard mode the result does not come up immediately. You have to press the Search button or hit the Enter key.
As shown in the picture, * can match zero characters, while ? always represents exactly one character. So, we also see mutta, muttā, and mutti in the result list. In the result of CPED, if the selected term is declinable (either noun or adj.), we will see Show declension. When we click on this box-button, a declension window of that term will be opened.
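The matching rule described above can be illustrated with a short sketch. It is a hypothetical Python illustration, not the program’s own code: it assumes the */? pattern must match the whole headword, translates * into the regular expression .* and ? into ., and normalizes both sides to Unicode NFC so that a letter such as ā counts as one character. The headword list is made up for the example.

```python
import re
import unicodedata


def wildcard_to_regex(pattern):
    """Translate a */? wildcard pattern into a compiled regular expression."""
    parts = []
    for ch in unicodedata.normalize("NFC", pattern):
        if ch == "*":
            parts.append(".*")  # any run of characters, including none
        elif ch == "?":
            parts.append(".")  # exactly one character
        else:
            parts.append(re.escape(ch))  # everything else is taken literally
    return re.compile("".join(parts))


def wildcard_search(terms, pattern):
    """Return the terms that match the whole pattern, keeping their order."""
    regex = wildcard_to_regex(pattern)
    return [t for t in terms if regex.fullmatch(unicodedata.normalize("NFC", t))]


headwords = ["mutta", "muttā", "mutti", "mutt", "muttaka", "vimutti"]
print(wildcard_search(headwords, "*mutt?"))
# ['mutta', 'muttā', 'mutti', 'vimutti']
```

Here mutt is excluded because ? demands one more character, and muttaka because the pattern has to cover the whole word.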
Figure 3.4.: Dictionaries’ wildcard search
3. Meaning search
Yet sometimes you have no idea what the word should be. In this case, searching in the dictionaries’ meanings can help. Figure 3.5 shows the result of searching ‘emancipate.’ In this mode, using wildcards is not allowed, and do not forget to press Search or hit Enter to submit the query.
There is a technical limitation in highlighting the search result when you search in meanings. Also, you cannot search further within the displayed result. To do this, you have to open the result in a text editor and do the search there: hit the PENCIL-ALT button in the tool bar, and the result will be opened in the program’s editor, where you can use its search function, as shown in Figure 3.6.
Letters
4
Declension table
5
This is one of the most useful tools for new learners. Traditional students have to memorize many of these tables. While memorizing some key paradigms is still important in the learning process, this tool can enhance your ability to search and experiment in just a few clicks. The Declension window can be opened by the menu Grammar>Declension table. The results of the Declension table can be grouped into three: nouns and adjectives, pronouns, and numbers.
5.1. Pronouns
If you are new to this window, I suggest you select Pronouns first because the pronoun list is finite. You will see something like Figure 5.1. Declension tables show one gender at a time. If a pronoun declines in multiple genders, you can choose m (masculine), f (feminine), or nt (neuter). A short meaning of each pronoun can also be added when this option is selected.
For new students, the expansion of abbreviations used in the table, as shown in Table 5.1, may help.
An amazing feature of this tool is that you can check the declined terms against the term list of the collection. When the LIST button is hit, you will see the right pane opened (see Figure 5.2). This may take a little time on the first load. In the list, terms
5.2. Nouns/Adjectives
Nouns and adjectives in Pāli undergo the same inflectional rules, and sometimes a word can function as both. So, I group them together. Since terms in this group are numerous, you have to select what you want by entering a query. Only simple search is available here. The source of terms is the Concise Pāli-English Dictionary (CPED). An example of a noun’s declension is shown in Figure 5.3.
What about the Compute button then? It has two uses: first, when a term has both regular and irregular declensions; and second, when you just want to experiment. I will give an example of the first case; the second the users can try by themselves. There are many words that use irregular paradigms, and I put a lot of effort into incorporating them here seamlessly.1
The term I give here is sara. When it means ‘pond’ it declines irregularly, following the mano-gaṇa (m. only). This is what is shown when you enter ‘sara’ in the search box. But when it means ‘sound’ or ‘arrow’ it declines like normal nouns (m. and nt.). To reach the latter case, you have to hit Compute after you enter ‘sara’. See the comparison in Figure 5.4.
If you want to experiment with generic paradigms, simply input a word with a proper ending and hit Compute. The proper endings are a, i, ī, u, ū for masculine words, ā, i, ī, u, ū for feminine words, and a, i, u for neuter words. There are also two exceptions that use their own special paradigms: -ant and -ar.
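The ending-based choice of paradigm described above can be sketched as a small dispatch function. This is a hypothetical Python illustration, not the program’s implementation: the names paradigm_key and decline are made up, only the masculine -a paradigm (the familiar buddha pattern) is filled in, and the -ant and -ar cases are merely detected, not declined.

```python
import unicodedata

# Proper endings per gender, as listed above.
PROPER_ENDINGS = {
    "m": ("a", "i", "ī", "u", "ū"),
    "f": ("ā", "i", "ī", "u", "ū"),
    "nt": ("a", "i", "u"),
}

# Masculine -a stems: case -> (singular endings, plural endings).
M_A_ENDINGS = {
    "nom": (["o"], ["ā"]),
    "acc": (["aṃ"], ["e"]),
    "ins": (["ena"], ["ehi", "ebhi"]),
    "dat": (["assa", "āya"], ["ānaṃ"]),
    "abl": (["ā", "asmā", "amhā"], ["ehi", "ebhi"]),
    "gen": (["assa"], ["ānaṃ"]),
    "loc": (["e", "asmiṃ", "amhi"], ["esu"]),
    "voc": (["a", "ā"], ["ā"]),
}


def paradigm_key(word, gender):
    """Pick a paradigm from the word's ending; -ant and -ar are special cases."""
    word = unicodedata.normalize("NFC", word)
    if word.endswith("ant"):
        return "ant"
    if word.endswith("ar"):
        return "ar"
    for ending in PROPER_ENDINGS[gender]:
        if word.endswith(ending):
            return f"{gender}-{ending}"
    raise ValueError(f"{word!r} has no proper ending for gender {gender!r}")


def decline(word, gender="m"):
    """Decline a word; only the masculine -a paradigm is implemented here."""
    word = unicodedata.normalize("NFC", word)
    key = paradigm_key(word, gender)
    if key != "m-a":
        raise NotImplementedError(f"paradigm {key} is not covered by this sketch")
    stem = word[:-1]  # drop the final -a; each ending supplies its own vowel
    return {
        case: ([stem + e for e in sg], [stem + e for e in pl])
        for case, (sg, pl) in M_A_ENDINGS.items()
    }


table = decline("sara")
print(table["nom"])  # (['saro'], ['sarā'])
print(table["loc"])  # (['sare', 'sarasmiṃ', 'saramhi'], ['saresu'])
```

Checking -ant and -ar before the vowel endings mirrors the text: those two are exceptions that take precedence over the generic endings.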
5.3. Numbers
Numeral declension is one of the most difficult parts of the program to implement.2 All output in this part is computed. That means you can enter any number of up to 6 digits, and it will be converted to its Pāli numeral3 with declension tables, both cardinal and ordinal. To help beginners, however, there are pre-built lists that can be easily selected. Figure 5.5 shows the result of 100250.
I will skip other details because everything is self-explanatory and it is better to learn by playing.
1 If you find any irregular word that declines wrongly, according to the tra-
ules are Tokenizer, Pāli Text Reader & Sentence Manager, and the viewer of
Pāli documents including script transformers.
3 The result list is not exhaustive, because one complex number can be ren-
dered in many ways. The results here show only some compound units that
can be calculated by the computer. Try 150, 250, 350, and so on; you will see
why the conversion is so difficult.
Verbs
6
Verbs listed here come from CPED. Fortunately, the dictionary already provides the composition of verbs and their related forms. New students can learn a lot just by going through these lists.
There are only two main options: the main (canonical) verb forms (present, third-person singular, active), typically those ending with -ti, and all the rest. For main verbs, you can also filter the list by the verbs themselves, by their root and prefix, by their ending component (paccaya)1, or by their meaning. The use is straightforward. Figure 6.1 shows a search result.
Figure 6.1.: Result of main verbs when searching ‘vim’ and selecting vimuccati
1 The names of roots and paccayas can be slightly different from Kaccāyana/Saddanīti’s convention.
In other verb forms, you have to choose the group you want (see Figure 6.2). The names of verb forms here come directly from the dictionary. So, some names may look unfamiliar; for example, what is called Potential Participle here is commonly called Future Passive Participle in other textbooks.
Conjugation table
7
Rendering verb forms is a formidable task for new students because there are many things to remember, as well as knowing how to put them together. A tool that provides a good number of examples can speed up the learning process. Unlike a declension table, which can be created for any declinable word, a conjugation table cannot be created for every root, because verb formation lacks complete uniformity.
Given this variety of verb formation, the best we can do is provide some typical verb samples, as shown in the left-hand list. Only a handful of verbs have a wide range of forms covering most tenses and moods, for example, pacati (paca), gacchati (gamu), and bhavati (bhū). That is why the program starts with pacati.1
There are two main modes in this module: main and derivation. Main verb formation is about ākhyāta, so conjugation makes sense only for main verb forms. Derivative verbs have no conjugation, because they undergo different rules resulting in different verb forms. Most derivative verbs are declinable, so derivation mode produces declension tables as its result.
As with declension tables, we can check the rendered words against an internal term list by opening the right pane (LIST button). Opening the right pane can make rendering a new verb sluggish, so use it only when necessary.
1 Rendering paca is straightforward and clean, having no weird variation.
Rendering a conjugation table has two additional options (Check-double button): (1) whether Attanopada (middle voice) will be shown, and (2) whether the Augment (a-)2 will be added. Figure 7.1 shows the aorist forms of pacati with all options on.
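The effect of the Augment option on the aorist can be pictured with a small sketch. It is a hypothetical Python illustration, not the program’s tables: it covers only the active (parassapada) aorist of a regular stem such as pac- (from paca), uses the endings commonly taught for pacati, and models the option as a boolean flag; Attanopada forms are not included.

```python
# Active aorist endings for a stem like pac- : (person, number) -> endings.
AORIST_ENDINGS = {
    (3, "sg"): ["i"],
    (3, "pl"): ["iṃsu", "uṃ"],
    (2, "sg"): ["o"],
    (2, "pl"): ["ittha"],
    (1, "sg"): ["iṃ"],
    (1, "pl"): ["imhā"],
}


def aorist(stem, augment=False):
    """Build aorist forms, optionally prefixing the augment a-."""
    prefix = "a" if augment else ""
    return {
        key: [prefix + stem + ending for ending in endings]
        for key, endings in AORIST_ENDINGS.items()
    }


for (person, number), forms in aorist("pac", augment=True).items():
    print(person, number, ", ".join(forms))
```

With augment=True this yields apaci, apaciṃsu, and so on; with augment=False the same call yields paci, paciṃsu, and so on.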
English Pāli
Present tense Vattamānā
Imperative mood Pañcamī
Optative mood Sattamī
Perfect tense Parokkhā
Imperfect tense Hiyyattanī
Aorist tense Ajjatanī
Future tense Bhavissanti
Conditional mood Kālātipatti
Roots
8
In the traditional way of learning, to know a verb means to know its root. So, it is really important to identify the root when we encounter a verb. Learning Pāli roots, unfortunately, is not an easy task, because different schools name roots differently. Furthermore, some roots can behave strangely in certain contexts. So, in the modern way of learning Pāli, knowing roots is less pressing. But knowing them still brings great benefits.
Roots listed here come from the compilation of Ven. U Silananda in Pali Roots in Saddanīti Dhātu-Mālā compared with Pāṇinīya-Dhātupāṭha.1 So, the roots are familiar to those who follow the Kaccāyana/Saddanīti school.
With the roots, users can select only a specific verb group, and search in the roots’ names, in their Pāli meanings, and in their English meanings. Figure 8.1 shows the Roots window with an option opened.
In some entries, you can see an asterisk (*) mark in the root’s
name. In some, you can also see double asterisks (**) in the
Pāli meaning. These marks tell us that there is a comment in
the entry. You can see the comment by clicking the row in the
table. Comments are directly taken from the original work.
So, some of them can be difficult to understand exactly.2
1 Edited by U Nandisena, available at https://archive.org/details/ThePaliRootsInSaddaniti
2 Smith mentioned in the comments is the publication of Saddanīti edited by
(Dhātumālā) of that work. The 5 volumes of the first edition can be found at https://archive.org/details/SaddanitiAggavamsasPaliGrammar01 (to 05).
Prosody
9
Pāli prosody, or chanda, is a big, difficult topic to learn. I am not keen on this subject, nor am I a poetry enthusiast. I see it from an engineering point of view: if we have a good tool, we can learn this hard topic more easily, or even with fun. There are several things to learn before you can use this tool effectively.
When it is first opened, it shows just a list of prosodic types. When you click on a row, you see only a bizarre formula at the bottom border (see Figure 9.1).
word/search.html?q=prosody
3 Many textbooks on the subject use ` for lahu and – for garu, or vice versa
(see Vut.7). I find this very confusing, so I avoid using these symbols.
4 The symbol 2 has a double meaning. It means garu in the analyzed result (as we shall see shortly). And if the symbol appears in a formula, it means any combination that gives 2 measures, i.e., both ll (1+1) and g (2) can be
9.2. Two types of prosodic patterns
nant (which does not belong to the next syllable)?” To put it bluntly, there is no such thing. You may see tad here and there in Pāli books, but not in the traditional textbooks. We will not see tad standing alone. It is taṃ with d substitution as a result of word-joining (Sandhi). So, that syllable may be either lahu or garu, depending on whether the d is doubled or not.
Table 9.1.
Symbol Meaning
Vertical bar (|) logical OR
Exclamation mark (!) logical NOT
Semicolon (;) line separator
Hyphen (-) mere separator
We will see how logical OR and NOT work when we examine a real formula. The line separator (;), if present, tells us that the pattern needs 2 lines of verse to make the analysis complete. If the input passage is not long enough, or does not have 2 lines as required, ‘incomplete’ will be shown in the status bar. But if the analyzed text has more syllables than the formula requires, ‘over-required’ will be shown instead. The hyphen (-) is just a separator with no meaning; it only makes formulas easier to read.
9.3. Verse types of mattāvutti
below.
!j-4-!j-4-!j-j|n-!j-g; (30)
!j-4-!j-4-!j-l-!j-g (27)
The pattern needs 2 lines (see ;). The first line is read NOT j, 4, NOT j, 4, NOT j, j OR n, NOT j, g. The second is read NOT j, 4, NOT j, 4, NOT j, l, NOT j, g. As shown in Table 9.2, 4 means any meter group is applicable, NOT j means any meter group but j, and j OR n means either j or n is valid.
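To make the reading of such formulas concrete, here is a minimal Python sketch of a checker. It is a hypothetical illustration, not the program’s implementation, and it rests on several assumptions: the verse has already been analyzed into group labels, one list per line; the four-measure groups are s (112), b (211), m (22), and n (1111), as in the example below, plus j (121); l and g stand for a single lahu and garu; the digit 4 stands for any four-measure group, derived here as every lahu/garu sequence worth four measures; NOT ranges over the four-measure groups only; and the measure totals printed in parentheses are not part of the formula string.

```python
def compositions(measures):
    """Every sequence of lahu (1) and garu (2) that adds up to `measures`."""
    if measures == 0:
        return [""]
    result = ["1" + rest for rest in compositions(measures - 1)]
    if measures >= 2:
        result += ["2" + rest for rest in compositions(measures - 2)]
    return result


FOUR_MEASURE_GROUPS = {"112": "s", "121": "j", "211": "b", "22": "m", "1111": "n"}
ANY_FOUR = {FOUR_MEASURE_GROUPS[c] for c in compositions(4)}  # what "4" allows
MEASURES = {**{g: 4 for g in ANY_FOUR}, "l": 1, "g": 2}


def token_matches(token, group):
    """Test one formula token such as 'g', '4', '!j' or 'j|n' against a group."""
    if token.startswith("!"):  # logical NOT, restricted to four-measure groups
        return group in ANY_FOUR and not token_matches(token[1:], group)
    if token == "4":  # any four-measure group
        return group in ANY_FOUR
    return group in token.split("|")  # a single group, or logical OR


def check_line(formula_line, groups):
    tokens = formula_line.split("-")  # the hyphen is only a separator
    if len(groups) < len(tokens):
        return "incomplete"
    if len(groups) > len(tokens):
        return "over-required"
    ok = all(token_matches(t, g) for t, g in zip(tokens, groups))
    return "match" if ok else "no match"


def check(formula, lines):
    """Check analyzed lines against a formula whose lines are joined by ';'."""
    formula_lines = [part.strip() for part in formula.split(";") if part.strip()]
    if len(lines) < len(formula_lines):
        return "incomplete"
    if len(lines) > len(formula_lines):
        return "over-required"
    results = [check_line(f, g) for f, g in zip(formula_lines, lines)]
    for status in ("incomplete", "over-required", "no match"):
        if status in results:
            return status
    return "match"


ariya = "!j-4-!j-4-!j-j|n-!j-g; !j-4-!j-4-!j-l-!j-g"
line1 = ["s", "b", "s", "m", "m", "n", "b", "g"]
line2 = ["s", "s", "s", "m", "s", "l", "s", "g"]
print(check(ariya, [line1, line2]))  # match
print(sum(MEASURES[g] for g in line1), sum(MEASURES[g] for g in line2))  # 30 27
```

The two sums reproduce the totals (30) and (27) written next to the formula, and the length checks mimic the ‘incomplete’ and ‘over-required’ messages described earlier.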
Let us see a real example. This stanza is the first one of
Ganthārambhakathā in Dīghanikāya-aṭṭhakathā.
Karuṇāsītalahadayaṃ, paññāpajjotavihatamohatamaṃ;
112-211-112, 22-22-1111-211-2;
s-b-s, m-m-n-b-g;
Sanarāmaralokagaruṃ, vande sugataṃ gativimuttaṃ.
112-112-112, 22 112 1-112-2.
s-s-s, m s l-s-g.
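The digit rows in this example follow the usual Pāli rules of syllable weight: a syllable is garu (2) if its vowel is long (ā, ī, ū, e, o), or if a short vowel is followed by niggahīta (ṃ) or by two or more consonants; otherwise it is lahu (1). The minimal Python sketch below, which is not the program’s analyzer, reproduces the rows above under stated assumptions: the input is romanized, spaces and punctuation are ignored (so a consonant cluster at the start of the next word still counts), an h after a stop (kh, gh, ch, and so on) is treated as part of that consonant, and finer points of sandhi are not handled.

```python
import unicodedata

LONG_VOWELS = set("āīūeo")
VOWELS = LONG_VOWELS | set("aiu")
NIGGAHITA = set("ṃṁ")
STOPS = set("kgcjṭḍtdpb")  # consonants that can carry an aspirate h


def syllable_weights(text):
    """Return one digit per syllable: '2' for garu, '1' for lahu."""
    letters = [c for c in unicodedata.normalize("NFC", text.lower()) if c.isalpha()]
    weights = []
    for i, ch in enumerate(letters):
        if ch not in VOWELS:
            continue
        if ch in LONG_VOWELS:
            weights.append("2")
            continue
        consonants = 0
        j = i + 1
        while j < len(letters) and letters[j] not in VOWELS:
            if letters[j] in NIGGAHITA:
                consonants = 2  # short vowel + niggahīta is always garu
                break
            is_aspirate = letters[j] == "h" and j > i + 1 and letters[j - 1] in STOPS
            if not is_aspirate:
                consonants += 1
            j += 1
        weights.append("2" if consonants >= 2 else "1")
    return "".join(weights)


print(syllable_weights("Karuṇāsītalahadayaṃ"))  # 112211112
print(syllable_weights("vande sugataṃ gativimuttaṃ"))  # 2211211122
```

Grouping these digit strings into s, b, m, n, l, and g then gives the label rows shown in the stanza.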
tions, or extra considerations, to the rules. I do not use that approach in the program. We just find the closest candidates and regard the verse as it is. I think this is the normal way poetry is composed. To achieve a certain artistic result, breaking some rules seems natural. Of course, there can be cases of half-baked poetry that just ignores or plays with rules. And, finally, there can be many cases in which rules are held so firmly that the meter trumps the clarity of the words used: words are oddly changed to conform with rules.
of those.
Generally, there are two kinds of metrical pause. The first is the pause after each foot ends. This is common in recitation and writing. When you read a verse, you are supposed to pause at certain points to give a sense of rhythm. And it is natural to pause at the end of a foot: probably a small pause after the odd feet, a bigger one after the first line (first even foot), and the biggest one after the second line (last even foot). The second kind, the pause inside the feet, is rarer. Some verse types have rules about pauses, particularly the Ariyā group. Pauses at the endings are easy to see because punctuation marks can be used in the text. But in-between pauses cannot be shown in the program, because this kind of pause makes sense only when the verse is recited.11
Another limitation is that the program cannot detect the spill-over effect: a word is cut into two and the parts are split across feet. This aspect differentiates Vipulā (having a spill-over) from Paṭhayā (no spill-over).12 As a result, the program cannot tell the two verse types apart. They are grouped together under Ariyā.13
11 The rules concerning metrical pause are not easy to understand, so I skip the explanation.
12 From Vut.21, “Yattha gaṇattaya mullaṅ-ghi, Yo’bhayatthā’dimo bhave vipulā.” According to the formula, the word mullaṅghi has to be split so that the -ghi part belongs to the latter foot.
It should be noted that Vetālīya (Vut.29) has a particular rule: “No successive 6 lahus are allowed in the even feet.” This means that in its formula, 6-R-l-g-8-R-l-g, the 8 in the second part allows neither 1111112 nor 2111111. This rule has already been implemented in the latest program.
Finally in this category, Pādākulaka (Vut.44) is an arbitrary combination of the Mattāsamaka group. It has no fixed pattern to detect, so it is left out of the analysis (no ID).14
internally in the program. The users can use the numbers for sorting the
table, by clicking the column headers. To refer to a particular verse type,
using a reference to Vuttodaya, if any, is more reliable.
9.4. Verse types of vaññavutti
you copy the whole line and analyze it, ‘over-required’ will be shown. You have to cut the line using edit mode (PEN-FANCY button); then you will get the correct result. In the semi-symmetry and asymmetry groups, the formulas are given for one whole line, so you do not have to do likewise in these groups.
There are some other notes the users should know. First, Upajāti (Vut.65) is a mixture of Indavajira (Vut.63) and Upendavajira (Vut.64), or sometimes of other types. So, it has no fixed pattern to detect and is left out of the analysis (no ID). Second, Sasikalā (Vut.93) and Maṇiguṇanikara (Vut.94) use the same formula (14 lahus plus 1 garu), but the latter has two pauses, after the 8th syllable and after the last one. These two are not distinguishable in the program, even though they are different. And the last note: verse types ID 113 (Takāravipulā) to 124 (Tatiyajakāravipulā) have no references in Vuttodaya. These types follow the logic of the previous Bhakāravipulā (Vut.124), Rakāravipulā (Vut.125), and Nakāravipulā (Vut.126).35 I add them here because their analysis is easy to implement.
There are minor things I have not mentioned. The users should play around and find out how things work. Once you know this kind of tool exists, you can do research in Pāli prosody more easily than before. For metrical composition, this tool can be your test bench. With the edit mode mentioned above, you can compose a short verse in real time. Furthermore, the program also provides search by meter to help users find a word matching a required pattern. We will talk about this in the related modules.
Part III.
Pāli Collection
Browsing and bookmarking
10
Before we learn how to access a Pāli document in our collection, it is better to be familiar with the collection first. There are two kinds of documents used in the program: CSCD and the Extra. The former are those bundled with the program, whereas the latter should be empty at the first run.
The Chaṭṭha Saṅgāyana CD (CSCD)1 is the best and most complete collection of Pāli literature nowadays.2 This is the main corpus we use in Pāli studies. That is to say, Pāli Platform is a one-stop package: you have everything essential for Pāli learning in one place.
The Extra is a collection outside CSCD. It is just a directory that can be set by the user (in General Settings).3 Two formats are recognized as documents: XML (conforming to CSCD) and plain text (with the .txt extension). Once you have documents in the Extra, you can open them in the program’s viewer and analyze them with Tokenizer. When documents are added to the Extra while the program is running, the UPLOAD button in TOC Tree (see below) has to be pressed to make them visible.
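Which files count as documents can be illustrated with a small sketch that lists the candidates in an Extra directory. It is a hypothetical Python illustration, not how the program scans the Extra: it assumes that documents are recognized purely by the .xml or .txt extension and that subdirectories are searched as well, and it does not check whether an XML file actually conforms to CSCD.

```python
from pathlib import Path

DOCUMENT_SUFFIXES = {".xml", ".txt"}


def list_extra_documents(extra_dir):
    """List files under extra_dir that look like documents, as sorted relative paths."""
    root = Path(extra_dir)
    if not root.is_dir():
        return []  # nothing to list if the Extra directory does not exist yet
    return sorted(
        path.relative_to(root).as_posix()
        for path in root.rglob("*")
        if path.is_file() and path.suffix in DOCUMENT_SUFFIXES
    )


if __name__ == "__main__":
    # data/collection/extra is the default Extra directory.
    for name in list_extra_documents("data/collection/extra"):
        print(name)
```

Files with other extensions, such as a PDF dropped into the same directory, are simply skipped by this sketch.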
A document in the collection can be accessed directly by TOC Tree, either its tab in the main window or a newly opened TOC Tree window (using the Collection menu or the main tool bar), as shown in Figure 10.1.
1 Distributed by Vipassana Research Institute (VRI), tipitaka.org
2 In our application, Roman script is used as the base text, and its encoding is changed from UTF-16 to UTF-8. All documents are structured in XML. All files, including TOC files, are packed into one zip file, named romn_utf8.zip, in the directory data/collection.
3 By default, it is set to data/collection/extra.
when we prepare text for Pāli Text Reader (see Chapter 18).
From the context menu, you can also bookmark the document as well as add it to Tokenizer. These two actions can also be done by drag-and-drop. Some buttons in the tool bar correspond to the functions mentioned above. Please check these yourself, as well as those not mentioned.
A whole bunch of texts in a tree node can be added to Tokenizer at once by using the GRIP-HORIZONTAL button in the tool bar, because the context menu is available only at the text level, not at the higher levels.
In the main window, when the BOOKMARK button is pressed, or the item is selected from the menu, the Bookmarks window will show (see Figure 10.3). This window is a singleton. All documents you bookmark in TOC Tree, or somewhere else, will appear here. The actions available, either by the context menu or the tool bar, are similar to those of TOC Tree.
Bookmarks window is simple, so I will leave it to the users to find out what else it can do. Some entries are preset in the bookmarks: two important documents that are difficult for new learners to find. If you lose the preset, you can bring it back by deleting the program’s property file (PaliPlatform2.properties), or moving it away, and restarting the program.
Figure 10.3.: Bookmarks window
Document Finder
11
Most new students of Pāli or Buddhism are not familiar with the structure of the Pāli canon, so finding a text by navigating TOC Tree can be difficult. That is where Document Finder comes in. With this tool, you can find relevant documents by entering a query. If you know a certain name or a text portion, you can find it more quickly than by using TOC Tree. The tool is a part of the main tabs, and you can open as many instances as you wish by clicking the SEARCH button in the main tool bar, or selecting it in the Collection menu.
First, you should know that there are two main kinds of search here: heading search and content search. In heading search, there are three fields available for searching: text name, book name, and group name. What counts as a text, a book, or a group is not exactly systematic. Those names come from the organization of CSCD, and they correspond to the entries shown in TOC Tree. So, you may not find the document you need by entering your familiar text name.
If heading search fails, you can resort to content search. In
this mode, full-text search will be applied, and the documents
containing the query will be listed.
All of these search modes can be selected in one place, the Check-double button. Figure 11.1 shows an attempt to find any document in Dhammapada by searching the book name.
In heading search, an asterisk (*) can be used as a wildcard.
It can be used at any position (adding it to the last position is
Figure 11.1.: Searching Dhammapada in Document Finder
Document viewer
12
Now that we know how to access a document, in this chapter we will learn about the viewer. As mentioned earlier, we can view a document in two formats, XML or text. Here we focus only on viewing XML documents. I call this tool Pāli HTML Viewer.1
When a document is opened, either from a context menu or with the File-Alt button in a tool bar, the viewer will show up. In its full form, the viewer looks like Figure 12.1.
There are three main panes: (1) the center pane, always present, displaying the text, (2) the right pane, open by default, showing the text’s information and related documents, and (3) the left pane, hidden by default, used for navigation. The right and left panes can be turned on and off by the three buttons in the tool bar.
The information shown in the right pane is self-explanatory. You can also open the related documents, if any, by using the context menu (right-click). The related documents are the texts hierarchically related to the opened text. For example, if you open a Mūla (main) text, the related texts in the Aṭṭhakathā (commentaries) will show. If you open a commentary, you will also see links to the related texts at the Ṭīkā (subcommentary) level.
1 Technically speaking, the document is converted from XML to HTML using SGML transformation, then opened in the HTML viewer. The transformation has strict rules, so non-compliant XML files cannot be viewed correctly by the program.
The navigator in the left pane is quite useful. You can jump to points in the document according to three criteria: heading jump (HEADING), paragraph-number jump (¶), and stanza jump (Music). The last jump point is saved, and you can return to it by using the last-jump button. To understand these features, you have to experiment with various sets of texts.
When you press Ctrl-F to search for a string, the search widget will show at the bottom of the window. There are three options for searching: case sensitivity, whole-word search, and regular expression. By default, the search is case-insensitive and not whole-word. The options can be changed with the Check-double button. For a more advanced search, you can use regular expressions (see Chapter 20).
When you right-click on a portion or a selection of text, a context menu will show up, as shown at the center of the picture. This allows you to perform certain operations on the selected text. Users should explore these by themselves. One explanation, though: Send to Dictionaries means the selected portion will be copied to the Dictionaries tab in the main window.
Some buttons in the tool bar need an explanation. When the STICKY-NOTE button is selected, the editorial notes embedded in the text will show. The notes appear as blue text enclosed in square brackets. When the Hashtag button is selected, the reference points to other publications will show.2
Myanmar edition), and T (the Thai edition). I have no idea which exact editions these refer to, or what the numbers represent.
3 You are likely to encounter certain problems when displaying non-
to other non-Roman.
Roman, this is not a concern because the program can read both forms.
Finally, regarding the transformation of the Thai vowels e and o, as shown in Table 12.1: when Roman script is converted into Thai, only one form is produced. Conversely, conversion from Thai to Roman tolerates variation in the input.
Input Output
tve
vho
tve
tve
vho
vho
13
Simple Lister
CSCD with their number of occurrences (frequency). All to-
kens are also normalized into lowercase letters.
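A term list of this kind can be sketched in a few lines of Python. This is a minimal illustration under simple assumptions, not the program’s actual tokenizer. A token is taken to be any run of letters, so Pāli letters such as ā and ṃ are kept, provided the text uses precomposed characters. Each token is lowercased before counting. The function name `term_frequencies` is hypothetical.

```python
import re
from collections import Counter


def term_frequencies(text: str) -> Counter:
    """Count tokens after lowercasing them (the tokenizing rule is an assumption)."""
    # [^\W\d_]+ matches runs of Unicode letters: word characters minus digits and "_".
    tokens = re.findall(r"[^\W\d_]+", text)
    return Counter(token.lower() for token in tokens)


sample = "Evaṃ me sutaṃ. Ekaṃ samayaṃ bhagavā. Evaṃ vutte bhagavā."
freq = term_frequencies(sample)
print(freq.most_common(2))  # the two most frequent terms with their counts
```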
Simple Lister is one of the tabs in the main window. It can also be opened as a separate window with the Bars button or from the Collection menu. Initially, it shows the most frequent terms. You can see a summary by using the Σ button; the result is shown in Figure 13.1. With the Check-double button, you can choose between term summary and document summary. The figure shows the term summary.
In the result table, each row shows the term, its frequency, and its length. The length is Pāli-sensitive, meaning that th, for example, counts as one character. So, tattha in the list has length 5.
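The Pāli-sensitive length can be illustrated by treating each aspirated consonant written with two Roman letters (kh, gh, ch, jh, ṭh, ḍh, th, dh, ph, bh) as a single letter. This is a sketch under that assumption only. The manual confirms just the th case, and it does not say how the program handles other letter sequences. The function name `pali_length` is hypothetical.

```python
import re

# Aspirated consonants written with two Roman letters; each counts as ONE letter.
# Only "th" is confirmed by the manual; the rest of the list is an assumption.
ASPIRATES = ["kh", "gh", "ch", "jh", "ṭh", "ḍh", "th", "dh", "ph", "bh"]

# Try the two-letter aspirates first, then fall back to any single character.
LETTER = re.compile("|".join(ASPIRATES) + "|.", re.IGNORECASE | re.DOTALL)


def pali_length(term: str) -> int:
    """Length of a Roman-script Pāli term, counting aspirates as one letter."""
    return len(LETTER.findall(term))


for word in ["tattha", "buddha", "bhikkhu"]:
    print(word, len(word), pali_length(word))
```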
By default, only the top 500 rows, ordered by frequency, are retrieved. This makes startup very fast. You can set the maximum number of rows with a drop-down option in the tool bar, ranging from 50 to 1,000,000. The fewer rows you select, the faster you get the result.
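The effect of the row limit can be sketched with an in-memory term list. This is an illustration only. The real Simple Lister queries its H2 database, and the function `top_terms` and the sample frequencies below are hypothetical. The sketch also shows why, as the next paragraph explains, sorting by another column only rearranges rows that were already retrieved. For simplicity, it uses a plain character count rather than the Pāli-sensitive length.

```python
from __future__ import annotations

import heapq


def top_terms(freq: dict[str, int], max_rows: int = 500) -> list[tuple[str, int]]:
    """Return at most max_rows (term, frequency) pairs, most frequent first."""
    return heapq.nlargest(max_rows, freq.items(), key=lambda item: item[1])


# Hypothetical frequencies, for illustration only.
freq = {"ca": 120, "na": 95, "dhammo": 40, "sammāsambuddho": 3}

rows = top_terms(freq, max_rows=3)
# Sorting by length afterwards only reorders the rows already retrieved:
# "sammāsambuddho" was cut off by the limit, so it cannot show up here.
by_length = sorted(rows, key=lambda item: len(item[0]), reverse=True)
print(rows)
print(by_length)
```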
The table can be sorted by any column; just click the column header you want. But remember that the sorting is done only on the existing data limited by the maximum
fact. It is long because the result does not come up right away.
Figure 13.2.: Top longest terms in Simple Lister
Figure 13.5.: Grouping by the first letter in Simple Lister
Part IV.
14
Lucene Finder
(1) Text group The first option to consider is which group of texts you will work with. Several options are available, ranging from a single set of texts to the whole collection. In general, indexing the whole collection suits most use cases. If you need to work with a particular set of texts, you can make an index for it and save it in a different directory.
14.1. Options for indexing
Once you have chosen your options, press the Build button to create the index. The program will ask you for the output directory; you can create a new one at this step. After that, the program does its job, and you need to wait. Indexing is heavily resource-consuming, so you should not do other things in the meantime. It will not take long.1 For the whole collection, if you see 2698 (the number of all documents) in the title bar, the index has been built successfully.
1 On my old dual-core 32-bit laptop, it takes only 2 minutes for the whole
2 When a stanza has only 2 or 3 lines, ‘gathalast’ is always the last. The unused lines are skipped. That is to say, ‘gatha1’ is always present, ‘gathalast’ is also present if the stanza has more than one line, and ‘gatha2’ and ‘gatha3’ appear only in long verses.
14.3. Lucene simple search
you want to see a lot of them. The main reason is that the results are ranked by score, with the most relevant result shown first. The scoring method is Lucene’s default, which is mathematically complex; you do not need an explanation. Just keep in mind that a higher score means more relevant. Sometimes the ranking does not make sense to you, or even to me, because we do not know its internal conditions. The output, however, is mostly reliable.
The text fragments are the parts of a result that match the query, shown highlighted within their context. This is a useful feature, but unfortunately buggy. If you enter full words, you are likely to see the fragments; if you use wildcards (see below), you are less likely to see them.3
Once you have search results, you can open the documents shown by using the context menu (right-click), as shown in the picture.
14.4. Lucene query syntax
the default setting. If you want OR joining, you have to use the OR operator
explicitly.
5 Showing whole lines in the fragments does not work in all cases. It may
work with wildcards and regular expression, but fails with other complex
search schemes.
number specifies “the maximum number of edits allowed.” The value can be
either 0, 1 or 2 (default). I cannot give you any clearer explanation of this.
Just try it yourselves.
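Here is a more concrete way to read “the maximum number of edits.” An edit is the insertion, deletion, or substitution of one character, and a fuzzy query such as dhamma~1 matches indexed terms within that many edits of dhamma. The sketch below computes the plain Levenshtein distance in Python. It is a simplification, not Lucene’s code. By default, Lucene’s fuzzy query also counts swapping two adjacent characters as a single edit, which plain Levenshtein counts as two. The helper `within_edits` is hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def within_edits(query: str, term: str, max_edits: int = 2) -> bool:
    """Would `term` match a fuzzy query allowing `max_edits` edits (0, 1 or 2)?"""
    return levenshtein(query, term) <= max_edits


print(levenshtein("dhamma", "dhammā"))         # one substitution: a -> ā
print(within_edits("dhamma", "dhammo", 0))     # 0 edits allows exact matches only
print(within_edits("bhikkhu", "bhikkhave", 2)) # needs 3 edits, so no match
```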
a range, some are missing. Some documents do not have numbers at all. This warns you that not every number is searchable, even if it seems to be there.
15
Tokenizer
(1) You can index custom documents in the Extra with Tokenizer. If you have your own collection and want to search it or make its term list, you have to use this. By putting your documents in the Extra and adding them to Tokenizer, you can get the term list and search for the document you are looking for.2 Even though the search function in Tokenizer is not as powerful as Lucene Finder’s, it is enough to get the job done.
(2) You can add any document in the collection to Tokenizer. As we have seen in Lucene Finder’s options for indexing, we cannot
1 As a matter of fact, I wrote Tokenizer before Simple Lister and Lucene Finder in the hope that I could remove Lucene from the program’s libraries and reduce the database size. After I finished the module, I realized that I had made the wrong decision: after all, Simple Lister is needed for its simplicity and speed, and Lucene Finder for its superb search function.
2 For plain text documents, all tokens are put into the bodytext field.
(3) You can create a custom term list in Tokenizer. This is a consequence of the previous item. You can select only the text group that interests you and make a term list out of it. You can then use this list in Declension Table and Conjugation Table, instead of the whole term list, to check the terms produced.
(4) Capitalized terms are analyzed statistically in Tokenizer. If you take capitalized terms seriously, this gives you more detail on the matter than Simple Lister or Lucene Finder. It calculates, for each term, the percentage of its occurrences found capitalized. This may look trivial in other languages, but in Pāli it is informative: you can tell which terms are normally used to start sentences. In Tokenizer, there is no option for normalizing capitalized terms, because they are always kept as they are.3
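The capitalization statistic can be sketched as follows. For every term, compared case-insensitively, count how many of its occurrences start with a capital letter, and express that count as a percentage. This reading of “the percentage of each term found as capitalized” is an assumption. The function name and the tokenizing rule are hypothetical.

```python
from __future__ import annotations

import re
from collections import Counter


def capitalization_percentages(text: str) -> dict[str, float]:
    """For each term (grouped case-insensitively), the % of occurrences capitalized."""
    total: Counter = Counter()
    capitalized: Counter = Counter()
    for token in re.findall(r"[^\W\d_]+", text):
        key = token.lower()
        total[key] += 1
        if token[0].isupper():
            capitalized[key] += 1
    return {term: 100.0 * capitalized[term] / count for term, count in total.items()}


sample = "Atha kho bhagavā. Atha kho āyasmā. Tena kho pana samayena, atha bhagavā."
for term, pct in capitalization_percentages(sample).items():
    print(f"{term:10} {pct:6.1f}%")
```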
from this module. For that information, use Simple Lister instead.
program as the ‘main Tokenizer.’ We can open as many separate Tokenizer windows as we want, but only the main Tokenizer receives documents added through context menus or tool bars.
This leads us to two ways of adding documents to Tokenizer: through context menus (or tool bars) and by drag-and-drop. Three places provide a document list: TOC Tree, Bookmarks, and Document Finder. If you add documents from these through the context menu (right-click), they are added to the main window’s Tokenizer tab. But you can drag and drop freely from these three modules to any open Tokenizer.
Term filtering works the same way as in Simple Lister, and the field selector works as in Lucene Finder. The search pane is not visible by default; you have to click the SEARCH button. You can drag and drop a term from the list into the search text field. A multiple-term query uses a logical OR relation (unlike Lucene Finder, which uses AND by default). You cannot use Lucene syntax here: only terms in full form are accepted as a valid query, and no special symbols are used. The search result is similar to that of Lucene Finder, with a slightly different display and options.4 Figure 15.1 shows the Tokenizer window in its full form.
There are many things I have not talked about; I leave them to the users. You should explore what else you can do by yourselves. Try right-clicking here and there, and set various options to see their effect.
not expect the same ranking in both. For Lucene, I do not know the exact method. In Tokenizer, I simply use logarithmic TF-IDF (see Wikipedia for more information).
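For the curious, here is a minimal sketch of logarithmic TF-IDF ranking combined with the OR relation that Tokenizer uses for multiple-term queries. The manual does not give Tokenizer’s exact formula. The weighting here, (1 + log tf) × log(N / df), is one common textbook variant and is an assumption, as are the function name and the sample documents.

```python
from __future__ import annotations

import math
from collections import Counter


def rank_or_query(docs: dict[str, list[str]], query: list[str]) -> list[tuple[str, float]]:
    """Rank documents containing ANY query term by a log TF-IDF score, best first."""
    n_docs = len(docs)
    counts = {name: Counter(tokens) for name, tokens in docs.items()}
    terms = set(query)
    # Document frequency: the number of documents containing each query term.
    df = {t: sum(1 for c in counts.values() if c[t] > 0) for t in terms}
    scores = {}
    for name, c in counts.items():
        matched = [t for t in terms if c[t] > 0]
        if matched:  # OR semantics: one matching term is enough
            scores[name] = sum((1 + math.log(c[t])) * math.log(n_docs / df[t]) for t in matched)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical token lists standing in for documents.
docs = {
    "A": ["dhamma", "dhamma", "vinaya"],
    "B": ["dhamma", "sutta"],
    "C": ["sutta", "jātaka"],
}
for name, score in rank_or_query(docs, ["dhamma", "vinaya"]):
    print(name, round(score, 3))
```

Document C contains neither term, so it is not listed at all. A term that occurs in every document gets a weight of log(1) = 0, which is why very common words contribute little to the ranking.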
Part V.
Miscellaneous Tools
16
Pāli Text Editor
17
Batch Script Transformer
There are some other things you should be aware of. You can convert from Roman script to any other script, but other scripts can be converted only to Roman. You can set the output folder with the button provided; otherwise you have to use
18
Pāli Text Reader
sponding sentence. Each sentence has its own hash. A duplicate hash is possible but extremely unlikely. Technically, I use an MD5 digest here.
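To illustrate identifying a sentence by its MD5 digest, here is a short Python sketch. The manual only states that MD5 is used. The UTF-8 encoding, the trimming of surrounding whitespace, and the hexadecimal form are assumptions, so these digests need not match the program’s own. Any change to the text, even a final period, yields a completely different digest. So does visually identical text in a different Unicode form, such as precomposed ā versus a plus a combining macron.

```python
import hashlib


def sentence_hash(sentence: str) -> str:
    """Hex MD5 digest of a sentence (encoding and trimming are assumptions)."""
    return hashlib.md5(sentence.strip().encode("utf-8")).hexdigest()


h1 = sentence_hash("Evaṃ me sutaṃ.")
h2 = sentence_hash("  Evaṃ me sutaṃ.  ")  # same text after trimming
h3 = sentence_hash("Evaṃ me sutaṃ")       # the period is missing
print(h1 == h2, h1 == h3, len(h1))
```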
2 By native language, I mean the language or locale of the users recognized
single sentence. This can demolish the illusion of one correct
translation.
In the program bundle, I have already added many sentences and several sequences. With these examples, you can continue working on your own. Figure 18.1 shows a sentence in the reader with translations.
words or replace the old ones with it. However, there is no option to turn the custom dictionary off; to disable entries, you have to comment them out when you edit the file.
4 The two files are located at the hard-coded paths data/rules/dict.txt and data/rules/sandhi.txt, respectively.
Figure 18.2.: Detail mode in Pāli Text Reader
which is saved together with the original text. When you edit a sentence, you edit only this mutable field, not the original. This means that what is shown in the reader is the edit, whether it has been changed or not. You can edit the sentence any way you want and save it. The original version stays intact, so you can restore it whenever you need. But remember that each sentence has only one instance when saved, so each sentence has only one edit. When you edit a sentence that has been edited before, the old edit is replaced by the new one. If you work on edits extensively, it is advisable to save the sentences/sequence in different directories. You can do this with the menu Sentence>Save this sequence as. Once you make an edit, you have to save it first; otherwise the menu is disabled.
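The relation between the original text and its single edit can be modeled with a small record. This is only a sketch. The manual says sentence files are stored in JSON (see the footnote in Chapter 19), but the field names used here (`hash_id`, `text`, `edit`, `translations`) are hypothetical and do not describe the program’s actual file format.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field


@dataclass
class Sentence:
    hash_id: str                 # identifier of the sentence (hypothetical field name)
    text: str                    # the original sentence, never modified
    edit: str = ""               # the one mutable copy shown in the reader
    translations: dict[str, str] = field(default_factory=dict)  # variant -> text

    def __post_init__(self) -> None:
        # Before any editing, the edit is simply a copy of the original.
        if not self.edit:
            self.edit = self.text

    def set_edit(self, new_text: str) -> None:
        """Overwrite the edit; a sentence keeps only one edit."""
        self.edit = new_text

    def restore(self) -> None:
        """Bring the original back into the edit field."""
        self.edit = self.text


s = Sentence(hash_id="example", text="Evaṃ me sutaṃ.")
s.set_edit("Evaṃ me sutaṃ,")
s.set_edit("Evaṃ me sutaṃ:")  # the previous edit is gone
saved = json.dumps(asdict(s), ensure_ascii=False)
loaded = Sentence(**json.loads(saved))
print(saved)
```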
There are many things concerning Pāli grammar that cannot be implemented here, for example, verb form recognition. We mostly rely on CPED for this, and I think it has good coverage. This can pave the way for future research in computational Pāli.5 Having come this far is already amazing.
I think that is enough to make a smooth start with this tool. It is really helpful, despite its complexity. You have to learn by playing with it: make your own edits and translations. Remember that you can easily ruin the given data. But there is nothing to fear: the original data is always available, either in the software package or on the website.
Sentence Manager
19
This tool was created as a consequence of Pāli Text Reader. When operations on sentences grow complex, a dedicated tool is needed. Before you read this chapter, you must understand how the reader works (see Chapter 18) and know what I mean by sentence, sequence, hash, edit, and variant.
Sentence Manager can be opened with the Briefcase button or from the Collection menu. It can also be opened from the reader. The manager is the most complex tool of all, in terms of its functions and components. I cannot explain and show you everything here; I will explain only the basic ideas you should know when you explore the tool by yourselves. Figure 19.1 shows the manager when first opened.1
There are three main tabs in the manager: Sentences, Translation Variants, and Merger. The first two tabs work on one directory at a time. This means there must be one working directory; the default is ‘main.’ You can change the working directory with the FOLDER button. The directory contains many sentence files, several sequence files, and one variant info file.2
When the manager opens a directory, it reads all sentences, lists them in the table, and shows the number of trans-
1 Because many sentences have to be loaded on the first run, starting Sentence Manager takes time initially. If you open the reader first, the reader will be the slow one instead.
2 In the implementation, I use JSON format for sentence and info files.
three tabs. In these, you can edit the sentence’s text, add/edit
translations, and see the sentence-sequence statistical rela-
tion. I will not go into the details of these functions.
The Variants tab is far simpler (see Figure 19.2). All variants and their information are shown here. You can add a new variant (+ button) and edit its details. You can also hide variants you do not want to see in the translations with the EYE-SLASH button. This makes the display less cluttered.
If you want to delete all translations under a variant name, you can delete that variant with the Trash button. This causes all sentences containing the variant to be updated. Be careful: you cannot undo this action, and a lot of data can be deleted. Make sure you have a backup and that you are sober enough at the moment. If you just want to rename the variant, use the Tag button instead. I leave the other minor features of this tab for the users to find out by themselves.
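Deleting or renaming a variant touches every sentence that has a translation under it. The minimal sketch below assumes that each sentence keeps its translations in a mapping from variant name to text. This is an assumption, not the program’s documented format, and the function names are hypothetical. It shows how a single deletion removes data from many sentences at once, which is why a backup matters.

```python
from __future__ import annotations


def delete_variant(sentences: list[dict[str, str]], variant: str) -> int:
    """Drop the variant's translation from every sentence; return how many changed."""
    changed = 0
    for translations in sentences:
        if variant in translations:
            del translations[variant]
            changed += 1
    return changed


def rename_variant(sentences: list[dict[str, str]], old: str, new: str) -> int:
    """Move translations from `old` to `new`, keeping the text (overwrites `new`)."""
    changed = 0
    for translations in sentences:
        if old in translations:
            translations[new] = translations.pop(old)
            changed += 1
    return changed


# Each dict is one sentence's translations: variant name -> translated text.
data = [
    {"en-A": "Thus have I heard.", "en-B": "So I heard."},
    {"en-A": "At one time."},
    {"en-B": "Once."},
]
renamed = rename_variant(data, "en-A", "en-old")
deleted = delete_variant(data, "en-B")
print(renamed, deleted, data)
```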
The last tab of the manager (Figure 19.3) is very useful in practice. It can merge two directories into one in a few simple steps.
First, you have to create more than one sentence directory. You can test this by selecting a sequence and saving it in a new directory. Do it again with another sequence. Then load these two directories into the Merger, on the left and right. Now you
Quick guide to regular expression
20
Most users of this program probably know nothing about regular expressions. If you are not in a computer science department, or learning about formal language theory, you are unlikely to have come across the term. But if you use computers a lot, particularly programs with advanced search functions, there is a good chance you have met it.
What is it, then? Regular expression is a technical term, so it would be distracting to dwell on its formal meaning. To put it in down-to-earth terms, a regular expression is an enhanced wildcard pattern that makes string matching more effective.1
Learning regular expressions is not easy; many books about them run to several hundred pages. So, the topic is far too big to discuss fully here.
However, I think we do not need to know all of their functions. That is why I added this chapter, to introduce this powerful technique to the users. Once you realize its capacity, you may want to learn more. So, in this chapter I will show you some uses of regular expressions in searching. I have selected only easy techniques that can be applied to Pāli search (see Table 20.1). Many things not included here may or may not work in the program.2 You have to test them yourselves.
1 This is my own definition, to make it relevant here. Applications of regular expressions are vast in computer science, and fewer in linguistics. To dig further, see https://en.wikipedia.org/wiki/Regular_expression.
2 Not every technique can be used here, because regular expressions themselves have several implementations. Even in our program, four engines are used: Java (in Tokenizer and the Editor), JavaScript (in the HTML Viewer), the H2 database (in Simple Lister), and Lucene (in Lucene Finder). So, a pattern used in one place may not work in others, or may need adjustment.
3 A dot in regex is equivalent to ? in the wildcards.
4 In the Text Editor, you can use this verbatim. But in the HTML Viewer, you have to double the backslash, so use \\d instead. This applies to all patterns that use backslashes.
5 The word boundary is not Pāli-sensitive: a non-English character is also counted as a word boundary. For example, you may also find āiti in this case, if there is such a word.
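The equivalence in footnote 3 can be made concrete by translating a wildcard pattern into a regular expression. The asterisk becomes .*, the question mark becomes a single dot, and every other character is escaped so that it matches literally. This is a generic Python sketch, not the conversion performed by any of the program’s engines. The function name is hypothetical. `re.fullmatch` is used so that the pattern must cover the whole term.

```python
import re


def wildcard_to_regex(pattern: str) -> str:
    """Translate * and ? wildcards into an equivalent regular expression."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")           # any run of characters, possibly empty
        elif ch == "?":
            parts.append(".")            # exactly one character
        else:
            parts.append(re.escape(ch))  # everything else is literal
    return "".join(parts)


for wildcard, word in [("dhamm?", "dhammo"), ("dhamm?", "dhammassa"), ("*attha", "tattha")]:
    regex = wildcard_to_regex(wildcard)
    print(wildcard, "->", regex, word, bool(re.fullmatch(regex, word)))
```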
Table 20.1: Some uses of regular expression (contd…)
Pattern   Meaning                                   Example        Result
\S        a non-whitespace character                eta\Savoca6    etadavoca
[…]       any character in the class                [Tt]ena        Tena or tena
                                                    dinn[oā]       dinno or dinnā
[^…]      any character not in the class            dinn[^oā]      dinne
?         once or not at all                        i?ti           ti or iti
*         zero or more times                        i*ti           ti or iti, or even iiti, etc.
+         one or more times                         manas+a        manasa or manassa
{n}       exactly n times                           manas{2}a      manassa
{n,}      at least n times                          manas{1,}a     manasa or manassa, or even manasssa
{n,m}     at least n but not more than m times      manas{1,2}a    manasa or manassa
(…)       grouping                                  man(as)?o      mano or manaso
(…|…)     any alternative in the group              manas(o|sa)    manaso or manassa
^         at the beginning                          ^attha.*       any word starting with attha7
$         at the end                                .*attha$       any word ending with attha
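The examples in Table 20.1 can be checked quickly with Python’s re module. Python’s engine is yet another implementation, separate from the four used in the program. For the simple constructs in this table (classes, quantifiers, groups, alternation, anchors), it behaves like Java’s engine, but other engines, Lucene’s in particular, may differ. Treat this as a convenient playground, not a description of the program. The negative examples are added for illustration.

```python
import re

# (pattern, words it should match in full, words it should not)
EXAMPLES = [
    (r"eta\Savoca", ["etadavoca"], ["eta avoca"]),
    (r"[Tt]ena", ["Tena", "tena"], ["sena"]),
    (r"dinn[oā]", ["dinno", "dinnā"], ["dinne"]),
    (r"dinn[^oā]", ["dinne"], ["dinno", "dinnā"]),
    (r"i?ti", ["ti", "iti"], ["iiti"]),
    (r"i*ti", ["ti", "iti", "iiti"], ["it"]),
    (r"manas+a", ["manasa", "manassa"], ["manaa"]),
    (r"manas{2}a", ["manassa"], ["manasa"]),
    (r"manas{1,}a", ["manasa", "manassa", "manasssa"], ["manaa"]),
    (r"manas{1,2}a", ["manasa", "manassa"], ["manasssa"]),
    (r"man(as)?o", ["mano", "manaso"], ["manasso"]),
    (r"manas(o|sa)", ["manaso", "manassa"], ["manasa"]),
]

for pattern, good, bad in EXAMPLES:
    for word in good:
        assert re.fullmatch(pattern, word), (pattern, word)
    for word in bad:
        assert not re.fullmatch(pattern, word), (pattern, word)

# ^ and $ anchor the pattern at the start and end of the searched word.
words = ["atthakathā", "yattha", "tattha", "atthi"]
starts = [w for w in words if re.search(r"^attha.*", w)]
ends = [w for w in words if re.search(r".*attha$", w)]
print(starts, ends)
```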
What you have seen in the table is just a small part of regex that I think can be applied to our search. Many other things seem difficult to use with Pāli, at least in an easy way. Those who know regex well may try capturing groups and back references, for example, using ‘(puna)p\1ṃ’ to find ‘punappunaṃ.’ I find little use for this, but it can be useful in some situations.
6 You may use eta.avoca to yield the same result, but the meaning is dif-
Lister, because when searching in full text, the meaning of ‘start’ and ‘end’ is
context-dependent.
About the author
Colophon