0% found this document useful (0 votes)
217 views222 pages

Babel

Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
217 views222 pages

Babel

Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 222

Babel

Version 3.83
2022/11/30 Localization and
Javier Bezos
internationalization
Current maintainer

Johannes L. Braams
Original author
Unicode
TEX
pdfTEX
LuaTEX
XeTEX
Contents

I User guide 4
1 The user interface 4
1.1 Monolingual documents . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.2 Multilingual documents . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.3 Mostly monolingual documents . . . . . . . . . . . . . . . . . . . . . . . . 7
1.4 Modifiers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.5 Troubleshooting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.6 Plain . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.7 Basic language selectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.8 Auxiliary language selectors . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.9 More on selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.10 Shorthands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.11 Package options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
1.12 The base option . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
1.13 ini files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
1.14 Selecting fonts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
1.15 Modifying a language . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
1.16 Creating a language . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
1.17 Digits and counters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
1.18 Dates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
1.19 Accessing language info . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
1.20 Hyphenation and line breaking . . . . . . . . . . . . . . . . . . . . . . . . 35
1.21 Transforms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
1.22 Selection based on BCP 47 tags . . . . . . . . . . . . . . . . . . . . . . . . . 40
1.23 Selecting scripts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
1.24 Selecting directions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
1.25 Language attributes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
1.26 Hooks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
1.27 Languages supported by babel with ldf files . . . . . . . . . . . . . . . . . 47
1.28 Unicode character properties in luatex . . . . . . . . . . . . . . . . . . . . 48
1.29 Tweaking some features . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
1.30 Tips, workarounds, known issues and notes . . . . . . . . . . . . . . . . . 49
1.31 Current and future work . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
1.32 Tentative and experimental code . . . . . . . . . . . . . . . . . . . . . . . 50

2 Loading languages with language.dat 51


2.1 Format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51

3 The interface between the core of babel and the language definition files 52
3.1 Guidelines for contributed languages . . . . . . . . . . . . . . . . . . . . . 53
3.2 Basic macros . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
3.3 Skeleton . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
3.4 Support for active characters . . . . . . . . . . . . . . . . . . . . . . . . . 55
3.5 Support for saving macro definitions . . . . . . . . . . . . . . . . . . . . . 56
3.6 Support for extending macros . . . . . . . . . . . . . . . . . . . . . . . . . 56
3.7 Macros common to a number of languages . . . . . . . . . . . . . . . . . . 56
3.8 Encoding-dependent strings . . . . . . . . . . . . . . . . . . . . . . . . . . 57
3.9 Executing code based on the selector . . . . . . . . . . . . . . . . . . . . . 60

II Source code 60
4 Identification and loading of required files 60

5 locale directory 61

1
6 Tools 61
6.1 Multiple languages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
6.2 The Package File (LATEX, babel.sty) . . . . . . . . . . . . . . . . . . . . . . 66
6.3 base . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
6.4 key=value options and other general option . . . . . . . . . . . . . . . . . 68
6.5 Conditional loading of shorthands . . . . . . . . . . . . . . . . . . . . . . . 69
6.6 Interlude for Plain . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70

7 Multiple languages 71
7.1 Selecting the language . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
7.2 Errors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
7.3 Hooks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
7.4 Setting up language files . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
7.5 Shorthands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
7.6 Language attributes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
7.7 Support for saving macro definitions . . . . . . . . . . . . . . . . . . . . . 97
7.8 Short tags . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
7.9 Hyphens . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
7.10 Multiencoding strings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
7.11 Macros common to a number of languages . . . . . . . . . . . . . . . . . . 107
7.12 Making glyphs available . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
7.12.1 Quotation marks . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
7.12.2 Letters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
7.12.3 Shorthands for quotation marks . . . . . . . . . . . . . . . . . . . 109
7.12.4 Umlauts and tremas . . . . . . . . . . . . . . . . . . . . . . . . . . 110
7.13 Layout . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
7.14 Load engine specific macros . . . . . . . . . . . . . . . . . . . . . . . . . . 112
7.15 Creating and modifying languages . . . . . . . . . . . . . . . . . . . . . . 112

8 Adjusting the Babel bahavior 134


8.1 Cross referencing macros . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
8.2 Marks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
8.3 Preventing clashes with other packages . . . . . . . . . . . . . . . . . . . 140
8.3.1 ifthen . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
8.3.2 varioref . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
8.3.3 hhline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
8.4 Encoding and fonts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142
8.5 Basic bidi support . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
8.6 Local Language Configuration . . . . . . . . . . . . . . . . . . . . . . . . . 146
8.7 Language options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147

9 The kernel of Babel (babel.def, common) 150

10 Loading hyphenation patterns 150

11 Font handling with fontspec 154

12 Hooks for XeTeX and LuaTeX 158


12.1 XeTeX . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
12.2 Layout . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
12.3 LuaTeX . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161
12.4 Southeast Asian scripts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
12.5 CJK line breaking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168
12.6 Arabic justification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
12.7 Common stuff . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174
12.8 Automatic fonts and ids switching . . . . . . . . . . . . . . . . . . . . . . . 174
12.9 Bidi . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
12.10 Layout . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
12.11 Lua: transforms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186

2
12.12 Lua: Auto bidi with basic and basic-r . . . . . . . . . . . . . . . . . . . . 194

13 Data for CJK 205

14 The ‘nil’ language 205

15 Calendars 206
15.1 Islamic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 206

16 Hebrew 208

17 Persian 212

18 Coptic and Ethiopic 212

19 Buddhist 213

20 Support for Plain TEX (plain.def) 213


20.1 Not renaming hyphen.tex . . . . . . . . . . . . . . . . . . . . . . . . . . . 213
20.2 Emulating some LATEX features . . . . . . . . . . . . . . . . . . . . . . . . . 214
20.3 General tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 214
20.4 Encoding related macros . . . . . . . . . . . . . . . . . . . . . . . . . . . . 218

21 Acknowledgements 221

Troubleshoooting
Paragraph ended before \UTFviii@three@octets was complete . . . . . . . . . . . 5
No hyphenation patterns were preloaded for (babel) the language ‘LANG’ into the
format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
You are loading directly a language style . . . . . . . . . . . . . . . . . . . . . . . 8
Unknown language ‘LANG’ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
Argument of \language@active@arg” has an extra } . . . . . . . . . . . . . . . . . 12
Package babel Info: The following fonts are not babel standard families . . . . . 26

3
Part I
User guide
What is this document about? This user guide focuses on internationalization and
localization with LATEX and pdftex, xetex and luatex with the babel package. There are
also some notes on its use with e-Plain and pdf-Plain TEX. Part II describes the code, and
usually it can be ignored.

What if I’m interested only in the latest changes? Changes and new features with
relation to version 3.8 are highlighted with New X.XX , and there are some notes for
the latest versions in the babel site. The most recent features can be still unstable.

Can I help? Sure! If you are interested in the TEX multilingual support, please join the
kadingira mail list. You can follow the development of babel in GitHub and make
suggestions; feel free to fork it and make pull requests. If you are the author of a
package, send to me a few test files which I’ll add to mine, so that possible issues can be
caught in the development phase.

It doesn’t work for me! You can ask for help in some forums like tex.stackexchange, but if
you have found a bug, I strongly beg you to report it in GitHub, which is much better
than just complaining on an e-mail list or a web forum. Remember warnings are not
errors by themselves, they just warn about possible problems or incompatibilities.

How can I contribute a new language? See section 3.1 for contributing a language.

I only need learn the most basic features. The first subsections (1.1-1.3) describe the
traditional way of loading a language (with ldf files), which is usually all you need. The
alternative way based on ini files, which complements the previous one (it does not
replace it, although it is still necessary in some languages), is described below; go to
1.13.

I don’t like manuals. I prefer sample files. This manual contains lots of examples and
tips, but in GitHub there are many sample files.

1 The user interface


1.1 Monolingual documents
In most cases, a single language is required, and then all you need in LATEX is to load the
package using its standard mechanism for this purpose, namely, passing that language as
an optional argument. In addition, you may want to set the font and input encodings.
Another approach is making the language a global option in order to let other packages
detect and use it. This is the standard way in LATEX for an option – in this case a language –
to be recognized by several packages.
Many languages are compatible with xetex and luatex. With them you can use babel to
localize the documents. When these engines are used, the Latin script is covered by default
in current LATEX (provided the document encoding is UTF-8), because the font loader is
preloaded and the font is switched to lmroman. Other scripts require loading fontspec. You
may want to set the font attributes with fontspec, too.

EXAMPLE Here is a simple full example for “traditional” TEX engines (see below for xetex and luatex).
The packages fontenc and inputenc do not belong to babel, but they are included in the example
because typically you will need them. It assumes UTF-8, the default encoding:

pdftex
\documentclass{article}

\usepackage[T1]{fontenc}

4
\usepackage[french]{babel}

\begin{document}

Plus ça change, plus c'est la même chose!

\end{document}

Now consider something like:

\documentclass[french]{article}
\usepackage{babel}
\usepackage{varioref}

With this setting, the package varioref will also see the option french and will be able to use it.

EXAMPLE And now a simple monolingual document in Russian (text from the Wikipedia) with xetex
or luatex. Note neither fontenc nor inputenc are necessary, but the document should be encoded
in UTF-8 and a so-called Unicode font must be loaded (in this example \babelfont is used,
described below).

luatex/xetex
\documentclass[russian]{article}

\usepackage{babel}

\babelfont{rm}{DejaVu Serif}

\begin{document}

Россия, находящаяся на пересечении множества культур, а также


с учётом многонационального характера её населения, — отличается
высокой степенью этнокультурного многообразия и способностью к
межкультурному диалогу.

\end{document}

TROUBLESHOOTING A common source of trouble is a wrong setting of the input encoding.


Depending on the LATEX version you can get the following somewhat cryptic error:

! Paragraph ended before \UTFviii@three@octets was complete.

Or the more explanatory:

! Package inputenc Error: Invalid UTF-8 byte ...

Make sure you set the encoding actually used by your editor.

NOTE Because of the way babel has evolved, “language” can refer to (1) a set of hyphenation
patterns as preloaded into the format, (2) a package option, (3) an ldf file, and (4) a name used in
the document to select a language or dialect. So, a package option refers to a language in a
generic way – sometimes it is the actual language name used to select it, sometimes it is a file
name loading a language with a different name, sometimes it is a file name loading several
languages. Please, read the documentation for specific languages for further info.

TROUBLESHOOTING The following warning is about hyphenation patterns, which are not under the
direct control of babel:

5
Package babel Warning: No hyphenation patterns were preloaded for
(babel) the language `LANG' into the format.
(babel) Please, configure your TeX system to add them and
(babel) rebuild the format. Now I will use the patterns
(babel) preloaded for \language=0 instead on input line 57.

The document will be typeset, but very likely the text will not be correctly hyphenated. Some
languages may be raising this warning wrongly (because they are not hyphenated); it is a bug to
be fixed – just ignore it. See the manual of your distribution (MacTEX, MikTEX, TEXLive, etc.) for
further info about how to configure it.

NOTE With hyperref you may want to set the document language with something like:

\usepackage[pdflang=es-MX]{hyperref}

This is not currently done by babel and you must set it by hand.

NOTE Although it has been customary to recommend placing \title, \author and other elements
printed by \maketitle after \begin{document}, mainly because of shorthands, it is advisable to
keep them in the preamble. Currently there is no real need to use shorthands in those macros.

1.2 Multilingual documents


In multilingual documents, just use a list of the required languages as package or class
options. The last language is considered the main one, activated by default. Sometimes, the
main language changes the document layout (eg, spanish and french).

EXAMPLE In LATEX, the preamble of the document:

\documentclass{article}
\usepackage[dutch,english]{babel}

would tell LATEX that the document would be written in two languages, Dutch and English, and
that English would be the first language in use, and the main one.

You can also set the main language explicitly, but it is discouraged except if there is a real
reason to do so:

\documentclass{article}
\usepackage[main=english,dutch]{babel}

Examples of cases where main is useful are the following.

EXAMPLE Some classes load babel with a hardcoded language option. Sometimes, the main language
can be overridden with something like that before \documentclass:

\PassOptionsToPackage{main=english}{babel}

NOTE Languages may be set as global and as package option at the same time, but in such a case you
should set explicitly the main language with the package option main:

\documentclass[italian]{book}
\usepackage[ngerman,main=italian]{babel}

WARNING In the preamble the main language has not been selected, except hyphenation patterns
and the name assigned to \languagename (in particular, shorthands, captions and date are not
activated). If you need to define boxes and the like in the preamble, you might want to use some
of the language selectors described below.

6
To switch the language there are two basic macros, described below in detail:
\selectlanguage is used for blocks of text, while \foreignlanguage is for chunks of text
inside paragraphs.

EXAMPLE A full bilingual document with pdftex follows. The main language is french, which is
activated when the document begins. It assumes UTF-8:

pdftex
\documentclass{article}

\usepackage[T1]{fontenc}

\usepackage[english,french]{babel}

\begin{document}

Plus ça change, plus c'est la même chose!

\selectlanguage{english}

And an English paragraph, with a short text in


\foreignlanguage{french}{français}.

\end{document}

EXAMPLE With xetex and luatex, the following bilingual, single script document in UTF-8 encoding
just prints a couple of ‘captions’ and \today in Danish and Vietnamese. No additional packages
are required, because the default font supports both languages.

luatex/xetex
\documentclass{article}

\usepackage[vietnamese,danish]{babel}

\begin{document}

\prefacename, \alsoname, \today.

\selectlanguage{vietnamese}

\prefacename, \alsoname, \today.

\end{document}

NOTE Once loaded a language, you can select it with the corresponding BCP47 tag. See section 1.22
for further details.

1.3 Mostly monolingual documents


New 3.39 Very often, multilingual documents consist of a main language with small
pieces of text in another languages (words, idioms, short sentences). Typically, all you need
is to set the line breaking rules and, perhaps, the font. In such a case, babel now does not
require declaring these secondary languages explicitly, because the basic settings are
loaded on the fly when the language is selected (and also when provided in the optional
argument of \babelfont, if used.)
This is particularly useful, too, when there are short texts of this kind coming from an
external source whose contents are not known on beforehand (for example, titles in a
bibliography). At this regard, it is worth remembering that \babelfont does not load any
font until required, so that it can be used just in case.

EXAMPLE A trivial document with the default font in English and Spanish, and FreeSerif in Russian
is:

7
luatex/xetex
\documentclass[english]{article}
\usepackage{babel}

\babelfont[russian]{rm}{FreeSerif}

\begin{document}

English. \foreignlanguage{russian}{Русский}.
\foreignlanguage{spanish}{Español}.

\end{document}

NOTE Instead of its name, you may prefer to select the language with the corresponding BCP47 tag.
This alternative, however, must be activated explicitly, because a two- or tree-letter word is a
valid name for a language (eg, lu can be the locale name with tag khb or the tag for
lubakatanga). See section 1.22 for further details.

1.4 Modifiers
New 3.9c The basic behavior of some languages can be modified when loading babel by
means of modifiers. They are set after the language name, and are prefixed with a dot (only
when the language is set as package option – neither global options nor the main key
accepts them). An example is (spaces are not significant and they can be added or
removed):1

\usepackage[latin.medieval, spanish.notilde.lcroman, danish]{babel}

Attributes (described below) are considered modifiers, ie, you can set an attribute by
including it in the list of modifiers. However, modifiers are a more general mechanism.

1.5 Troubleshooting
• Loading directly sty files in LATEX (ie, \usepackage{hlanguagei}) is deprecated and you
will get the error:2

! Package babel Error: You are loading directly a language style.


(babel) This syntax is deprecated and you must use
(babel) \usepackage[language]{babel}.

• Another typical error when using babel is the following:3

! Package babel Error: Unknown language `#1'. Either you have


(babel) misspelled its name, it has not been installed,
(babel) or you requested it in a previous run. Fix its name,
(babel) install it or just rerun the file, respectively. In
(babel) some cases, you may need to remove the aux file

The most frequent reason is, by far, the latest (for example, you included spanish, but
you realized this language is not used after all, and therefore you removed it from the
option list). In most cases, the error vanishes when the document is typeset again, but
in more severe ones you will need to remove the aux file.
1
No predefined “axis” for modifiers are provided because languages and their scripts have quite different needs.
2
In old versions the error read “You have used an old interface to call babel”, not very helpful.
3
In old versions the error read “You haven’t loaded the language LANG yet”.

8
1.6 Plain
In e-Plain and pdf-Plain, load languages styles with \input and then use \begindocument
(the latter is defined by babel):

\input estonian.sty
\begindocument

WARNING Not all languages provide a sty file and some of them are not compatible with those
formats. Please, refer to Using babel with Plain for further details.

1.7 Basic language selectors


This section describes the commands to be used in the document to switch the language in
multilingual documents. In most cases, only the two basic macros \selectlanguage and
\foreignlanguage are necessary. The environments otherlanguage, otherlanguage*
and hyphenrules are auxiliary, and described in the next section.
The main language is selected automatically when the document environment begins.

\selectlanguage {hlanguagei}
When a user wants to switch from one language to another he can do so using the macro
\selectlanguage. This macro takes the language, defined previously by a language
definition file, as its argument. It calls several macros that should be defined in the
language definition files to activate the special definitions for the language chosen:

\selectlanguage{german}

This command can be used as environment, too.

NOTE For “historical reasons”, a macro name is converted to a language name without the leading \;
in other words, \selectlanguage{\german} is equivalent to \selectlanguage{german}. Using a
macro instead of a “real” name is deprecated. New 3.43 However, if the macro name does not
match any language, it will get expanded as expected.

NOTE Bear in mind \selectlanguage can be automatically executed, in some cases, in the auxiliary
files, at heads and foots, and after the environment otherlanguage*.

WARNING If used inside braces there might be some non-local changes, as this would be roughly
equivalent to:

{\selectlanguage{<inner-language>} ...}\selectlanguage{<outer-language>}

If you want a change which is really local, you must enclose this code with an additional
grouping level.

WARNING There are a couple of issues related to the way the language information is written to the
auxiliary files:

• \selectlanguage should not be used inside some boxed environments (like floats or
minipage) to switch the language if you need the information written to the aux be correctly
synchronized. This rarely happens, but if it were the case, you must use otherlanguage
instead.
• In addition, this macro inserts a \write in vertical mode, which may break the vertical
spacing in some cases (for example, between lists). New 3.64 The behavior can be adjusted
with \babeladjust{select.write=hmodei}, where hmodei is shift (which shifts the skips
down and adds a \penalty); keep (the default – with it the \write and the skips are kept in
the order they are written), and omit (which may seem a too drastic solution, because nothing
is written, but more often than not this command is applied to more or less shorts texts with
no sectioning or similar commands and therefore no language synchronization is necessary).

9
\foreignlanguage [hoption-listi]{hlanguagei}{htexti}
The command \foreignlanguage takes two arguments; the second argument is a phrase
to be typeset according to the rules of the language named in its first one.
This command (1) only switches the extra definitions and the hyphenation rules for the
language, not the names and dates, (2) does not send information about the language to
auxiliary files (i.e., the surrounding language is still in force), and (3) it works even if the
language has not been set as package option (but in such a case it only sets the
hyphenation patterns and a warning is shown). With the bidi option, it also enters in
horizontal mode (this is not done always for backwards compatibility), and since it is
meant for phrases only the text direction (and not the paragraph one) is set.
New 3.44 As already said, captions and dates are not switched. However, with the
optional argument you can switch them, too. So, you can write:

\foreignlanguage[date]{polish}{\today}

In addition, captions can be switched with captions (or both, of course, with date,
captions). Until 3.43 you had to write something like {\selectlanguage{..} ..}, which
was not always the most convenient way.

1.8 Auxiliary language selectors

\begin{otherlanguage} {hlanguagei} … \end{otherlanguage}


The environment otherlanguage does basically the same as \selectlanguage, except that
language change is (mostly) local to the environment.
Actually, there might be some non-local changes, as this environment is roughly equivalent
to:

\begingroup
\selectlanguage{<inner-language>}
...
\endgroup
\selectlanguage{<outer-language>}

If you want a change which is really local, you must enclose this environment with an
additional grouping, like braces {}.
Spaces after the environment are ignored.

\begin{otherlanguage*} [hoption-listi]{hlanguagei} … \end{otherlanguage*}


Same as \foreignlanguage but as environment. Spaces after the environment are not
ignored.
This environment was originally intended for intermixing left-to-right typesetting with
right-to-left typesetting in engines not supporting a change in the writing direction inside a
line. However, by default it never complied with the documented behavior and it is just a
version as environment of \foreignlanguage, except when the option bidi is set – in this
case, \foreignlanguage emits a \leavevmode, while otherlanguage* does not.

1.9 More on selection

\babeltags {htag1i = hlanguage1i, htag2i = hlanguage2i, …}

New 3.9i In multilingual documents with many language-switches the commands above
can be cumbersome. With this tool shorter names can be defined. It adds nothing really
new – it is just syntactical sugar.

10
It defines \texthtag1i{htexti} to be \foreignlanguage{hlanguage1i}{htexti}, and
\begin{htag1i} to be \begin{otherlanguage*}{hlanguage1i}, and so on. Note \htag1i is
also allowed, but remember to set it locally inside a group.
WARNING There is a clear drawback to this feature, namely, the ‘prefix’ \text... is heavily
overloaded in LATEX and conflicts with existing macros may arise (\textlatin, \textbar, \textit,
\textcolor and many others). The same applies to environments, because arabic conflicts with
\arabic. Furthermore, and because of this overloading, detecting the language of a chunk of text
by external tools can become unfeasible. Except if there is a reason for this ‘syntactical sugar’, the
best option is to stick to the default selectors or to define your own alternatives.

EXAMPLE With

\babeltags{de = german}

you can write

text \textde{German text} text

and

text
\begin{de}
German text
\end{de}
text

NOTE Something like \babeltags{finnish = finnish} is legitimate – it defines \textfinnish and


\finnish (and, of course, \begin{finnish}).

NOTE Actually, there may be another advantage in the ‘short’ syntax \texthtag i, namely, it is not
affected by \MakeUppercase (while \foreignlanguage is).

\babelensure [include=hcommandsi,exclude=hcommandsi,fontenc=hencoding i]{hlanguagei}

New 3.9i Except in a few languages, like russian, captions and dates are just strings, and
do not switch the language. That means you should set it explicitly if you want to use them,
or hyphenation (and in some cases the text itself) will be wrong. For example:

\foreignlanguage{russian}{text \foreignlanguage{polish}{\seename} text}

Of course, TEX can do it for you. To avoid switching the language all the while,
\babelensure redefines the captions for a given language to wrap them with a selector:

\babelensure{polish}

By default only the basic captions and \today are redefined, but you can add further
macros with the key include in the optional argument (without commas). Macros not to
be modified are listed in exclude. You can also enforce a font encoding with the option
fontenc.4 A couple of examples:

\babelensure[include=\Today]{spanish}
\babelensure[fontenc=T5]{vietnamese}

They are activated when the language is selected (at the afterextras event), and it makes
some assumptions which could not be fulfilled in some languages. Note also you should
include only macros defined by the language, not global macros (eg, \TeX of \dag).
With ini files (see below), captions are ensured by default.
4
With it, encoded strings may not work as expected.

11
1.10 Shorthands
A shorthand is a sequence of one or two characters that expands to arbitrary TEX code.
Shorthands can be used for different kinds of things; for example: (1) in some languages
shorthands such as "a are defined to be able to hyphenate the word if the encoding is OT1;
(2) in some languages shorthands such as ! are used to insert the right amount of white
space; (3) several kinds of discretionaries and breaks can be inserted easily with "-, "=, etc.
The package inputenc as well as xetex and luatex have alleviated entering non-ASCII
characters, but minority languages and some kinds of text can still require characters not
directly available on the keyboards (and sometimes not even as separated or precomposed
Unicode characters). As to the point 2, now pdfTeX provides \knbccode, and luatex can
manipulate the glyph list. Tools for point 3 can be still very useful in general.
There are four levels of shorthands: user, language, system, and language user (by order of
precedence). In most cases, you will use only shorthands provided by languages.

NOTE Keep in mind the following:

1. Activated chars used for two-char shorthands cannot be followed by a closing brace } and the
spaces following are gobbled. With one-char shorthands (eg, :), they are preserved.
2. If on a certain level (system, language, user, language user) there is a one-char shorthand,
two-char ones starting with that char and on the same level are ignored.
3. Since they are active, a shorthand cannot contain the same character in its definition (except
if deactivated with, eg, \string).

TROUBLESHOOTING A typical error when using shorthands is the following:

! Argument of \language@active@arg" has an extra }.

It means there is a closing brace just after a shorthand, which is not allowed (eg, "}). Just add {}
after (eg, "{}}).

\shorthandon {hshorthands-listi}
\shorthandoff * {hshorthands-listi}
It is sometimes necessary to switch a shorthand character off temporarily, because it must
be used in an entirely different way. For this purpose, the user commands \shorthandoff
and \shorthandon are provided. They each take a list of characters as their arguments.
The command \shorthandoff sets the \catcode for each of the characters in its argument
to other (12); the command \shorthandon sets the \catcode to active (13). Both commands
only work on ‘known’ shorthand characters, and an error will be raised otherwise. You
can check if a character is a shorthand with \ifbabelshorthand (see below).
New 3.9a However, \shorthandoff does not behave as you would expect with characters
like ~ or ^, because they usually are not “other”. For them \shorthandoff* is provided, so
that with

\shorthandoff*{~^}

~ is still active, very likely with the meaning of a non-breaking space, and ^ is the
superscript character. The catcodes used are those when the shorthands are defined,
usually when language files are loaded.
If you do not need shorthands, or prefer an alternative approach of your own, you may
want to switch them off with the package option shorthands=off, as described below.

WARNING It is worth emphasizing these macros are meant for temporary changes. Whenever
possible and if there are not conflicts with other packages, shorthands must be always enabled
(or disabled).

12
\useshorthands * {hchari}
The command \useshorthands initiates the definition of user-defined shorthand
sequences. It has one argument, the character that starts these personal shorthands.
New 3.9a User shorthands are not always alive, as they may be deactivated by languages
(for example, if you use " for your user shorthands and switch from german to french, they
stop working). Therefore, a starred version \useshorthands*{hchari} is provided, which
makes sure shorthands are always activated.
Currently, if the package option shorthands is used, you must include any character to be
activated with \useshorthands. This restriction will be lifted in a future release.

\defineshorthand [hlanguagei,hlanguagei,...]{hshorthandi}{hcodei}
The command \defineshorthand takes two arguments: the first is a one- or two-character
shorthand sequence, and the second is the code the shorthand should expand to.
New 3.9a An optional argument allows to (re)define language and system shorthands
(some languages do not activate shorthands, so you may want to add
\languageshorthands{hlang i} to the corresponding \extrashlang i, as explained below).
By default, user shorthands are (re)defined.
User shorthands override language ones, which in turn override system shorthands.
Language-dependent user shorthands (new in 3.9) take precedence over “normal” user
shorthands.

EXAMPLE Let’s assume you want a unified set of shorthand for discretionaries (languages do not
define shorthands consistently, and "-, \-, "= have different meanings). You can start with, say:

\useshorthands*{"}
\defineshorthand{"*}{\babelhyphen{soft}}
\defineshorthand{"-}{\babelhyphen{hard}}

However, the behavior of hyphens is language-dependent. For example, in languages like Polish
and Portuguese, a hard hyphen inside compound words are repeated at the beginning of the next
line. You can then set:

\defineshorthand[*polish,*portuguese]{"-}{\babelhyphen{repeat}}

Here, options with * set a language-dependent user shorthand, which means the generic one
above only applies for the rest of languages; without * they would (re)define the language
shorthands instead, which are overridden by user ones.
Now, you have a single unified shorthand ("-), with a content-based meaning (‘compound word
hyphen’) whose visual behavior is that expected in each context.

\languageshorthands {hlanguagei}
The command \languageshorthands can be used to switch the shorthands on the
language level. It takes one argument, the name of a language or none (the latter does what
its name suggests).5 Note that for this to work the language should have been specified as
an option when loading the babel package. For example, you can use in english the
shorthands defined by ngerman with

\addto\extrasenglish{\languageshorthands{ngerman}}

(You may also need to activate them as user shorthands in the preamble with, for example,
\useshorthands or \useshorthands*.)
5
Actually, any name not corresponding to a language group does the same as none. However, follow this con-
vention because it might be enforced in future releases of babel to catch possible errors.

13
EXAMPLE Very often, this is a more convenient way to deactivate shorthands than \shorthandoff,
for example if you want to define a macro to easy typing phonetic characters with tipa:

\newcommand{\myipa}[1]{{\languageshorthands{none}\tipaencoding#1}}

\babelshorthand {hshorthandi}
With this command you can use a shorthand even if (1) not activated in shorthands (in
this case only shorthands for the current language are taken into account, ie, not user
shorthands), (2) turned off with \shorthandoff or (3) deactivated with the internal
\bbl@deactivate; for example, \babelshorthand{"u} or \babelshorthand{:}. (You can
conveniently define your own macros, or even your own user shorthands provided they
do not overlap.)

EXAMPLE Since by default shorthands are not activated until \begin{document}, you may use this
macro when defining the \title in the preamble:

\title{Documento científico\babelshorthand{"-}técnico}

For your records, here is a list of shorthands, but you must double check them, as they may
change:6

Languages with no shorthands Croatian, English (any variety), Indonesian, Hebrew,


Interlingua, Irish, Lower Sorbian, Malaysian, North Sami, Romanian, Scottish, Welsh
Languages with only " as defined shorthand character Albanian, Bulgarian, Danish,
Dutch, Finnish, German (old and new orthography, also Austrian), Icelandic, Italian,
Norwegian, Polish, Portuguese (also Brazilian), Russian, Serbian (with Latin script),
Slovene, Swedish, Ukrainian, Upper Sorbian
Basque " ' ~
Breton : ; ? !
Catalan " ' `
Czech " -
Esperanto ^
Estonian " ~
French (all varieties) : ; ? !
Galician " . ' ~ < >
Greek ~
Hungarian `
Kurmanji ^
Latin " ^ =
Slovak " ^ ' -
Spanish " . < > ' ~
Turkish : ! =

In addition, the babel core declares ~ as a one-char shorthand which is let, like the
standard ~, to a non breaking space.7

\ifbabelshorthand {hcharacteri}{htruei}{hfalsei}
New 3.23 Tests if a character has been made a shorthand.

\aliasshorthand {horiginali}{haliasi}
The command \aliasshorthand can be used to let another character perform the same
functions as the default shorthand character. If one prefers for example to use the
6
Thanks to Enrico Gregorio
7
This declaration serves to nothing, but it is preserved for backward compatibility.

14
character / over " in typing Polish texts, this can be achieved by entering
\aliasshorthand{"}{/}. For the reasons in the warning below, usage of this macro is not
recommended.

NOTE The substitute character must not have been declared before as shorthand (in such a case,
\aliashorthands is ignored).

EXAMPLE The following example shows how to replace a shorthand by another

\aliasshorthand{~}{^}
\AtBeginDocument{\shorthandoff*{~}}

WARNING Shorthands remember somehow the original character, and the fallback value is that of
the latter. So, in this example, if no shorthand if found, ^ expands to a non-breaking space,
because this is the value of ~ (internally, ^ still calls \active@char~ or \normal@char~).
Furthermore, if you change the system value of ^ with \defineshorthand nothing happens.

1.11 Package options


New 3.9a These package options are processed before language options, so that they are
taken into account irrespective of its order. The first three options have been available in
previous versions.

KeepShorthandsActive Tells babel not to deactivate shorthands after loading a language file, so that they are also
available in the preamble.

activeacute For some languages babel supports this options to set ' as a shorthand in case it is not done
by default.

activegrave Same for `.

shorthands= hcharihchari... | off


The only language shorthands activated are those given, like, eg:

\usepackage[esperanto,french,shorthands=:;!?]{babel}

If ' is included, activeacute is set; if ` is included, activegrave is set. Active characters


(like ~) should be preceded by \string (otherwise they will be expanded by LATEX before
they are passed to the package and therefore they will not be recognized); however, t is
provided for the common case of ~ (as well as c for not so common case of the comma).
With shorthands=off no language shorthands are defined, As some languages use this
mechanism for tools not available otherwise, a macro \babelshorthand is defined, which
allows using them; see above.

safe= none | ref | bib


Some LATEX macros are redefined so that using shorthands is safe. With safe=bib only
\nocite, \bibcite and \bibitem are redefined. With safe=ref only \newlabel, \ref and
\pageref are redefined (as well as a few macros from varioref and ifthen).
With safe=none no macro is redefined. This option is strongly recommended, because a
good deal of incompatibilities and errors are related to these redefinitions. As of
New 3.34 , in TEX based engines (ie, almost every engine except the oldest ones)
shorthands can be used in these macros (formerly you could not).

math= active | normal


Shorthands are mainly intended for text, not for math. By setting this option with the
value normal they are deactivated in math mode (default is active) and things like ${a'}$
(a closing brace after a shorthand) are not a source of trouble anymore.

15
config= hfilei
Load hfilei.cfg instead of the default config file bblopts.cfg (the file is loaded even with
noconfigs).

main= hlanguagei
Sets the main language, as explained above, ie, this language is always loaded last. If it is
not given as package or global option, it is added to the list of requested languages.

headfoot= hlanguagei
By default, headlines and footlines are not touched (only marks), and if they contain
language-dependent macros (which is not usual) there may be unexpected results. With
this option you may set the language in heads and foots.

noconfigs Global and language default config files are not loaded, so you can make sure your
document is not spoilt by an unexpected .cfg file. However, if the key config is set, this
file is loaded.

showlanguages Prints to the log the list of languages loaded when the format was created: number
(remember dialects can share it), name, hyphenation file and exceptions file.

nocase New 3.9l Language settings for uppercase and lowercase mapping (as set by \SetCase)
are ignored. Use only if there are incompatibilities with other packages.

silent New 3.9l No warnings and no infos are written to the log file.8

hyphenmap= off | first | select | other | other*


New 3.9g Sets the behavior of case mapping for hyphenation, provided the language
defines it.9 It can take the following values:

off deactivates this feature and no case mapping is applied;


first sets it at the first switching commands in the current or parent scope (typically,
when the aux file is first read and at \begin{document}, but also the first
\selectlanguage in the preamble), and it’s the default if a single language option has
been stated;10
select sets it only at \selectlanguage;
other also sets it at otherlanguage;
other* also sets it at otherlanguage* as well as in heads and foots (if the option headfoot
is used) and in auxiliary files (ie, at \select@language), and it’s the default if several
language options have been stated. The option first can be regarded as an optimized
version of other* for monolingual documents.11

bidi= default | basic | basic-r | bidi-l | bidi-r


New 3.14 Selects the bidi algorithm to be used in luatex and xetex. See sec. 1.24.

layout=
New 3.16 Selects which layout elements are adapted in bidi documents. See sec. 1.24.

provide= *
8
You can use alternatively the package silence.
9
Turned off in plain.
10
Duplicated options count as several ones.
11
Providing foreign is pointless, because the case mapping applied is that at the end of the paragraph, but if
either xetex or luatex change this behavior it might be added. On the other hand, other is provided even if I [JBL]
think it isn’t really useful, but who knows.

16
New 3.49 An alternative to \babelprovide for languages passed as options. See
section 1.13, which describes also the variants provide+= and provide*=.

1.12 The base option


With this package option babel just loads some basic macros (those in switch.def),
defines \AfterBabelLanguage and exits. It also selects the hyphenation patterns for the
last language passed as option (by its name in language.dat). There are two main uses:
classes and packages, and as a last resort in case there are, for some reason, incompatible
languages. It can be used if you just want to select the hyphenation patterns of a single
language, too.

\AfterBabelLanguage {hoption-namei}{hcodei}
This command is currently the only provided by base. Executes hcodei when the file loaded
by the corresponding package option is finished (at \ldf@finish). The setting is global. So

\AfterBabelLanguage{french}{...}

does ... at the end of french.ldf. It can be used in ldf files, too, but in such a case the code
is executed only if hoption-namei is the same as \CurrentOption (which could not be the
same as the option name as set in \usepackage!).

EXAMPLE Consider two languages foo and bar defining the same \macro with \newcommand. An
error is raised if you attempt to load both. Here is a way to overcome this problem:

\usepackage[base]{babel}
\AfterBabelLanguage{foo}{%
\let\macroFoo\macro
\let\macro\relax}
\usepackage[foo,bar]{babel}

NOTE With a recent version of LATEX, an alternative method to execute some code just after an ldf
file is loaded is with \AddToHook and the hook file/<language>.ldf/after. Babel does not
predeclare it, and you have to do it yourself with \ActivateGenericHook.

WARNING Currently this option is not compatible with languages loaded on the fly.

1.13 ini files


An alternative approach to define a language (or, more precisely, a locale) is by means of
an ini file. Currently babel provides about 250 of these files containing the basic data
required for a locale, plus basic templates for 500 about locales.
ini files are not meant only for babel, and they has been devised as a resource for other
packages. To easy interoperability between TEX and other systems, they are identified with
the BCP 47 codes as preferred by the Unicode Common Locale Data Repository, which was
used as source for most of the data provided by these files, too (the main exception being
the \...name strings).
Most of them set the date, and many also the captions (Unicode and LICR). They will be
evolving with the time to add more features (something to keep in mind if backward
compatibility is important). The following section shows how to make use of them by
means of \babelprovide. In other words, \babelprovide is mainly meant for auxiliary
tasks, and as alternative when the ldf, for some reason, does work as expected.

EXAMPLE Although Georgian has its own ldf file, here is how to declare this language with an ini
file in Unicode engines.

17
luatex/xetex
\documentclass{book}

\usepackage{babel}
\babelprovide[import, main]{georgian}

\babelfont{rm}[Renderer=Harfbuzz]{DejaVu Sans}

\begin{document}

\tableofcontents

\chapter{სამზარეულო და სუფრის ტრადიციები}

ქართული ტრადიციული სამზარეულო ერთ-ერთი უმდიდრესია მთელ მსოფლიოში.

\end{document}

New 3.49 Alternatively, you can tell babel to load all or some languages passed as options
with \babelprovide and not from the ldf file in a few few typical cases. Thus, provide=*
means ‘load the main language with the \babelprovide mechanism instead of the ldf file’
applying the basic features, which in this case means import, main. There are (currently)
three options:

• provide=* is the option just explained, for the main language;

• provide+=* is the same for additional languages (the main language is still the ldf file);

• provide*=* is the same for all languages, ie, main and additional.

EXAMPLE The preamble in the previous example can be more compactly written as:

\documentclass{book}
\usepackage[georgian, provide=*]{babel}
\babelfont{rm}[Renderer=Harfbuzz]{DejaVu Sans}

Or also:

\documentclass[georgian]{book}
\usepackage[provide=*]{babel}
\babelfont{rm}[Renderer=Harfbuzz]{DejaVu Sans}

NOTE The ini files just define and set some parameters, but the corresponding behavior is not
always implemented. Also, there are some limitations in the engines. A few remarks follow
(which could no longer be valid when you read this manual, if the packages involved han been
updated). The Harfbuzz renderer has still some issues, so as a rule of thumb prefer the default
renderer, and resort to Harfbuzz only if the former does not work for you. Fortunately, fonts can
be loaded twice with different renderers; for example:

\babelfont[spanish]{rm}{FreeSerif}
\babelfont[hindi]{rm}[Renderer=Harfbuzz]{FreeSerif}

Arabic Monolingual documents mostly work in luatex, but it must be fine tuned, particularly
math and graphical elements like picture. In xetex babel resorts to the bidi package, which
seems to work.
Hebrew Niqqud marks seem to work in both engines, but depending on the font cantillation
marks might be misplaced (xetex or luatex with Harfbuzz seems better).
Devanagari In luatex and the the default renderer many fonts work, but some others do not, the
main issue being the ‘ra’. You may need to set explicitly the script to either deva or dev2, eg:

18
\newfontscript{Devanagari}{deva}

Other Indic scripts are still under development in the default luatex renderer, but should
work with Renderer=Harfbuzz. They also work with xetex, although unlike with luatex fine
tuning the font behavior is not always possible.
Southeast scripts Thai works in both luatex and xetex, but line breaking differs (rules are
hard-coded in xetex, but they can be modified in luatex). Lao seems to work, too, but there are
no patterns for the latter in luatex. Khemer clusters are rendered wrongly with the default
renderer. The comment about Indic scripts and lualatex also applies here. Some quick
patterns can help, with something similar to:

\babelprovide[import, hyphenrules=+]{lao}
\babelpatterns[lao]{1ດ 1ມ 1ອ 1ງ 1ກ 1າ} % Random

East Asia scripts Settings for either Simplified of Traditional should work out of the box, with
basic line breaking with any renderer. Although for a few words and shorts texts the ini files
should be fine, CJK texts are best set with a dedicated framework (CJK, luatexja, kotex, CTeX,
etc.). This is what the class ltjbook does with luatex, which can be used in conjunction with
the ldf for japanese, because the following piece of code loads luatexja:

\documentclass[japanese]{ltjbook}
\usepackage{babel}

Latin, Greek, Cyrillic Combining chars with the default luatex font renderer might be wrong;
on then other hand, with the Harfbuzz renderer diacritics are stacked correctly, but many
hyphenations points are discarded (this bug is related to kerning, so it depends on the font).
With xetex both combining characters and hyphenation work as expected (not quite, but in
most cases it works; the problem here are font clusters).

NOTE Wikipedia defines a locale as follows: “In computing, a locale is a set of parameters that defines
the user’s language, region and any special variant preferences that the user wants to see in their
user interface. Usually a locale identifier consists of at least a language code and a country/region
code.” Babel is moving gradually from the old and fuzzy concept of language to the more modern
of locale. Note each locale is by itself a separate “language”, which explains why there are so
many files. This is on purpose, so that possible variants can be created and/or redefined easily.

Here is the list (u means Unicode captions, and l means LICR captions):

af Afrikaansul be Belarusianul
agq Aghem bem Bemba
ak Akan bez Bena
am Amharicul bg Bulgarianul
ar-DZ Arabicu bm Bambara
ar-EG Arabicu bn Banglau
ar-IQ Arabicu bo Tibetanu
ar-JO Arabicu br Bretonul
ar-LB Arabicu brx Bodo
ar-MA Arabicu bs-Cyrl Bosnian
ar-PS Arabicu bs-Latn Bosnianul
ar-SA Arabicu bs Bosnianul
ar-SY Arabicu ca Catalanul
ar-TN Arabicu ce Chechen
ar Arabicu cgg Chiga
as Assameseu chr Cherokee
asa Asu ckb-Arab Central Kurdishu
ast Asturianul ckb-Latn Central Kurdishu
az-Cyrl Azerbaijani ckb Central Kurdishu
az-Latn Azerbaijani cop Coptic
az Azerbaijaniul cs Czechul
bas Basaa cu-Cyrs Church Slavicu

19
cu-Glag Church Slavic haw Hawaiian
cu Church Slavicu he Hebrewul
cy Welshul hi Hindiu
da Danishul hr Croatianul
dav Taita hsb Upper Sorbianul
de-1901 Germanul hu Hungarianulll
de-1996 Germanul hy Armenianul
de-AT-1901 Austrian Germanul ia Interlinguaul
de-AT-1996 Austrian Germanul id Indonesianul
de-AT Austrian Germanul ig Igbo
de-CH-1901 Swiss High Germanul ii Sichuan Yi
de-CH-1996 Swiss High Germanul is Icelandicul
de-CH Swiss High Germanul it Italianul
de Germanul ja Japaneseu
dje Zarma jgo Ngomba
dsb Lower Sorbianul jmc Machame
dua Duala ka Georgianu
dyo Jola-Fonyi kab Kabyle
dz Dzongkha kam Kamba
ebu Embu kde Makonde
ee Ewe kea Kabuverdianu
el-polyton Polytonic Greekul kgp Kaingang
el Greekul khq Koyra Chiini
en-AU Australian Englishul ki Kikuyu
en-CA Canadian Englishul kk Kazakh
en-GB British Englishul kkj Kako
en-NZ Englishul kl Kalaallisut
en-US American Englishul kln Kalenjin
en Englishul km Khmeru
eo Esperantoul kmr-Arab Northern Kurdishu
es-MX Mexican Spanishul kmr-Latn Northern Kurdishul
es Spanishul kmr Northern Kurdishul
et Estonianul kn Kannadau
eu Basqueull ko-Hani Koreanu
ewo Ewondo ko Koreanu
fa Persianu kok Konkani
ff Fulah ks Kashmiri
fi Finnishul ksb Shambala
fil Filipino ksf Bafia
fo Faroese ksh Colognian
fr-BE Frenchul kw Cornish
fr-CA Canadian Frenchul ky Kyrgyz
fr-CH Swiss Frenchul la-x-classic Classic Latinul
fr-LU Frenchul la-x-ecclesia Ecclesiastic Latinul
fr Frenchul la-x-medieval Medieval Latinul
fur Friulianul la Latinul
fy Western Frisian lag Langi
ga Irishul lb Luxembourgishul
gd Scottish Gaelicul lg Ganda
gl Galicianul lkt Lakota
grc Ancient Greekul ln Lingala
gsw Swiss German lo Laou
gu Gujarati lrc Northern Luri
guz Gusii lt Lithuanianulll
gv Manx lu Luba-Katanga
ha-GH Hausa luo Luo
ha-NE Hausa luy Luyia
ha Hausaul lv Latvianul

20
mas Masai saq Samburu
mer Meru sbp Sangu
mfe Morisyen sc Sardinian
mg Malagasy se Northern Samiul
mgh Makhuwa-Meetto seh Sena
mgo Metaʼ ses Koyraboro Senni
mk Macedonianul sg Sango
ml Malayalamu shi-Latn Tachelhit
mn Mongolian shi-Tfng Tachelhit
mr Marathiu shi Tachelhit
ms-BN Malay si Sinhalau
ms-SG Malay sk Slovakul
ms Malayul sl Slovenianul
mt Maltese smn Inari Sami
mua Mundang sn Shona
my Burmese so Somali
mzn Mazanderani sq Albanianul
naq Nama sr-Cyrl-BA Serbianul
nb Norwegian Bokmålul sr-Cyrl-ME Serbianul
nd North Ndebele sr-Cyrl-XK Serbianul
ne Nepali sr-Cyrl Serbianul
nl Dutchul sr-Latn-BA Serbianul
nmg Kwasio sr-Latn-ME Serbianul
nn Norwegian Nynorskul sr-Latn-XK Serbianul
nnh Ngiemboon sr-Latn Serbianul
no Norwegianul sr Serbianul
nus Nuer sv Swedishul
nyn Nyankole sw Swahili
oc Occitanul syr Syriac
om Oromo ta Tamilu
or Odia te Teluguu
os Ossetic teo Teso
pa-Arab Punjabi th Thaiul
pa-Guru Punjabiu ti Tigrinya
pa Punjabiu tk Turkmenul
pl Polishul to Tongan
pms Piedmonteseul tr Turkishul
ps Pashto twq Tasawaq
pt-BR Brazilian Portugueseul tzm Central Atlas Tamazight
pt-PT European Portugueseul ug Uyghuru
pt Portugueseul uk Ukrainianul
qu Quechua ur Urduu
rm Romanshul uz-Arab Uzbek
rn Rundi uz-Cyrl Uzbek
ro-MD Moldavianul uz-Latn Uzbek
ro Romanianul uz Uzbek
rof Rombo vai-Latn Vai
ru Russianul vai-Vaii Vai
rw Kinyarwanda vai Vai
rwk Rwa vi Vietnameseul
sa-Beng Sanskrit vun Vunjo
sa-Deva Sanskrit wae Walser
sa-Gujr Sanskrit xog Soga
sa-Knda Sanskrit yav Yangben
sa-Mlym Sanskrit yi Yiddish
sa-Telu Sanskrit yo Yoruba
sa Sanskrit yrl Nheengatu
sah Sakha yue Cantonese

21
zgh Standard Moroccan zh-Hant-HK Chinese
Tamazight zh-Hant-MO Chinese
zh-Hans-HK Chinese
zh-Hant Chineseu
zh-Hans-MO Chinese
zh-Hans-SG Chinese zh Chineseu
zh-Hans Chineseu zu Zulu

In some contexts (currently \babelfont) an ini file may be loaded by its name. Here is the
list of the names currently supported. With these languages, \babelfont loads (if not done
before) the language and script names (even if the language is defined as a package option
with an ldf file). These are also the names recognized by \babelprovide with a valueless
import.

afrikaans bulgarian
aghem burmese
akan canadian
albanian cantonese
american catalan
amharic centralatlastamazight
ancientgreek centralkurdish
arabic chechen
arabic-algeria cherokee
arabic-DZ chiga
arabic-morocco chinese-hans-hk
arabic-MA chinese-hans-mo
arabic-syria chinese-hans-sg
arabic-SY chinese-hans
armenian chinese-hant-hk
assamese chinese-hant-mo
asturian chinese-hant
asu chinese-simplified-hongkongsarchina
australian chinese-simplified-macausarchina
austrian chinese-simplified-singapore
azerbaijani-cyrillic chinese-simplified
azerbaijani-cyrl chinese-traditional-hongkongsarchina
azerbaijani-latin chinese-traditional-macausarchina
azerbaijani-latn chinese-traditional
azerbaijani chinese
bafia churchslavic
bambara churchslavic-cyrs
basaa churchslavic-oldcyrillic12
basque churchsslavic-glag
belarusian churchsslavic-glagolitic
bemba colognian
bena cornish
bangla croatian
bodo czech
bosnian-cyrillic danish
bosnian-cyrl duala
bosnian-latin dutch
bosnian-latn dzongkha
bosnian embu
brazilian english-au
breton english-australia
british english-ca
12
The name in the CLDR is Old Church Slavonic Cyrillic, but it has been shortened for practical reasons.

22
english-canada kalaallisut
english-gb kalenjin
english-newzealand kamba
english-nz kannada
english-unitedkingdom kashmiri
english-unitedstates kazakh
english-us khmer
english kikuyu
esperanto kinyarwanda
estonian konkani
ewe korean
ewondo koyraborosenni
faroese koyrachiini
filipino kwasio
finnish kyrgyz
french-be lakota
french-belgium langi
french-ca lao
french-canada latvian
french-ch lingala
french-lu lithuanian
french-luxembourg lowersorbian
french-switzerland lsorbian
french lubakatanga
friulian luo
fulah luxembourgish
galician luyia
ganda macedonian
georgian machame
german-at makhuwameetto
german-austria makonde
german-ch malagasy
german-switzerland malay-bn
german malay-brunei
greek malay-sg
gujarati malay-singapore
gusii malay
hausa-gh malayalam
hausa-ghana maltese
hausa-ne manx
hausa-niger marathi
hausa masai
hawaiian mazanderani
hebrew meru
hindi meta
hungarian mexican
icelandic mongolian
igbo morisyen
inarisami mundang
indonesian nama
interlingua nepali
irish newzealand
italian ngiemboon
japanese ngomba
jolafonyi norsk
kabuverdianu northernluri
kabyle northernsami
kako northndebele

23
norwegianbokmal serbian-cyrl-xk
norwegiannynorsk serbian-cyrl
nswissgerman serbian-latin-bosniaherzegovina
nuer serbian-latin-kosovo
nyankole serbian-latin-montenegro
nynorsk serbian-latin
occitan serbian-latn-ba
oriya serbian-latn-me
oromo serbian-latn-xk
ossetic serbian-latn
pashto serbian
persian shambala
piedmontese shona
polish sichuanyi
polytonicgreek sinhala
portuguese-br slovak
portuguese-brazil slovene
portuguese-portugal slovenian
portuguese-pt soga
portuguese somali
punjabi-arab spanish-mexico
punjabi-arabic spanish-mx
punjabi-gurmukhi spanish
punjabi-guru standardmoroccantamazight
punjabi swahili
quechua swedish
romanian swissgerman
romansh tachelhit-latin
rombo tachelhit-latn
rundi tachelhit-tfng
russian tachelhit-tifinagh
rwa tachelhit
sakha taita
samburu tamil
samin tasawaq
sango telugu
sangu teso
sanskrit-beng thai
sanskrit-bengali tibetan
sanskrit-deva tigrinya
sanskrit-devanagari tongan
sanskrit-gujarati turkish
sanskrit-gujr turkmen
sanskrit-kannada ukenglish
sanskrit-knda ukrainian
sanskrit-malayalam uppersorbian
sanskrit-mlym urdu
sanskrit-telu usenglish
sanskrit-telugu usorbian
sanskrit uyghur
scottishgaelic uzbek-arab
sena uzbek-arabic
serbian-cyrillic-bosniaherzegovina uzbek-cyrillic
serbian-cyrillic-kosovo uzbek-cyrl
serbian-cyrillic-montenegro uzbek-latin
serbian-cyrillic uzbek-latn
serbian-cyrl-ba uzbek
serbian-cyrl-me vai-latin

24
vai-latn welsh
vai-vai westernfrisian
vai-vaii yangben
vai
yiddish
vietnam
vietnamese yoruba
vunjo zarma
walser zulu

Modifying and adding values to ini files


New 3.39 There is a way to modify the values of ini files when they get loaded with
\babelprovide and import. To set, say, digits.native in the numbers section, use
something like numbers/digits.native=abcdefghij. Keys may be added, too. Without
import you may modify the identification keys.
This can be used to create private variants easily. All you need is to import the same ini
file with a different locale name and different parameters.

1.14 Selecting fonts


New 3.15 Babel provides a high level interface on top of fontspec to select fonts. There
is no need to load fontspec explicitly – babel does it for you with the first \babelfont.13

\babelfont [hlanguage-listi]{hfont-familyi}[hfont-optionsi]{hfont-namei}

NOTE See the note in the previous section about some issues in specific languages.

The main purpose of \babelfont is to define at once in a multilingual document the fonts
required by the different languages, with their corresponding language systems (script and
language). So, if you load, say, 4 languages, \babelfont{rm}{FreeSerif} defines 4 fonts
(with their variants, of course), which are switched with the language by babel. It is a tool
to make things easier and transparent to the user.
Here font-family is rm, sf or tt (or newly defined ones, as explained below), and font-name
is the same as in fontspec and the like.
If no language is given, then it is considered the default font for the family, activated when
a language is selected.
On the other hand, if there is one or more languages in the optional argument, the font will
be assigned to them, overriding the default one. Alternatively, you may set a font for a
script – just precede its name (lowercase) with a star (eg, *devanagari). With this optional
argument, the font is not yet defined, but just predeclared. This means you may define as
many fonts as you want ‘just in case’, because if the language is never selected, the
corresponding \babelfont declaration is just ignored.
Babel takes care of the font language and the font script when languages are selected (as
well as the writing direction); see the recognized languages above. In most cases, you will
not need font-options, which is the same as in fontspec, but you may add further key/value
pairs if necessary.

EXAMPLE Usage in most cases is very simple. Let us assume you are setting up a document in
Swedish, with some words in Hebrew, with a font suited for both languages.

luatex/xetex
\documentclass{article}

\usepackage[swedish, bidi=default]{babel}

\babelprovide[import]{hebrew}

\babelfont{rm}{FreeSerif}

13
See also the package combofont for a complementary approach.

25
\begin{document}

Svenska \foreignlanguage{hebrew}{‫עבְִרית‬
ִ } svenska.

\end{document}

If on the other hand you have to resort to different fonts, you can replace the red line above with,
say:

luatex/xetex
\babelfont{rm}{Iwona}
\babelfont[hebrew]{rm}{FreeSerif}

\babelfont can be used to implicitly define a new font family. Just write its name instead
of rm, sf or tt. This is the preferred way to select fonts in addition to the three basic
families.

EXAMPLE Here is how to do it:

luatex/xetex
\babelfont{kai}{FandolKai}

Now, \kaifamily and \kaidefault, as well as \textkai are at your disposal.

NOTE You may load fontspec explicitly. For example:

luatex/xetex
\usepackage{fontspec}
\newfontscript{Devanagari}{deva}
\babelfont[hindi]{rm}{Shobhika}

This makes sure the OpenType script for Devanagari is deva and not dev2, in case it is not
detected correctly. You may also pass some options to fontspec: with silent, the warnings about
unavailable scripts or languages are not shown (they are only really useful when the document
format is being set up).

NOTE Directionality is a property affecting margins, indentation, column order, etc., not just text.
Therefore, it is under the direct control of the language, which applies both the script and the
direction to the text. As a consequence, there is no need to set Script when declaring a font with
\babelfont (nor Language). In fact, it is even discouraged.

NOTE \fontspec is not touched at all, only the preset font families (rm, sf, tt, and the like). If a
language is switched when an ad hoc font is active, or you select the font with this command,
neither the script nor the language is passed. You must add them by hand. This is by design, for
several reasons —for example, each font has its own set of features and a generic setting for
several of them can be problematic, and also preserving a “lower-level” font selection is useful.

NOTE The keys Language and Script just pass these values to the font, and do not set the script for
the language (and therefore the writing direction). In other words, the ini file or \babelprovide
provides default values for \babelfont if omitted, but the opposite is not true. See the note above
for the reasons of this behavior.

WARNING Using \setxxxxfont and \babelfont at the same time is discouraged, but very often
works as expected. However, be aware with \setxxxxfont the language system will not be set by
babel and should be set with fontspec if necessary.

TROUBLESHOOTING Package babel Info: The following fonts are not babel standard families.
This is not an error. babel assumes that if you are using \babelfont for a family, very likely you
want to define the rest of them. If you don’t, you can find some inconsistencies between families.

26
This checking is done at the beginning of the document, at a point where we cannot know which
families will be used.
Actually, there is no real need to use \babelfont in a monolingual document, if you set the
language system in \setmainfont (or not, depending on what you want).
As the message explains, there is nothing intrinsically wrong with not defining all the families. In
fact, there is nothing intrinsically wrong with not using \babelfont at all. But you must be aware
that this may lead to some problems.

NOTE \babelfont is a high level interface to fontspec, and therefore in xetex you can apply
Mappings. For example, there is a set of transliterations for Brahmic scripts by Davis M. Jones.
After installing them in you distribution, just set the map as you would do with fontspec.

1.15 Modifying a language


Modifying the behavior of a language (say, the chapter “caption”), is sometimes necessary,
but not always trivial. In the case of caption names a specific macro is provided, because
this is perhaps the most frequent change:

\setlocalecaption {hlanguage-namei}{hcaption-namei}{hstring i}
New 3.51 Here caption-name is the name as string without the trailing name. An example,
which also shows caption names are often a stylistic choice, is:

\setlocalecaption{english}{contents}{Table of Contents}

This works not only with existing caption names, because it also serves to define new ones
by setting the caption-name to the name of your choice (name will be postpended). Captions
so defined or redefined behave with the ‘new way’ described in the following note.

NOTE There are a few alternative methods:

• With data import’ed from ini files, you can modify the values of specific keys, like:

\babelprovide[import, captions/listtable = Lista de tablas]{spanish}

(In this particular case, instead of the captions group you may need to modify the
captions.licr one.)
• The ‘old way’, still valid for many languages, to redefine a caption is the following:

\addto\captionsenglish{%
\renewcommand\contentsname{Foo}%
}

As of 3.15, there is no need to hide spaces with % (babel removes them), but it is advisable to
do so. This redefinition is not activated until the language is selected.
• The ‘new way’, which is found in bulgarian, azerbaijani, spanish, french, turkish,
icelandic, vietnamese and a few more, as well as in languages created with \babelprovide
and its key import, is:

\renewcommand\spanishchaptername{Foo}

This redefinition is immediate.

NOTE Do not redefine a caption in the following way:

\AtBeginDocument{\renewcommand\contentsname{Foo}}

The changes may be discarded with a language selector, and the original value restored.

Macros to be run when a language is selected can be add to \extrashlang i:

27
\addto\extrasrussian{\mymacro}

There is a counterpart for code to be run when a language is unselected: \noextrashlang i.

NOTE These macros (\captionshlang i, \extrashlang i) may be redefined, but must not be used as
such – they just pass information to babel, which executes them in the proper context.

Another way to modify a language loaded as a package or class option is by means of


\babelprovide, described below in depth. So, something like:

\usepackage[danish]{babel}
\babelprovide[captions=da, hyphenrules=nohyphenation]{danish}

first loads danish.ldf, and then redefines the captions for danish (as provided by the ini
file) and prevents hyphenation. The rest of the language definitions are not touched.
Without the optional argument it just loads some aditional tools if provided by the ini file,
like extra counters.

1.16 Creating a language


New 3.10 And what if there is no style for your language or none fits your needs? You
may then define quickly a language with the help of the following macro in the preamble
(which may be used to modify an existing language, too, as explained in the previous
subsection).

\babelprovide [hoptionsi]{hlanguage-namei}
If the language hlanguage-namei has not been loaded as class or package option and there
are no hoptionsi, it creates an “empty” one with some defaults in its internal structure: the
hyphen rules, if not available, are set to the current ones, left and right hyphen mins are
set to 2 and 3. In either case, caption, date and language system are not defined.
If no ini file is imported with import, hlanguage-namei is still relevant because in such a
case the hyphenation and like breaking rules (including those for South East Asian and
CJK) are based on it as provided in the ini file corresponding to that name; the same
applies to OpenType language and script.
Conveniently, some options allow to fill the language, and babel warns you about what to
do if there is a missing string. Very likely you will find alerts like that in the log file:

Package babel Warning: \chaptername not set for 'mylang'. Please,


(babel) define it after the language has been loaded
(babel) (typically in the preamble) with:
(babel) \setlocalecaption{mylang}{chapter}{..}
(babel) Reported on input line 26.

In most cases, you will only need to define a few macros. Note languages loaded on the fly
are not yet available in the preamble.

EXAMPLE If you need a language named arhinish:

\usepackage[danish]{babel}
\babelprovide{arhinish}
\setlocalecaption{arhinish}{chapter}{Chapitula}
\setlocalecaption{arhinish}{refname}{Refirenke}
\renewcommand\arhinishhyphenmins{22}

EXAMPLE Locales with names based on BCP 47 codes can be created with something like:

28
\babelprovide[import=en-US]{enUS}

Note, however, mixing ways to identify locales can lead to problems. For example, is yi the name
of the language spoken by the Yi people or is it the code for Yiddish?

The main language is not changed (danish in this example). So, you must add
\selectlanguage{arhinish} or other selectors where necessary.
If the language has been loaded as an argument in \documentclass or \usepackage, then
\babelprovide redefines the requested data.

import= hlanguage-tag i
New 3.13 Imports data from an ini file, including captions and date (also line breaking
rules in newly defined languages). For example:

\babelprovide[import=hu]{hungarian}

Unicode engines load the UTF-8 variants, while 8-bit engines load the LICR (ie, with macros
like \' or \ss) ones.
New 3.23 It may be used without a value, and that is often the recommended option. In
such a case, the ini file set in the corresponding babel-<language>.tex (where
<language> is the last argument in \babelprovide) is imported. See the list of recognized
languages above. So, the previous example is best written as:

\babelprovide[import]{hungarian}

There are about 250 ini files, with data taken from the ldf files and the CLDR provided by
Unicode. Not all languages in the latter are complete, and therefore neither are the ini
files. A few languages may show a warning about the current lack of suitability of some
features.
Besides \today, this option defines an additional command for dates: \<language>date,
which takes three arguments, namely, year, month and day numbers. In fact, \today calls
\<language>today, which in turn calls
\<language>date{\the\year}{\the\month}{\the\day}. New 3.44 More convenient is
usually \localedate, with prints the date for the current locale.

captions= hlanguage-tag i
Loads only the strings. For example:

\babelprovide[captions=hu]{hungarian}

hyphenrules= hlanguage-listi
With this option, with a space-separated list of hyphenation rules, babel assigns to the
language the first valid hyphenation rules in the list. For example:

\babelprovide[hyphenrules=chavacano spanish italian]{chavacano}

If none of the listed hyphenrules exist, the default behavior applies. Note in this example
we set chavacano as first option – without it, it would select spanish even if chavacano
exists.
A special value is +, which allocates a new language (in the TEX sense). It only makes sense
as the last value (or the only one; the subsequent ones are silently ignored). It is mostly
useful with luatex, because you can add some patterns with \babelpatterns, as for
example:

29
\babelprovide[hyphenrules=+]{neo}
\babelpatterns[neo]{a1 e1 i1 o1 u1}

In other engines it just suppresses hyphenation (because the pattern list is empty).
New 3.58 Another special value is unhyphenated, which activates a line breking mode
that allows spaces to be stretched to arbitrary amounts.

main This valueless option makes the language the main one (thus overriding that set when
babel is loaded). Only in newly defined languages.

EXAMPLE Let’s assume your document (xetex or luatex) is mainly in Polytonic Greek with but with
some sections in Italian. Then, the first attempt should be:

\usepackage[italian, greek.polutonic]{babel}

But if, say, accents in Greek are not shown correctly, you can try

\usepackage[italian, polytonicgreek, provide=*]{babel}

Remerber there is an alternative syntax for the latter:

\usepackage[italian]{babel}
\babelprovide[import, main]{polytonicgreek}

Finally, also remember you might not need to load italian at all if there are only a few word in
this language (see 1.3).

script= hscript-namei
New 3.15 Sets the script name to be used by fontspec (eg, Devanagari). Overrides the
value in the ini file. If fontspec does not define it, then babel sets its tag to that provided
by the ini file. This value is particularly important because it sets the writing direction, so
you must use it if for some reason the default value is wrong.

language= hlanguage-namei
New 3.15 Sets the language name to be used by fontspec (eg, Hindi). Overrides the value
in the ini file. If fontspec does not define it, then babel sets its tag to that provided by the
ini file. Not so important, but sometimes still relevant.

alph= hcounter-namei
Assigns to \alph that counter. See the next section.

Alph= hcounter-namei
Same for \Alph.

A few options (only luatex) set some properties of the writing system used by the language.
These properties are always applied to the script, no matter which language is active.
Although somewhat inconsistent, this makes setting a language up easier in most typical
cases.

30
onchar= ids | fonts | letters
New 3.38 This option is much like an ‘event’ called when a character belonging to the
script of this locale is found (as its name implies, it acts on characters, not on spaces).
There are currently two ‘actions’, which can be used at the same time (separated by a
space): with ids the \language and the \localeid are set to the values of this locale; with
fonts, the fonts are changed to those of this locale (as set with \babelfont). Characters
can be added or modified with \babelcharproperty.
New 3.81 Option letters restricts the ‘actions’ to letters, in the TEX sense (i. e., with
catcode 11). Digits and punctuation are then considered part of current locale (as set by a
selector). This option is useful when the main script in non-Latin and there is a secondary
one whose script is Latin.

NOTE An alternative approach with luatex and Harfbuzz is the font option
RawFeature={multiscript=auto}. It does not switch the babel language and therefore the line
breaking rules, but in many cases it can be enough.

NOTE There is no general rule to set the font for a punctuation mark, because it is a semantic
decision and not a typographical one. Consider the following sentence: “‫ﯾﮏ‬, ‫دو‬, and ‫ ﺳﻪ‬are
Persian numbers”. In this case the punctuation font must be the English one, even if the commas
are surrounded by non-Latin letters. Quotation marks, parenthesis, etc., are even more complex.
Several criteria are possible, like the main language (the default in babel), the first letter in the
paragraph, or the surrounding letters, among others, but even so manual switching can be still
necessary.

intraspace= hbasei hshrink i hstretchi


Sets the interword space for the writing system of the language, in em units (so, 0 .1 0 is
0em plus .1em). Like \spaceskip, the em unit applied is that of the current text (more
precisely, the previous glyph). Currently used only in Southeast Asian scrips, like Thai, and
CJK.

intrapenalty= hpenaltyi
Sets the interword penalty for the writing system of this language. Currently used only in
Southeast Asian scrips, like Thai. Ignored if 0 (which is the default value).

transforms= htransform-listi
See section 1.21.

justification= kashida | elongated | unhyphenated | padding


New 3.59 There are currently three options, mainly for the Arabic script. It sets the
linebreaking and justification method, which can be based on the the arabic tatweel
character or in the ‘justification alternatives’ OpenType table (jalt). For an explanation
see the babel site.
New 3.81 The option padding has been devised primarily for Tibetan. It’s still somewhat
experimental. Again, there is an explanation in the babel site.

linebreaking= New 3.59 Just a synonymous for justification.

NOTE (1) If you need shorthands, you can define them with \useshorthands and \defineshorthand
as described above. (2) Captions and \today are “ensured” with \babelensure (this is the default
in ini-based languages).

1.17 Digits and counters


New 3.20 About thirty ini files define a field named digits.native. When it is present,
two macros are created: \<language>digits and \<language>counter (only xetex and
luatex). With the first, a string of ‘Latin’ digits are converted to the native digits of that

31
language; the second takes a counter name as argument. With the option maparabic in
\babelprovide, \arabic is redefined to produce the native digits (this is done globally, to
avoid inconsistencies in, for example, page numbering, and note as well dates do not rely
on \arabic.)
For example:

\babelprovide[import]{telugu}
% Or also, if you want:
% \babelprovide[import, maparabic]{telugu}
\babelfont{rm}{Gautami} % With luatex, better with Harfbuzz
\begin{document}
\telugudigits{1234}
\telugucounter{section}
\end{document}

Languages providing native digits in all or some variants are:

Arabic Persian Lao Odia Urdu


Assamese Gujarati Northern Luri Punjabi Uzbek
Bangla Hindi Malayalam Pashto Vai
Tibetar Khmer Marathi Tamil Cantonese
Bodo Kannada Burmese Telugu Chinese
Central Kurdish Konkani Mazanderani Thai
Dzongkha Kashmiri Nepali Uyghur

New 3.30 With luatex there is an alternative approach for mapping digits, namely,
mapdigits. Conversion is based on the language and it is applied to the typeset text (not
math, PDF bookmarks, etc.) before bidi and fonts are processed (ie, to the node list as
generated by the TEX code). This means the local digits have the correct bidirectional
behavior (unlike Numbers=Arabic in fontspec, which is not recommended).

NOTE With xetex you can use the option Mapping when defining a font.

\localenumeral {hstylei}{hnumberi}
\localecounterl {hstylei}{hcounteri}
New 3.41 Many ‘ini‘ locale files has been extended with information about
non-positional numerical systems, based on those predefined in CSS. They only work with
xetex and luatex and are fully expendable (even inside an unprotected \edef). Currently,
they are limited to numbers below 10000.
There are several ways to use them (for the availabe styles in each language, see the list
below):

• \localenumeral{hstylei}{hnumberi}, like \localenumeral{abjad}{15}

• \localecounter{hstylei}{hcounteri}, like \localecounter{lower}{section}

• In \babelprovide, as an argument to the keys alph and Alph, which redefine what
\alph and \Alph print. For example:

\babelprovide[alph=alphabetic]{thai}

The styles are:

Ancient Greek lower.ancient, upper.ancient


Amharic afar, agaw, ari, blin, dizi, gedeo, gumuz, hadiyya, harari, kaffa, kebena,
kembata, konso, kunama, meen, oromo, saho, sidama, silti, tigre, wolaita, yemsa
Arabic abjad, maghrebi.abjad

32
Armenian lower.letter, upper.letter
Belarusan, Bulgarian, Church Slavic, Macedonian, Serbian lower, upper
Bangla alphabetic
Central Kurdish alphabetic
Chinese cjk-earthly-branch, cjk-heavenly-stem, circled.ideograph,
parenthesized.ideograph, fullwidth.lower.alpha, fullwidth.upper.alpha
Church Slavic (Glagolitic) letters
Coptic epact, lower.letters
French date.day (mainly for internal use).
Georgian letters
Greek lower.modern, upper.modern, lower.ancient, upper.ancient (all with keraia)
Hebrew letters (neither geresh nor gershayim yet)
Hindi alphabetic
Italian lower.legal, upper.legal
Japanese hiragana, hiragana.iroha, katakana, katakana.iroha, circled.katakana,
informal, formal, cjk-earthly-branch, cjk-heavenly-stem, circled.ideograph,
parenthesized.ideograph, fullwidth.lower.alpha, fullwidth.upper.alpha
Khmer consonant
Korean consonant, syllabe, hanja.informal, hanja.formal, hangul.formal,
cjk-earthly-branch, cjk-heavenly-stem, circled.ideograph,
parenthesized.ideograph, fullwidth.lower.alpha, fullwidth.upper.alpha
Marathi alphabetic
Persian abjad, alphabetic
Russian lower, lower.full, upper, upper.full
Syriac letters
Tamil ancient
Thai alphabetic
Ukrainian lower , lower.full, upper , upper.full

New 3.45 In addition, native digits (in languages defining them) may be printed with the
numeral style digits.

1.18 Dates
New 3.45 When the data is taken from an ini file, you may print the date corresponding
to the Gregorian calendar and other lunisolar systems with the following command.

\localedate [hcalendar=.., variant=.., converti]{hyeari}{hmonthi}{hdayi}


By default the calendar is the Gregorian, but an ini file may define strings for other
calendars (currently ar, ar-*, he, fa, hi). In the latter case, the three arguments are the
year, the month, and the day in those in the corresponding calendar. They are not the
Gregorian data to be converted (which means, say, 13 is a valid month number with
calendar=hebrew and calendar=coptic). However, with the option convert it’s
converted (using internally the following command).
Even with a certain calendar there may be variants. In Kurmanji the default variant prints
something like 30. Çileya Pêşîn 2019, but with variant=izafa it prints 31’ê Çileya Pêşînê
2019.

\babelcalendar [hdatei]{hcalendari}{hyear-macroi}hmonth-macroihday-macroi
New 3.76 Although calendars aren’t the primary concern of babel, the package should be
able to, at least, generate correctly the current date in the way users would expect in their
own culture. Currently, \localedate can print dates in a few calendars (provided the ini
locale file has been imported), but year, month and day had to be entered by hand, which
is very inconvenient. With this macro, the current date is converted and stored in the
three last arguments, which must be macros: allowed calendars are buddhist, coptic,
hebrew, islamic-civil, islamic-umalqura, persian. The optional argument converts the

33
given date, in the form ‘hyeari-hmonthi-hdayi’. Please, refer to the page on the news for
3.76 in the babel site for further details.

1.19 Accessing language info

\languagename The control sequence \languagename contains the name of the current language.

WARNING Due to some internal inconsistencies in catcodes, it should not be used to test its value.
Use iflang, by Heiko Oberdiek.

\iflanguage {hlanguagei}{htruei}{hfalsei}
If more than one language is used, it might be necessary to know which language is active
at a specific time. This can be checked by a call to \iflanguage, but note here “language” is
used in the TEX sense, as a set of hyphenation patterns, and not as its babel name. This
macro takes three arguments. The first argument is the name of a language; the second and
third arguments are the actions to take if the result of the test is true or false respectively.

\localeinfo * {hfieldi}

New 3.38 If an ini file has been loaded for the current language, you may access the
information stored in it. This macro is fully expandable, and the available fields are:

name.english as provided by the Unicode CLDR.


tag.ini is the tag of the ini file (the way this file is identified in its name).
tag.bcp47 is the full BCP 47 tag (see the warning below). This is the value to be used for
the ‘real’ provided tag (babel may fill other fields if they are considered necessary).
language.tag.bcp47 is the BCP 47 language tag.
tag.opentype is the tag used by OpenType (usually, but not always, the same as BCP 47).
script.name , as provided by the Unicode CLDR.
script.tag.bcp47 is the BCP 47 tag of the script used by this locale. This is a required
field for the fonts to be correctly set up, and therefore it should be always defined.
script.tag.opentype is the tag used by OpenType (usually, but not always, the same as
BCP 47).
region.tag.bcp47 is the BCP 47 tag of the region or territory. Defined only if the locale
loaded actually contains it (eg, es-MX does, but es doesn’t), which is how locales behave
in the CLDR. New 3.75
variant.tag.bcp47 is the BCP 47 tag of the variant (in the BCP 47 sense, like 1901 for
German). New 3.75
extension.hsi.tag.bcp47 is the BCP 47 value of the extension whose singleton is hsi
(currently the recognized singletons are x, t and u). The internal syntax can be
somewhat complex, and this feature is still somewhat tentative. An example is
classiclatin which sets extension.x.tag.bcp47 to classic. New 3.75

WARNING New 3.46 As of version 3.46 tag.bcp47 returns the full BCP 47 tag. Formerly it returned
just the language subtag, which was clearly counterintuitive.

New 3.75 Sometimes, it comes in handy to be able to use \localeinfo in an expandable


way even if something went wrong (for example, the locale currently active is undefined).
For these cases, localeinfo* just returns an empty string instead of raising an error. Bear
in mind that babel, following the CLDR, may leave the region unset, which means
\getlanguageproperty*, described below, is the preferred command, so that the
existence of a field can be checked before. This also means building a string with the
language and the region with \localeinfo*{language.tab.bcp47}-
\localeinfo*{region.tab.bcp47} is not usually a good idea (because of the hyphen).

34
\getlocaleproperty * {hmacroi}{hlocalei}{hpropertyi}

New 3.42 The value of any locale property as set by the ini files (or added/modified with
\babelprovide) can be retrieved and stored in a macro with this command. For example,
after:

\getlocaleproperty\hechap{hebrew}{captions/chapter}

the macro \hechap will contain the string ‫פרק‬.


If the key does not exist, the macro is set to \relax and an error is raised. New 3.47 With
the starred version no error is raised, so that you can take your own actions with
undefined properties.

\localeid Each language in the babel sense has its own unique numeric identifier, which can be
retrieved with \localeid.
The \localeid is not the same as the \language identifier, which refers to a set of
hyphenation patters (which, in turn, is just a component of the line breaking algorithm
described in the next section). The data about preloaded patterns are store in an internal
macro named \bbl@languages (see the code for further details), but note several locales
may share a single \language, so they are separated concepts. In luatex, the \localeid is
saved in each node (when it makes sense) as an attribute, too.

\LocaleForEach {hcodei}
Babel remembers which ini files have been loaded. There is a loop named
\LocaleForEach to traverse the list, where #1 is the name of the current item, so that
\LocaleForEach{\message{ **#1** }} just shows the loaded ini’s.

ensureinfo=off New 3.75 Previously, ini files were loaded only with \babelprovide and also when
languages are selected if there is a \babelfont or they have not been explicitly declared.
Now the ini files are loaded (and therefore the corresponding data) even if these two
conditions are not met (in previous versions you had to enable it with \BabelEnsureInfo
in the preamble). Because of the way this feature works, problems are very unlikely, but
there is switch as a package option to turn the new behavior off (ensureinfo=off).

1.20 Hyphenation and line breaking


Babel deals with three kinds of line breaking rules: Western, typically the LGC group,
South East Asian, like Thai, and CJK, but support depends on the engine: pdftex only deals
with the former, xetex also with the second one (although in a limited way), while luatex
provides basic rules for the latter, too. With luatex there are also tools for non-standard
hyphenation rules, explained in the next section.

\babelhyphen * {htypei}
\babelhyphen * {htexti}

New 3.9a It is customary to classify hyphens in two types: (1) explicit or hard hyphens,
which in TEX are entered as -, and (2) optional or soft hyphens, which are entered as \-.
Strictly, a soft hyphen is not a hyphen, but just a breaking opportunity or, in TEX terms, a
“discretionary”; a hard hyphen is a hyphen with a breaking opportunity after it. A further
type is a non-breaking hyphen, a hyphen without a breaking opportunity.
In TEX, - and \- forbid further breaking opportunities in the word. This is the desired
behavior very often, but not always, and therefore many languages provide shorthands for
these cases. Unfortunately, this has not been done consistently: for example, "- in Dutch,
Portuguese, Catalan or Danish is a hard hyphen, while in German, Spanish, Norwegian,
Slovak or Russian is a soft hyphen. Furthermore, some of them even redefine \-, so that
you cannot insert a soft hyphen without breaking opportunities in the rest of the word.
Therefore, some macros are provided with a set of basic “hyphens” which can be used by
themselves, to define a user shorthand, or even in language files.

35
• \babelhyphen{soft} and \babelhyphen{hard} are self explanatory.

• \babelhyphen{repeat} inserts a hard hyphen which is repeated at the beginning of the


next line, as done in languages like Polish, Portuguese and Spanish.

• \babelhyphen{nobreak} inserts a hard hyphen without a break after it (even if a space


follows).

• \babelhyphen{empty} inserts a break opportunity without a hyphen at all.

• \babelhyphen{htexti} is a hard “hyphen” using htexti instead. A typical case is


\babelhyphen{/}.

With all of them, hyphenation in the rest of the word is enabled. If you don’t want to
enable it, there is a starred counterpart: \babelhyphen*{soft} (which in most cases is
equivalent to the original \-), \babelhyphen*{hard}, etc.
Note hard is also good for isolated prefixes (eg, anti-) and nobreak for isolated suffixes (eg,
-ism), but in both cases \babelhyphen*{nobreak} is usually better.
There are also some differences with LATEX: (1) the character used is that set for the current
font, while in LATEX it is hardwired to - (a typical value); (2) the hyphen to be used in fonts
with a negative \hyphenchar is -, like in LATEX, but it can be changed to another value by
redefining \babelnullhyphen; (3) a break after the hyphen is forbidden if preceded by a
glue >0 pt (at the beginning of a word, provided it is not immediately preceded by, say, a
parenthesis).

\babelhyphenation [hlanguagei,hlanguagei,...]{hexceptionsi}
New 3.9a Sets hyphenation exceptions for the languages given or, without the optional
argument, for all languages (eg, proper nouns or common loan words, and of course
monolingual documents). Multiple declarations work much like \hyphenation (last wins),
but language exceptions take precedence over global ones.
It can be used only in the preamble, and exceptions are set when the language is first
selected, thus taking into account changes of \lccodes’s done in \extrashlang i as well as
the language-specific encoding (not set in the preamble by default). Multiple
\babelhyphenation’s are allowed. For example:

\babelhyphenation{Wal-hal-la Dar-bhan-ga}

Listed words are saved expanded and therefore it relies on the LICR. Of course, it also
works without the LICR if the input and the font encodings are the same, like in Unicode
based engines.

NOTE Using \babelhyphenation with Southeast Asian scripts is mostly pointless. But with
\babelpatterns (below) you may fine-tune line breaking (only luatex). Even if there are no
patterns for the language, you can add at least some typical cases.

NOTE Use \babelhyphenation instead of \hyphenation to set hyphenation exceptions in the


preamble before any language is explicitly set with a selector. In the preamble the hyphenation
rules are not always fully set up and an error can be raised.

\begin{hyphenrules} {hlanguagei} … \end{hyphenrules}


The environment hyphenrules can be used to select only the hyphenation rules to be used
(it can be used as command, too). This can for instance be used to select ‘nohyphenation’,
provided that in language.dat the ‘language’ nohyphenation is defined by loading
zerohyph.tex. It deactivates language shorthands, too (but not user shorthands).
Except for these simple uses, hyphenrules is deprecated and otherlanguage* (the starred
version) is preferred, because the former does not take into account possible changes in
encodings of characters like, say, ' done by some languages (eg, italian, french, ukraineb).

36
\babelpatterns [hlanguagei,hlanguagei,...]{hpatternsi}
New 3.9m In luatex only,14 adds or replaces patterns for the languages given or, without
the optional argument, for all languages. If a pattern for a certain combination already
exists, it gets replaced by the new one.
It can be used only in the preamble, and patterns are added when the language is first
selected, thus taking into account changes of \lccodes’s done in \extrashlang i as well as
the language-specific encoding (not set in the preamble by default). Multiple
\babelpatterns’s are allowed.
Listed patterns are saved expanded and therefore it relies on the LICR. Of course, it also
works without the LICR if the input and the font encodings are the same, like in Unicode
based engines.
New 3.31 (Only luatex.) With \babelprovide and imported CJK languages, a simple
generic line breaking algorithm (push-out-first) is applied, based on a selection of the
Unicode rules ( New 3.32 it is disabled in verbatim mode, or more precisely when the
hyphenrules are set to nohyphenation). It can be activated alternatively by setting
explicitly the intraspace.
New 3.27 Interword spacing for Thai, Lao and Khemer is activated automatically if a
language with one of those scripts are loaded with \babelprovide. See the sample on the
babel repository. With both Unicode engines, spacing is based on the “current” em unit (the
size of the previous char in luatex, and the font size set by the last \selectfont in xetex).

1.21 Transforms
Transforms (only luatex) provide a way to process the text on the typesetting level in
several language-dependent ways, like non-standard hyphenation, special line breaking
rules, script to script conversion, spacing conventions and so on.15
It currently embraces \babelprehyphenation and \babelposthyphenation.
New 3.57 Several ini files predefine some transforms. They are activated with the key
transforms in \babelprovide, either if the locale is being defined with this macro or the
languages has been previouly loaded as a class or package option, as the following
example illustrates:

\usepackage[magyar]{babel}
\babelprovide[transforms = digraphs.hyphen]{magyar}

New 3.67 Transforms predefined in the ini locale files can be made attribute-dependent,
too. When an attribute between parenthesis is inserted subsequent transforms will be
assigned to it (up to the list end or another attribute). For example, and provided an
attribute called \withsigmafinal has been declared:

transforms = transliteration.omega (\withsigmafinal) sigma.final

This applies transliteration.omega always, but sigma.final only when


\withsigmafinal is set.
Here are the transforms currently predefined. (A few may still require some fine-tuning.
More to follow in future releases.)

Arabic transliteration.dad Applies the transliteration system devised by


Yannis Haralambous for dad (simple and TEX-
friendly). Not yet complete, but sufficient for
most texts.
14
With luatex exceptions and patterns can be modified almost freely. However, this is very likely a task for a
separate package and babel only provides the most basic tools.
15
They are similar in concept, but not the same, as those in Unicode. The main inspiration for this feature is the
Omega transformation processes.

37
Croatian digraphs.ligatures Ligatures DŽ, Dž, dž, LJ, Lj, lj, NJ, Nj, nj. It
assumes they exist. This is not the recom-
mended way to make these transformations
(the best way is with OTF features), but it can
get you out of a hurry.
Czech, Polish, hyphen.repeat Explicit hyphens behave like \babelhyphen
Portuguese, {repeat}.
Slovak,
Spanish
Czech, Polish, oneletter.nobreak Converts a space after a non-syllabic prepo-
Slovak sition or conjunction into a non-breaking
space.
Finnish prehyphen.nobreak Line breaks just after hyphens prepended to
words are prevented, like in “pakastekaapit
ja -arkut”.
Greek diaeresis.hyphen Removes the diaeresis above iota and upsilon
if hyphenated just before. It works with the
three variants.
Greek transliteration.omega Although the provided combinations are not
the full set, this transform follows the syn-
tax of Omega: = for the circumflex, v for
digamma, and so on. For better compatibility
with Levy’s system, ~ (as ‘string’) is an alter-
native to =. ' is tonos in Monotonic Greek, but
oxia in Polytonic and Ancient Greek.
Greek sigma.final The transliteration system above does not
convert the sigma at the end of a word (on
purpose). This transforms does it. To pre-
vent the conversion (an abbreviation, for ex-
ample), write "s.
Hindi, Sanskrit transliteration.hk The Harvard-Kyoto system to romanize De-
vanagari.
Hindi, Sanskrit punctuation.space Inserts a space before the following four
characters: !?:; .
Hungarian digraphs.hyphen Hyphenates the long digraphs ccs, ddz, ggy,
lly, nny, ssz, tty and zzs as cs-cs, dz-dz, etc.
Indic scripts danda.nobreak Prevents a line break before a danda or dou-
ble danda if there is a space. For Assamese,
Bengali, Gujarati, Hindi, Kannada, Malay-
alam, Marathi, Oriya, Tamil, Telugu.
Latin digraphs.ligatures Replaces the groups ae, AE, oe, OE with æ, Æ,
œ, Œ.
Latin letters.noj Replaces j, J with i, I.
Latin letters.uv Replaces v, U with u, V.
Sanskrit transliteration.iast The IAST system to romanize Devanagari.16
Serbian transliteration.gajica (Note serbian with ini files refers to the
Cyrillic script, which is here the target.) The
standard system devised by Ljudevit Gaj.
Arabic, kashida.plain Experimental. A very simple and basic trans-
Persian form for ‘plain’ Arabic fonts, which attempts
to distribute the tatwil as evenly as possible
(starting at the end of the line). See the news
for version 3.59.

38
\babelposthyphenation [hoptionsi]{hhyphenrules-namei}{hlua-patterni}{hreplacementi}
New 3.37-3.39 With luatex it is possible to define non-standard hyphenation rules, like
f-f → ff-f, repeated hyphens, ranked ruled (or more precisely, ‘penalized’ hyphenation
points), and so on. A few rules are currently provided (see above), but they can be defined
as shown in the following example, where {1} is the first captured char (between () in the
pattern):

\babelposthyphenation{german}{([fmtrp]) | {1}}
{
{ no = {1}, pre = {1}{1}- }, % Replace first char with disc
remove, % Remove automatic disc (2nd node)
{} % Keep last char, untouched
}

In the replacements, a captured char may be mapped to another, too. For example, if the
first capture reads ([ΐΰ]), the replacement could be {1|ΐΰ|ίύ}, which maps ΐ to ί, and ΰ
to ύ, so that the diaeresis is removed.
This feature is activated with the first \babelposthyphenation or \babelprehyphenation.
New 3.67 With the optional argument you can associate a user defined transform to an
attribute, so that it’s active only when it’s set (currently its attribute value is ignored). With
this mechanism transforms can be set or unset even in the middle of paragraphs, and
applied to single words. To define, set and unset the attribute, the LaTeX kernel provides
the macros \newattribute, \setattribute and \unsetattribute. The following example
shows how to use it, provided an attribute named \latinnoj has been declared:

\babelprehyphenation[attribute=\latinnoj]{latin}{ J }{ string = I }

See the babel site for a more detailed description and some examples. It also describes a
few additional replacement types (string, penalty).
Although the main purpose of this command is non-standard hyphenation, it may actually
be used for other transformations (after hyphenation is applied, so you must take
discretionaries into account).
You are limited to substitutions as done by lua, although a future implementation may
alternatively accept lpeg.

\babelprehyphenation [hoptionsi]{hlocale-namei}{hlua-patterni}{hreplacementi}
New 3.44-3-52 It is similar to the latter, but (as its name implies) applied before
hyphenation, which is particularly useful in transliterations. There are other differences:
(1) the first argument is the locale instead of the name of the hyphenation patterns; (2) in
the search patterns = has no special meaning, while | stands for an ordinary space; (3) in
the replacement, discretionaries are not accepted.
See the description above for the optional argument.
This feature is activated with the first \babelposthyphenation or \babelprehyphenation.

EXAMPLE You can replace a character (or series of them) by another character (or series of them).
Thus, to enter ž as zh and š as sh in a newly created locale for transliterated Russian:

\babelprovide[hyphenrules=+]{russian-latin} % Create locale


\babelprehyphenation{russian-latin}{([sz])h} % Create rule
{
string = {1|sz|šž},
remove
}

EXAMPLE The following rule prevent the word “a” from being at the end of a line:

39
\babelprehyphenation{english}{|a|}
{}, {}, % Keep first space and a
{ insert, penalty = 10000 }, % Insert penalty
{} % Keep last space
}

NOTE With luatex there is another approach to make text transformations, with the function
fonts.handlers.otf.addfeature, which adds new features to an OTF font (substitution and
positioning). These features can be made language-dependent, and babel by default recognizes
this setting if the font has been declared with \babelfont. The transforms mechanism
supplements rather than replaces OTF features.
With xetex, where transforms are not available, there is still another approach, with font
mappings, mainly meant to perform encoding conversions and transliterations. Mappings,
however, are linked to fonts, not to languages.

1.22 Selection based on BCP 47 tags


New 3.43 The recommended way to select languages is that described at the beginning of
this document. However, BCP 47 tags are becoming customary, particularly in documents
(or parts of documents) generated by external sources, and therefore babel will provide a
set of tools to select the locales in different situations, adapted to the particular needs of
each case. Currently, babel provides autoloading of locales as described in this section. In
these contexts autoloading is particularly important because we may not know on
beforehand which languages will be requested.
It must be activated explicitly, because it is primarily meant for special tasks. Mapping
from BCP 47 codes to locale names are not hardcoded in babel. Instead the data is taken
from the ini files, which means currently about 250 tags are already recognized. Babel
performs a simple lookup in the following way: fr-Latn-FR → fr-Latn → fr-FR → fr.
Languages with the same resolved name are considered the same. Case is normalized
before, so that fr-latn-fr → fr-Latn-FR. If a tag and a name overlap, the tag takes
precedence.
Here is a minimal example:

\documentclass{article}

\usepackage[danish]{babel}

\babeladjust{
autoload.bcp47 = on,
autoload.bcp47.options = import
}

\begin{document}

Chapter in Danish: \chaptername.

\selectlanguage{de-AT}

\localedate{2020}{1}{30}

\end{document}

Currently the locales loaded are based on the ini files and decoupled from the main ldf
files. This is by design, to ensure code generated externally produces the same result
regardless of the languages requested in the document, but an option to use the ldf
instead will be added in a future release, because both options make sense depending on
the particular needs of each document (there will be some restrictions, however).
The behaviour is adjusted with \babeladjust with the following parameters:

40
autoload.bcp47 with values on and off.

autoload.bcp47.options, which are passed to \babelprovide; empty by default, but


you may add import (features defined in the corresponding babel-...tex file might
not be available).

autoload.bcp47.prefix. Although the public name used in selectors is the tag, the
internal name will be different and generated by prepending a prefix, which by default
is bcp47-. You may change it with this key.

New 3.46 If an ldf file has been loaded, you can enable the corresponding language tags
as selector names with:

\babeladjust{ bcp47.toname = on }

(You can deactivate it with off.) So, if dutch is one of the package (or class) options, you
can write \selectlanguage{nl}. Note the language name does not change (in this
example is still dutch), but you can get it with \localeinfo or \getlanguageproperty. It
must be turned on explicitly for similar reasons to those explained above.

1.23 Selecting scripts


Currently babel provides no standard interface to select scripts, because they are best
selected with either \fontencoding (low-level) or a language name (high-level). Even the
Latin script may require different encodings (ie, sets of glyphs) depending on the language,
and therefore such a switch would be in a sense incomplete.17
Some languages sharing the same script define macros to switch it (eg, \textcyrillic),
but be aware they may also set the language to a certain default. Even the babel core
defined \textlatin, but is was somewhat buggy because in some cases it messed up
encodings and fonts (for example, if the main Latin encoding was LY1), and therefore it has
been deprecated.18

\ensureascii {htexti}

New 3.9i This macro makes sure htexti is typeset with a LICR-savvy encoding in the ASCII
range. It is used to redefine \TeX and \LaTeX so that they are correctly typeset even with
LGR or X2 (the complete list is stored in \BabelNonASCII, which by default is LGR, X2, OT2,
OT3, OT6, LHE, LWN, LMA, LMC, LMS, LMU, but you can modify it). So, in some sense it fixes the
bug described in the previous paragraph.
If non-ASCII encodings are not loaded (or no encoding at all), it is no-op (also \TeX and
\LaTeX are not redefined); otherwise, \ensureascii switches to the encoding at the
beginning of the document if ASCII-savvy, or else the last ASCII-savvy encoding loaded. For
example, if you load LY1,LGR, then it is set to LY1, but if you load LY1,T2A it is set to T2A.
The symbol encodings TS1, T3, and TS3 are not taken into account, since they are not used
for “ordinary” text (they are stored in \BabelNonText, used in some special cases when no
Latin encoding is explicitly set).
The foregoing rules (which are applied “at begin document”) cover most of the cases. No
assumption is made on characters above 127, which may not follow the LICR conventions –
the goal is just to ensure most of the ASCII letters and symbols are the right ones.

1.24 Selecting directions


No macros to select the writing direction are provided, either – writing direction is
intrinsic to each script and therefore it is best set by the language (which can be a dummy
one). Furthermore, there are in fact two right-to-left modes, depending on the language,
17
The so-called Unicode fonts do not improve the situation either. So, a font suited for Vietnamese is not neces-
sarily suited for, say, the romanization of Indic languages, and the fact it contains glyphs for Modern Greek does
not mean it includes them for Classic Greek.
18
But still defined for backwards compatibility.

41
which differ in the way ‘weak’ numeric characters are ordered (eg, Arabic %123 vs Hebrew
123%).

WARNING The current code for text in luatex should be considered essentially stable, but, of course,
it is not bug-free and there can be improvements in the future, because setting bidi text has many
subtleties (see for example <https://www.w3.org/TR/html-bidi/>). A basic stable version for other
engines must wait. This applies to text; there is a basic support for graphical elements, including
the picture environment (with pict2e) and pfg/tikz. Also, indexes and the like are under study, as
well as math (there are progresses in the latter, including amsmath and mathtools too, but for
example gathered may fail).
An effort is being made to avoid incompatibilities in the future (this one of the reason currently
bidi must be explicitly requested as a package option, with a certain bidi model, and also the
layout options described below).

WARNING If characters to be mirrored are shown without changes with luatex, try with the
following line:

\babeladjust{bidi.mirroring=off}

There are some package options controlling bidi writing.

bidi= default | basic | basic-r | bidi-l | bidi-r


New 3.14 Selects the bidi algorithm to be used. With default the bidi mechanism is just
activated (by default it is not), but every change must be marked up. In xetex and pdftex
this is the only option.
In luatex, basic-r provides a simple and fast method for R text, which handles numbers
and unmarked L text within an R context many in typical cases. New 3.19 Finally, basic
supports both L and R text, and it is the preferred method (support for basic-r is
currently limited). (They are named basic mainly because they only consider the intrinsic
direction of scripts and weak directionality.)
New 3.29 In xetex, bidi-r and bidi-l resort to the package bidi (by Vafa Khalighi).
Integration is still somewhat tentative, but it mostly works. For RL documents use the
former, and for LR ones use the latter.
There are samples on GitHub, under /required/babel/samples. See particularly
lua-bidibasic.tex and lua-secenum.tex.

EXAMPLE The following text comes from the Arabic Wikipedia (article about Arabia). Copy-pasting
some text from the Wikipedia is a good way to test this feature. Remember basic is available in
luatex only.

\documentclass{article}

\usepackage[bidi=basic]{babel}

\babelprovide[import, main]{arabic}

\babelfont{rm}{FreeSerif}

\begin{document}

‫وﻗﺪ ﻋﺮﻓﺖ ﺷﺒﻪ ﺟﺰﻳﺮة اﻟﻌﺮب ﻃﻴﻠﺔ اﻟﻌﺼﺮ اﻟﻬﻴﻠﻴﻨﻲ )اﻻﻏﺮﻳﻘﻲ( ﺑـ‬
‫ اﺳﺘﺨﺪم اﻟﺮوﻣﺎن ﺛﻼث‬،(Αραβία ‫ )ﺑﺎﻻﻏﺮﻳﻘﻴﺔ‬Aravia ‫ أو‬Arabia
‫ إﻻ أﻧﻬﺎ‬،‫“ ﻋﻠﻰ ﺛﻼث ﻣﻨﺎﻃﻖ ﻣﻦ ﺷﺒﻪ اﻟﺠﺰﻳﺮة اﻟﻌﺮﺑﻴﺔ‬Arabia”‫ﺑﺎدﺋﺎت ﺑـ‬
.‫ﺣﻘﻴﻘﺔً ﻛﺎﻧﺖ أﻛﺒﺮ ﻣﻤﺎ ﺗﻌﺮف ﻋﻠﻴﻪ اﻟﻴﻮم‬

\end{document}

EXAMPLE With bidi=basic both L and R text can be mixed without explicit markup (the latter will
be only necessary in some special cases where the Unicode algorithm fails). It is used much like

42
bidi=basic-r, but with R text inside L text you may want to map the font so that the correct
features are in force. This is accomplished with an option in \babelprovide, as illustrated:

\documentclass{book}

\usepackage[english, bidi=basic]{babel}

\babelprovide[onchar=ids fonts]{arabic}

\babelfont{rm}{Crimson}
\babelfont[*arabic]{rm}{FreeSerif}

\begin{document}

Most Arabic speakers consider the two varieties to be two registers


of one language, although the two registers can be referred to in
Arabic as ‫\ ﻓﺼﺤﻰ اﻟﻌﺼﺮ‬textit{fuṣḥā l-ʻaṣr} (MSA) and
‫\ ﻓﺼﺤﻰ اﻟﺘﺮاث‬textit{fuṣḥā t-turāth} (CA).

\end{document}

In this example, and thanks to onchar=ids fonts, any Arabic letter (because the language is
arabic) changes its font to that set for this language (here defined via *arabic, because Crimson
does not provide Arabic letters).

NOTE Boxes are “black boxes”. Numbers inside an \hbox (for example in a \ref) do not know
anything about the surrounding chars. So, \ref{A}-\ref{B} are not rendered in the visual order
A-B, but in the wrong one B-A (because the hyphen does not “see” the digits inside the \hbox’es).
If you need \ref ranges, the best option is to define a dedicated macro like this (to avoid explicit
direction changes in the body; here \texthe must be defined to select the main language):

\newcommand\refrange[2]{\babelsublr{\texthe{\ref{#1}}-\texthe{\ref{#2}}}}

In the future a more complete method, reading recursively boxed text, may be added.

layout= sectioning | counters | lists | contents | footnotes | captions | columns | graphics |


extras
New 3.16 To be expanded. Selects which layout elements are adapted in bidi documents,
including some text elements (except with options loading the bidi package, which
provides its own mechanism to control these elements). You may use several options with
a dot-separated list (eg, layout=counters.contents.sectioning). This list will be
expanded in future releases. Note not all options are required by all engines.

sectioning makes sure the sectioning macros are typeset in the main language, but with
the title text in the current language (see below \BabelPatchSection for further
details).
counters required in all engines (except luatex with bidi=basic) to reorder section
numbers and the like (eg, hsubsectioni.hsectioni); required in xetex and pdftex for
counters in general, as well as in luatex with bidi=default; required in luatex for
numeric footnote marks >9 with bidi=basic-r (but not with bidi=basic); note,
however, it can depend on the counter format.
With counters, \arabic is not only considered L text always (with \babelsublr, see
below), but also an “isolated” block which does not interact with the surrounding chars.
So, while 1.2 in R text is rendered in that order with bidi=basic (as a decimal
number), in \arabic{c1}.\arabic{c2} the visual order is c2.c1. Of course, you may
always adjust the order by changing the language, if necessary.19
19
Next on the roadmap are counters and numeral systems in general. Expect some minor readjustments.

43
lists required in xetex and pdftex, but only in bidirectional (with both R and L
paragraphs) documents in luatex.

WARNING As of April 2019 there is a bug with \parshape in luatex (a TEX primitive) which
makes lists to be horizontally misplaced if they are inside a \vbox (like minipage) and the
current direction is different from the main one. A workaround is to restore the main
language before the box and then set the local one inside.

contents required in xetex and pdftex; in luatex toc entries are R by default if the main
language is R.
columns required in xetex and pdftex to reverse the column order (currently only the
standard two-column mode); in luatex they are R by default if the main language is R
(including multicol).
footnotes not required in monolingual documents, but it may be useful in bidirectional
documents (with both R and L paragraphs) in all engines; you may use alternatively
\BabelFootnote described below (what this option does exactly is also explained
there).
captions is similar to sectioning, but for \caption; not required in monolingual
documents with luatex, but may be required in xetex and pdftex in some styles (support
for the latter two engines is still experimental) New 3.18 .
tabular required in luatex for R tabular, so that the first column is the right one (it has
been tested only with simple tables, so expect some readjustments in the future);
ignored in pdftex or xetex (which will not support a similar option in the short term). It
patches an internal command, so it might be ignored by some packages and classes (or
even raise an error). New 3.18 .
graphics modifies the picture environment so that the whole figure is L but the text is R.
It does not work with the standard picture, and pict2e is required. It attempts to do the
same for pgf/tikz. Somewhat experimental. New 3.32 .
extras is used for miscellaneous readjustments which do not fit into the previous groups.
Currently redefines in luatex \underline and \LaTeX2e New 3.19 .

EXAMPLE Typically, in an Arabic document you would need:

\usepackage[bidi=basic,
layout=counters.tabular]{babel}

\babelsublr {hlr-texti}
Digits in pdftex must be marked up explicitly (unlike luatex with bidi=basic or
bidi=basic-r and, usually, xetex). This command is provided to set {hlr-texti} in L mode
if necessary. It’s intended for what Unicode calls weak characters, because words are best
set with the corresponding language. For this reason, there is no rl counterpart.
Any \babelsublr in explicit L mode is ignored. However, with bidi=basic and implicit L,
it first returns to R and then switches to explicit L. To clarify this point, consider, in an R
context:

RTL A ltr text \thechapter{} and still ltr RTL B

There are three R blocks and two L blocks, and the order is RTL B and still ltr 1 ltr text RTL
A. This is by design to provide the proper behavior in the most usual cases — but if you
need to use \ref in an L text inside R, the L text must be marked up explictly; for example:

RTL A \foreignlanguage{english}{ltr text \thechapter{} and still ltr} RTL B

44
\BabelPatchSection {hsection-namei}
Mainly for bidi text, but it can be useful in other cases. \BabelPatchSection and the
corresponding option layout=sectioning takes a more logical approach (at least in many
cases) because it applies the global language to the section format (including the
\chaptername in \chapter), while the section text is still the current language. The latter
is passed to tocs and marks, too, and with sectioning in layout they both reset the
“global” language to the main one, while the text uses the “local” language.
With layout=sectioning all the standard sectioning commands are redefined (it also
“isolates” the page number in heads, for a proper bidi behavior), but with this command
you can set them individually if necessary (but note then tocs and marks are not touched).

\BabelFootnote {hcmdi}{hlocal-languagei}{hbeforei}{hafteri}
New 3.17 Something like:

\BabelFootnote{\parsfootnote}{\languagename}{(}{)}

defines \parsfootnote so that \parsfootnote{note} is equivalent to:

\footnote{(\foreignlanguage{\languagename}{note})}

but the footnote itself is typeset in the main language (to unify its direction). In addition,
\parsfootnotetext is defined. The option footnotes just does the following:

\BabelFootnote{\footnote}{\languagename}{}{}%
\BabelFootnote{\localfootnote}{\languagename}{}{}%
\BabelFootnote{\mainfootnote}{}{}{}

(which also redefine \footnotetext and define \localfootnotetext and


\mainfootnotetext). If the language argument is empty, then no language is selected
inside the argument of the footnote. Note this command is available always in bidi
documents, even without layout=footnotes.

EXAMPLE If you want to preserve directionality in footnotes and there are many footnotes entirely
in English, you can define:

\BabelFootnote{\enfootnote}{english}{}{.}

It adds a period outside the English part, so that it is placed at the left in the last line. This means
the dot the end of the footnote text should be omitted.

1.25 Language attributes


\languageattribute
This is a user-level command, to be used in the preamble of a document (after
\usepackage[...]{babel}), that declares which attributes are to be used for a given
language. It takes two arguments: the first is the name of the language; the second, a (list
of) attribute(s) to be used. Attributes must be set in the preamble and only once – they
cannot be turned on and off. The command checks whether the language is known in this
document and whether the attribute(s) are known for this language.
Very often, using a modifier in a package option is better.
Several language definition files use their own methods to set options. For example, french
uses \frenchsetup, magyar (1.5) uses \magyarOptions; modifiers provided by spanish
have no attribute counterparts. Macros setting options are also used (eg,
\ProsodicMarksOn in latin).

45
1.26 Hooks
New 3.9a A hook is a piece of code to be executed at certain events. Some hooks are
predefined when luatex and xetex are used.
New 3.64 This is not the only way to inject code at those points. The events listed below
can be used as a hook name in \AddToHook in the form
babel/hlanguage-namei/hevent-namei (with * it’s applied to all languages), but there is a
limitation, because the parameters passed with the babel mechanism are not allowed. The
\AddToHook mechanism does not replace the current one in ‘babel‘. Its main advantage is
you can reconfigure ‘babel‘ even before loading it. See the example below.

\AddBabelHook [hlang i]{hnamei}{heventi}{hcodei}


The same name can be applied to several events. Hooks with a certain {hnamei} may be
enabled and disabled for all defined events with \EnableBabelHook{hnamei},
\DisableBabelHook{hnamei}. Names containing the string babel are reserved (they are
used, for example, by \useshortands* to add a hook for the event afterextras).
New 3.33 They may be also applied to a specific language with the optional argument;
language-specific settings are executed after global ones.
Current events are the following; in some of them you can use one to three TEX parameters
(#1, #2, #3), with the meaning given:

adddialect (language name, dialect name) Used by luababel.def to load the patterns if
not preloaded.
patterns (language name, language with encoding) Executed just after the \language has
been set. The second argument has the patterns name actually selected (in the form of
either lang:ENC or lang).
hyphenation (language name, language with encoding) Executed locally just before
exceptions given in \babelhyphenation are actually set.
defaultcommands Used (locally) in \StartBabelCommands.
encodedcommands (input, font encodings) Used (locally) in \StartBabelCommands. Both
xetex and luatex make sure the encoded text is read correctly.
stopcommands Used to reset the above, if necessary.
write This event comes just after the switching commands are written to the aux file.
beforeextras Just before executing \extrashlanguagei. This event and the next one
should not contain language-dependent code (for that, add it to \extrashlanguagei).
afterextras Just after executing \extrashlanguagei. For example, the following
deactivates shorthands in all languages:

\AddBabelHook{noshort}{afterextras}{\languageshorthands{none}}

stringprocess Instead of a parameter, you can manipulate the macro \BabelString


containing the string to be defined with \SetString. For example, to use an expanded
version of the string in the definition, write:

\AddBabelHook{myhook}{stringprocess}{%
\protected@edef\BabelString{\BabelString}}

initiateactive (char as active, char as other, original char) New 3.9i Executed just
after a shorthand has been ‘initiated’. The three parameters are the same character
with different catcodes: active, other (\string’ed) and the original one.
afterreset New 3.9i Executed when selecting a language just after \originalTeX is
run and reset to its base value, before executing \captionshlanguagei and
\datehlanguagei.

Four events are used in hyphen.cfg, which are handled in a quite different way for
efficiency reasons – unlike the precedent ones, they only have a single hook and replace a
default definition.

46
everylanguage (language) Executed before every language patterns are loaded.
loadkernel (file) By default just defines a few basic commands. It can be used to define
different versions of them or to load a file.
loadpatterns (patterns file) Loads the patterns file. Used by luababel.def.
loadexceptions (exceptions file) Loads the exceptions file. Used by luababel.def.

EXAMPLE The generic unlocalized LATEX hooks are predefined, so that you can write:

\AddToHook{babel/*/afterextras}{\frenchspacing}

which is executed always after the extras for the language being selected (and just before the
non-localized hooks defined with \AddBabelHook).
In addition, locale-specific hooks in the form babel/hlanguage-namei/hevent-namei are
recognized (executed just before the localized babel hooks), but they are not predefined. You have
to do it yourself. For example, to set \frenchspacing only in bengali:

\ActivateGenericHook{babel/bengali/afterextras}
\AddToHook{babel/bengali/afterextras}{\frenchspacing}

\BabelContentsFiles New 3.9a This macro contains a list of “toc” types requiring a command to switch the
language. Its default value is toc,lof,lot, but you may redefine it with \renewcommand
(it’s up to you to make sure no toc type is duplicated).

1.27 Languages supported by babel with ldf files


In the following table most of the languages supported by babel with and .ldf file are
listed, together with the names of the option which you can load babel with for each
language. Note this list is open and the current options may be different. It does not
include ini files.

Afrikaans afrikaans
Azerbaijani azerbaijani
Basque basque
Breton breton
Bulgarian bulgarian
Catalan catalan
Croatian croatian
Czech czech
Danish danish
Dutch dutch
English english, USenglish, american, UKenglish, british, canadian, australian, newzealand
Esperanto esperanto
Estonian estonian
Finnish finnish
French french, francais, canadien, acadian
Galician galician
German austrian, german, germanb, ngerman, naustrian
Greek greek, polutonikogreek
Hebrew hebrew
Icelandic icelandic
Indonesian indonesian (bahasa, indon, bahasai)
Interlingua interlingua
Irish Gaelic irish
Italian italian
Latin latin
Lower Sorbian lowersorbian

47
Malay malay, melayu (bahasam)
North Sami samin
Norwegian norsk, nynorsk
Polish polish
Portuguese portuguese, brazilian (portuges, brazil)20
Romanian romanian
Russian russian
Scottish Gaelic scottish
Spanish spanish
Slovakian slovak
Slovenian slovene
Swedish swedish
Serbian serbian
Turkish turkish
Ukrainian ukrainian
Upper Sorbian uppersorbian
Welsh welsh

There are more languages not listed above, including hindi, thai, thaicjk, latvian, turkmen,
magyar, mongolian, romansh, lithuanian, spanglish, vietnamese, japanese, pinyin, arabic,
farsi, ibygreek, bgreek, serbianc, frenchle, ethiop and friulan.
Most of them work out of the box, but some may require extra fonts, encoding files, a
preprocessor or even a complete framework (like CJK or luatexja). For example, if you have
got the velthuis/devnag package, you can create a file with extension .dn:

\documentclass{article}
\usepackage[hindi]{babel}
\begin{document}
{\dn devaanaa.m priya.h}
\end{document}

Then you preprocess it with devnag hfilei, which creates hfilei.tex; you can then typeset
the latter with LATEX.

1.28 Unicode character properties in luatex


New 3.32 Part of the babel job is to apply Unicode rules to some script-specific features
based on some properties. Currently, they are 3, namely, direction (ie, bidi class), mirroring
glyphs, and line breaking for CJK scripts. These properties are stored in lua tables, which
you can modify with the following macro (for example, to set them for glyphs in the PUA).

\babelcharproperty {hchar-codei}[hto-char-codei]{hpropertyi}{hvaluei}
New 3.32 Here, {hchar-codei} is a number (with TEX syntax). With the optional argument,
you can set a range of values. There are three properties (with a short name, taken from
Unicode): direction (bc), mirror (bmg), linebreak (lb). The settings are global, and this
command is allowed only in vertical mode (the preamble or between paragraphs).
For example:

\babelcharproperty{`¿}{mirror}{`?}
\babelcharproperty{`-}{direction}{l} % or al, r, en, an, on, et, cs
\babelcharproperty{`)}{linebreak}{cl} % or id, op, cl, ns, ex, in, hy

Please, refer to the Unicode standard (Annex #9 and Annex #14) for the meaning of the
available codes. For example, en is ‘European number’ and id is ‘ideographic’.
New 3.39 Another property is locale, which adds characters to the list used by onchar in
\babelprovide, or, if the last argument is empty, removes them. The last argument is the
locale name:
20
The two last name comes from the times when they had to be shortened to 8 characters

48
\babelcharproperty{`,}{locale}{english}

1.29 Tweaking some features

\babeladjust {hkey-value-listi}
New 3.36 Sometimes you might need to disable some babel features. Currently this
macro understands the following keys (and only for luatex), with values on or off:
bidi.text, bidi.mirroring, bidi.mapdigits, layout.lists, layout.tabular,
linebreak.sea, linebreak.cjk, justify.arabic. For example, you can set
\babeladjust{bidi.text=off} if you are using an alternative algorithm or with large
sections not requiring it. Use with care, because these options do not deactivate other
related options (like paragraph direction with bidi.text).

1.30 Tips, workarounds, known issues and notes


• If you use the document class book and you use \ref inside the argument of \chapter
(or just use \ref inside \MakeUppercase), LATEX will keep complaining about an
undefined label. To prevent such problems, you can revert to using uppercase labels,
you can use \lowercase{\ref{foo}} inside the argument of \chapter, or, if you will
not use shorthands in labels, set the safe option to none or bib.

• Both ltxdoc and babel use \AtBeginDocument to change some catcodes, and babel
reloads hhline to make sure : has the right one, so if you want to change the catcode of
| it has to be done using the same method at the proper place, with

\AtBeginDocument{\DeleteShortVerb{\|}}

before loading babel. This way, when the document begins the sequence is (1) make |
active (ltxdoc); (2) make it unactive (your settings); (3) make babel shorthands active
(babel); (4) reload hhline (babel, now with the correct catcodes for | and :).

• Documents with several input encodings are not frequent, but sometimes are useful.
You can set different encodings for different languages as the following example shows:

\addto\extrasfrench{\inputencoding{latin1}}
\addto\extrasrussian{\inputencoding{koi8-r}}

• For the hyphenation to work correctly, lccodes cannot change, because TEX only takes
into account the values when the paragraph is hyphenated, i.e., when it has been
finished.21 So, if you write a chunk of French text with \foreignlanguage, the
apostrophes might not be taken into account. This is a limitation of TEX, not of babel.
Alternatively, you may use \useshorthands to activate ' and \defineshorthand, or
redefine \textquoteright (the latter is called by the non-ASCII right quote).

• \bibitem is out of sync with \selectlanguage in the .aux file. The reason is \bibitem
uses \immediate (and others, in fact), while \selectlanguage doesn’t. There is a
similar issue with floats, too. There is no known workaround.

• Babel does not take into account \normalsfcodes and (non-)French spacing is not
always properly (un)set by languages. However, problems are unlikely to happen and
therefore this part remains untouched in version 3.9 (but it is in the ‘to do’ list).
21
This explains why LATEX assumes the lowercase mapping of T1 and does not provide a tool for multiple map-
pings. Unfortunately, \savinghyphcodes is not a solution either, because lccodes for hyphenation are frozen in the
format and cannot be changed.

49
• Using a character mathematically active (ie, with math code "8000) as a shorthand can
make TEX enter in an infinite loop in some rare cases. (Another issue in the ‘to do’ list,
although there is a partial solution.)

The following packages can be useful, too (the list is still far from complete):

csquotes Logical markup for quotes.


iflang Tests correctly the current language.
hyphsubst Selects a different set of patterns for a language.
translator An open platform for packages that need to be localized.
siunitx Typesetting of numbers and physical quantities.
biblatex Programmable bibliographies and citations.
bicaption Bilingual captions.
babelbib Multilingual bibliographies.
microtype Adjusts the typesetting according to some languages (kerning and spacing).
Ligatures can be disabled.
substitutefont Combines fonts in several encodings.
mkpattern Generates hyphenation patterns.
tracklang Tracks which languages have been requested.
ucharclasses (xetex) Switches fonts when you switch from one Unicode block to another.
zhspacing Spacing for CJK documents in xetex.

1.31 Current and future work


The current work is focused on the so-called complex scripts in luatex. In 8-bit engines,
babel provided a basic support for bidi text as part of the style for Hebrew, but it is
somewhat unsatisfactory and internally replaces some hardwired commands by other
hardwired commands (generic changes would be much better).
Useful additions would be, for example, time, currency, addresses and personal names.22 .
But that is the easy part, because they don’t require modifying the LATEX internals.
Calendars (Arabic, Persian, Indic, etc.) are under study.
Also interesting are differences in the sentence structure or related to it. For example, in
Basque the number precedes the name (including chapters), in Hungarian “from (1)” is
“(1)-ből”, but “from (3)” is “(3)-ból”, in Spanish an item labelled “3.o ” may be referred to as
either “ítem 3.o ” or “3.er ítem”, and so on.
An option to manage bidirectional document layout in luatex (lists, footnotes, etc.) is
almost finished, but xetex required more work. Unfortunately, proper support for xetex
requires patching somehow lots of macros and packages (and some issues related to
\specials remain, like color and hyperlinks), so babel resorts to the bidi package (by Vafa
Khalighi). See the babel repository for a small example (xe-bidi).

1.32 Tentative and experimental code


See the code section for \foreignlanguage* (a new starred version of \foreignlanguage).
For old an deprecated functions, see the babel site.

Options for locales loaded on the fly


New 3.51 \babeladjust{ autoload.options = ... } sets the options when a language
is loaded on the fly (by default, no options). A typical value would be import, which
defines captions, date, numerals, etc., but ignores the code in the tex file (for example,
extended numerals in Greek).

Labels
New 3.48 There is some work in progress for babel to deal with labels, both with the
relation to captions (chapters, part), and how counters are used to define them. It is still
somewhat tentative because it is far from trivial – see the babel site for further details.
22
See for example POSIX, ISO 14652 and the Unicode Common Locale Data Repository (CLDR). Those systems,
however, have limited application to TEX because their aim is just to display information and not fine typesetting.

50
2 Loading languages with language.dat
TEX and most engines based on it (pdfTEX, xetex, -TEX, the main exception being luatex)
require hyphenation patterns to be preloaded when a format is created (eg, LATEX, XeLATEX,
pdfLATEX). babel provides a tool which has become standard in many distributions and
based on a “configuration file” named language.dat. The exact way this file is used
depends on the distribution, so please, read the documentation for the latter (note also
some distributions generate the file with some tool).
New 3.9q With luatex, however, patterns are loaded on the fly when requested by the
language (except the “0th” language, typically english, which is preloaded always).23 Until
3.9n, this task was delegated to the package luatex-hyphen, by Khaled Hosny, Élie Roux,
and Manuel Pégourié-Gonnard, and required an extra file named language.dat.lua, but
now a new mechanism has been devised based solely on language.dat. You must rebuild
the formats if upgrading from a previous version. You may want to have a local
language.dat for a particular project (for example, a book on Chemistry).24

2.1 Format
In that file the person who maintains a TEX environment has to record for which languages
he has hyphenation patterns and in which files these are stored25 . When hyphenation
exceptions are stored in a separate file this can be indicated by naming that file after the
file with the hyphenation patterns.
The file can contain empty lines and comments, as well as lines which start with an equals
(=) sign. Such a line will instruct LATEX that the hyphenation patterns just processed have to
be known under an alternative name. Here is an example:

% File : language.dat
% Purpose : tell iniTeX what files with patterns to load.
english english.hyphenations
=british

dutch hyphen.dutch exceptions.dutch % Nederlands


german hyphen.ger

You may also set the font encoding the patterns are intended for by following the language
name by a colon and the encoding code.26 For example:

german:T1 hyphenT1.ger
german hyphen.ger

With the previous settings, if the encoding when the language is selected is T1 then the
patterns in hyphenT1.ger are used, but otherwise use those in hyphen.ger (note the
encoding can be set in \extrashlang i).
A typical error when using babel is the following:

No hyphenation patterns were preloaded for


the language `<lang>' into the format.
Please, configure your TeX system to add them and
rebuild the format. Now I will use the patterns
preloaded for english instead}}

It simply means you must reconfigure language.dat, either by hand or with the tools
provided by your distribution.
23
This feature was added to 3.9o, but it was buggy. Both 3.9o and 3.9p are deprecated.
24
The loader for lua(e)tex is slightly different as it’s not based on babel but on etex.src. Until 3.9p it just didn’t
work, but thanks to the new code it works by reloading the data in the babel way, i.e., with language.dat.
25
This is because different operating systems sometimes use very different file-naming conventions.
26
This is not a new feature, but in former versions it didn’t work correctly.

51
3 The interface between the core of babel and the
language definition files
The language definition files (ldf) must conform to a number of conventions, because these
files have to fill in the gaps left by the common code in babel.def, i. e., the definitions of
the macros that produce texts. Also the language-switching possibility which has been
built into the babel system has its implications.
The following assumptions are made:

• Some of the language-specific definitions might be used by plain TEX users, so the files
have to be coded so that they can be read by both LATEX and plain TEX. The current
format can be checked by looking at the value of the macro \fmtname.

• The common part of the babel system redefines a number of macros and environments
(defined previously in the document style) to put in the names of macros that replace
the previously hard-wired texts. These macros have to be defined in the language
definition files.

• The language definition files must define five macros, used to activate and deactivate
the language-specific definitions. These macros are \hlang ihyphenmins,
\captionshlang i, \datehlang i, \extrashlang i and \noextrashlang i(the last two may
be left empty); where hlang i is either the name of the language definition file or the
name of the LATEX option that is to be used. These macros and their functions are
discussed below. You must define all or none for a language (or a dialect); defining, say,
\datehlang i but not \captionshlang i does not raise an error but can lead to
unexpected results.

• When a language definition file is loaded, it can define \l@hlang i to be a dialect of


\language0 when \l@hlang i is undefined.

• Language names must be all lowercase. If an unknown language is selected, babel will
attempt setting it after lowercasing its name.

• The semantics of modifiers is not defined (on purpose). In most cases, they will just be
simple separated options (eg, spanish), but a language might require, say, a set of
options organized as a tree with suboptions (in such a case, the recommended
separator is /).

Some recommendations:

• The preferred shorthand is ", which is not used in LATEX (quotes are entered as `` and
''). Other good choices are characters which are not used in a certain context (eg, = in
an ancient language). Note however =, <, >, : and the like can be dangerous, because
they may be used as part of the syntax of some elements (numeric expressions,
key/value pairs, etc.).

• Captions should not contain shorthands or encoding-dependent commands (the latter is


not always possible, but should be clearly documented). They should be defined using
the LICR. You may also use the new tools for encoded strings, described below.

• Avoid adding things to \noextrashlang i except for umlauthigh and friends,


\bbl@deactivate, \bbl@(non)frenchspacing, and language-specific macros. Use
always, if possible, \bbl@save and \bbl@savevariable (except if you still want to have
access to the previous value). Do not reset a macro or a setting to a hardcoded value.
Never. Instead save its value in \extrashlang i.

• Do not switch scripts. If you want to make sure a set of glyphs is used, switch either the
font encoding (low-level) or the language (high-level, which in turn may switch the font
encoding). Usage of things like \latintext is deprecated.27
27
But not removed, for backward compatibility.

52
• Please, for “private” internal macros do not use the \bbl@ prefix. It is used by babel and
it can lead to incompatibilities.

There are no special requirements for documenting your language files. Now they are not
included in the base babel manual, so provide a standalone document suited for your
needs, as well as other files you think can be useful. A PDF and a “readme” are strongly
recommended.

3.1 Guidelines for contributed languages


Currently, the easiest way to contribute a new language is by taking one the the 500 or so
ini templates available on GitHub as a basis. Just make a pull request o dowonload it and
then, after filling the fields, sent it to me. Fell free to ask for help or to make feature
requests.
As to ldf files, now language files are “outsourced” and are located in a separate directory
(/macros/latex/contrib/babel-contrib), so that they are contributed directly to CTAN
(please, do not send to me language styles just to upload them to CTAN).
Of course, placing your style files in this directory is not mandatory, but if you want to do
it, here are a few guidelines.

• Do not hesitate stating on the file heads you are the author and the maintainer, if you
actually are. There is no need to state the babel maintainer(s) as authors if they have
not contributed significantly to your language files.

• Fonts are not strictly part of a language, so they are best placed in the corresponding
TeX tree. This includes not only tfm, vf, ps1, otf, mf files and the like, but also fd ones.

• Font and input encodings are usually best placed in the corresponding tree, too, but
sometimes they belong more naturally to the babel style. Note you may also need to
define a LICR.

• Babel ldf files may just interface a framework, as it happens often with Oriental
languages/scripts. This framework is best placed in its own directory.

The following page provides a starting point for ldf files:


http://www.texnia.com/incubator.html. See also
https://latex3.github.io/babel/guides/list-of-locale-templates.html.
If you need further assistance and technical advice in the development of language styles, I
am willing to help you. And of course, you can make any suggestion you like.

3.2 Basic macros


In the core of the babel system, several macros are defined for use in language definition
files. Their purpose is to make a new language known. The first two are related to
hyphenation patterns.
\addlanguage The macro \addlanguage is a non-outer version of the macro \newlanguage, defined in
plain.tex version 3.x. Here “language” is used in the TEX sense of set of hyphenation
patterns.
\adddialect The macro \adddialect can be used when two languages can (or must) use the same
hyphenation patterns. This can also be useful for languages for which no patterns are
preloaded in the format. In such cases the default behavior of the babel system is to define
this language as a ‘dialect’ of the language for which the patterns were loaded as
\language0. Here “language” is used in the TEX sense of set of hyphenation patterns.
\<lang>hyphenmins The macro \hlang ihyphenmins is used to store the values of the \lefthyphenmin and
\righthyphenmin. Redefine this macro to set your own values, with two numbers
corresponding to these two parameters. For example:

\renewcommand\spanishhyphenmins{34}

53
(Assigning \lefthyphenmin and \righthyphenmin directly in \extras<lang> has no
effect.)
\providehyphenmins The macro \providehyphenmins should be used in the language definition files to set
\lefthyphenmin and \righthyphenmin. This macro will check whether these parameters
were provided by the hyphenation file before it takes any action. If these values have been
already set, this command is ignored (currently, default pattern files do not set them).
\captionshlang i The macro \captionshlang i defines the macros that hold the texts to replace the original
hard-wired texts.
\datehlang i The macro \datehlang i defines \today.
\extrashlang i The macro \extrashlang i contains all the extra definitions needed for a specific language.
This macro, like the following, is a hook – you can add things to it, but it must not be used
directly.
\noextrashlang i Because we want to let the user switch between languages, but we do not know what state
TEX might be in after the execution of \extrashlang i, a macro that brings TEX into a
predefined state is needed. It will be no surprise that the name of this macro is
\noextrashlang i.
\bbl@declare@ttribute This is a command to be used in the language definition files for declaring a language
attribute. It takes three arguments: the name of the language, the attribute to be defined,
and the code to be executed when the attribute is to be used.
\main@language To postpone the activation of the definitions needed for a language until the beginning of a
document, all language definition files should use \main@language instead of
\selectlanguage. This will just store the name of the language, and the proper language
will be activated at the start of the document.
\ProvidesLanguage The macro \ProvidesLanguage should be used to identify the language definition files. Its
syntax is similar to the syntax of the LATEX command \ProvidesPackage.
\LdfInit The macro \LdfInit performs a couple of standard checks that must be made at the
beginning of a language definition file, such as checking the category code of the @-sign,
preventing the .ldf file from being processed twice, etc.
\ldf@quit The macro \ldf@quit does work needed if a .ldf file was processed earlier. This includes
resetting the category code of the @-sign, preparing the language to be activated at
\begin{document} time, and ending the input stream.
\ldf@finish The macro \ldf@finish does work needed at the end of each .ldf file. This includes
resetting the category code of the @-sign, loading a local configuration file, and preparing
the language to be activated at \begin{document} time.
\loadlocalcfg After processing a language definition file, LATEX can be instructed to load a local
configuration file. This file can, for instance, be used to add strings to \captionshlang i to
support local document classes. The user will be informed that this configuration file has
been loaded. This macro is called by \ldf@finish.
\substitutefontfamily (Deprecated.) This command takes three arguments, a font encoding and two font family
names. It creates a font description file for the first font in the given encoding. This .fd file
will instruct LATEX to use a font from the second family when a font from the first family in
the given encoding seems to be needed.

3.3 Skeleton
Here is the basic structure of an ldf file, with a language, a dialect and an attribute.
Strings are best defined using the method explained in sec. 3.8 (babel 3.9 and later).

\ProvidesLanguage{<language>}
[2016/04/23 v0.0 <Language> support from the babel system]
\LdfInit{<language>}{captions<language>}

\ifx\undefined\l@<language>
\@nopatterns{<Language>}
\adddialect\l@<language>0
\fi

\adddialect\l@<dialect>\l@<language>

54
\bbl@declare@ttribute{<language>}{<attrib>}{%
\expandafter\addto\expandafter\extras<language>
\expandafter{\extras<attrib><language>}%
\let\captions<language>\captions<attrib><language>}

\providehyphenmins{<language>}{\tw@\thr@@}

\StartBabelCommands*{<language>}{captions}
\SetString\chaptername{<chapter name>}
% More strings

\StartBabelCommands*{<language>}{date}
\SetString\monthiname{<name of first month>}
% More strings

\StartBabelCommands*{<dialect>}{captions}
\SetString\chaptername{<chapter name>}
% More strings

\StartBabelCommands*{<dialect>}{date}
\SetString\monthiname{<name of first month>}
% More strings

\EndBabelCommands

\addto\extras<language>{}
\addto\noextras<language>{}
\let\extras<dialect>\extras<language>
\let\noextras<dialect>\noextras<language>

\ldf@finish{<language>}

NOTE If for some reason you want to load a package in your style, you should be aware it cannot be
done directly in the ldf file, but it can be delayed with \AtEndOfPackage. Macros from external
packages can be used inside definitions in the ldf itself (for example, \extras<language>), but if
executed directly, the code must be placed inside \AtEndOfPackage. A trivial example illustrating
these points is:

\AtEndOfPackage{%
\RequirePackage{dingbat}% Delay package
\savebox{\myeye}{\eye}}% And direct usage
\newsavebox{\myeye}
\newcommand\myanchor{\anchor}% But OK inside command

3.4 Support for active characters


In quite a number of language definition files, active characters are introduced. To
facilitate this, some support macros are provided.
\initiate@active@char The internal macro \initiate@active@char is used in language definition files to instruct
LATEX to give a character the category code ‘active’. When a character has been made active
it will remain that way until the end of the document. Its definition may vary.
\bbl@activate The command \bbl@activate is used to change the way an active character expands.
\bbl@deactivate \bbl@activate ‘switches on’ the active behavior of the character. \bbl@deactivate lets
the active character expand to its former (mostly) non-active self.
\declare@shorthand The macro \declare@shorthand is used to define the various shorthands. It takes three
arguments: the name for the collection of shorthands this definition belongs to; the
character (sequence) that makes up the shorthand, i.e. ~ or "a; and the code to be executed
when the shorthand is encountered. (It does not raise an error if the shorthand character
has not been “initiated”.)

55
\bbl@add@special The TEXbook states: “Plain TEX includes a macro called \dospecials that is essentially a set
\bbl@remove@special macro, representing the set of all characters that have a special category code.” [4, p. 380]
It is used to set text ‘verbatim’. To make this work if more characters get a special category
code, you have to add this character to the macro \dospecial. LATEX adds another macro
called \@sanitize representing the same character set, but without the curly braces. The
macros \bbl@add@specialhchari and \bbl@remove@specialhchari add and remove the
character hchari to these two sets.

3.5 Support for saving macro definitions


Language definition files may want to redefine macros that already exist. Therefore a
mechanism for saving (and restoring) the original definition of those macros is provided.
We provide two macros for this28 .
\babel@save To save the current meaning of any control sequence, the macro \babel@save is provided.
It takes one argument, hcsnamei, the control sequence for which the meaning has to be
saved.
\babel@savevariable A second macro is provided to save the current value of a variable. In this context,
anything that is allowed after the \the primitive is considered to be a variable. The macro
takes one argument, the hvariablei.
The effect of the preceding macros is to append a piece of code to the current definition of
\originalTeX. When \originalTeX is expanded, this code restores the previous definition
of the control sequence or the previous value of the variable.

3.6 Support for extending macros


\addto The macro \addto{hcontrol sequencei}{hTEX codei} can be used to extend the definition of
a macro. The macro need not be defined (ie, it can be undefined or \relax). This macro
can, for instance, be used in adding instructions to a macro like \extrasenglish.
Be careful when using this macro, because depending on the case the assignment can be
either global (usually) or local (sometimes). That does not seem very consistent, but this
behavior is preserved for backward compatibility. If you are using etoolbox, by Philipp
Lehman, consider using the tools provided by this package instead of \addto.

3.7 Macros common to a number of languages


\bbl@allowhyphens In several languages compound words are used. This means that when TEX has to
hyphenate such a compound word, it only does so at the ‘-’ that is used in such words. To
allow hyphenation in the rest of such a compound word, the macro \bbl@allowhyphens
can be used.
\allowhyphens Same as \bbl@allowhyphens, but does nothing if the encoding is T1. It is intended mainly
for characters provided as real glyphs by this encoding but constructed with \accent in
OT1.
Note the previous command (\bbl@allowhyphens) has different applications (hyphens and
discretionaries) than this one (composite chars). Note also prior to version 3.7,
\allowhyphens had the behavior of \bbl@allowhyphens.
\set@low@box For some languages, quotes need to be lowered to the baseline. For this purpose the macro
\set@low@box is available. It takes one argument and puts that argument in an \hbox, at
the baseline. The result is available in \box0 for further processing.
\save@sf@q Sometimes it is necessary to preserve the \spacefactor. For this purpose the macro
\save@sf@q is available. It takes one argument, saves the current spacefactor, executes the
argument, and restores the spacefactor.
\bbl@frenchspacing The commands \bbl@frenchspacing and \bbl@nonfrenchspacing can be used to
\bbl@nonfrenchspacing properly switch French spacing on and off.
28
This mechanism was introduced by Bernd Raichle.

56
3.8 Encoding-dependent strings
New 3.9a Babel 3.9 provides a way of defining strings in several encodings, intended
mainly for luatex and xetex. This is the only new feature requiring changes in language
files if you want to make use of it.
Furthermore, it must be activated explicitly, with the package option strings. If there is
no strings, these blocks are ignored, except \SetCases (and except if forced as described
below). In other words, the old way of defining/switching strings still works and it’s used
by default.
It consist is a series of blocks started with \StartBabelCommands. The last block is closed
with \EndBabelCommands. Each block is a single group (ie, local declarations apply until
the next \StartBabelCommands or \EndBabelCommands). An ldf may contain several
series of this kind.
Thanks to this new feature, string values and string language switching are not mixed any
more. No need of \addto. If the language is french, just redefine \frenchchaptername.

\StartBabelCommands {hlanguage-listi}{hcategoryi}[hselectori]
The hlanguage-listi specifies which languages the block is intended for. A block is taken
into account only if the \CurrentOption is listed here. Alternatively, you can define
\BabelLanguages to a comma-separated list of languages to be defined (if undefined,
\StartBabelCommands sets it to \CurrentOption). You may write \CurrentOption as the
language, but this is discouraged – a explicit name (or names) is much better and clearer.
A “selector” is a name to be used as value in package option strings, optionally followed
by extra info about the encodings to be used. The name unicode must be used for xetex
and luatex (the key strings has also other two special values: generic and encoded).
If a string is set several times (because several blocks are read), the first one takes
precedence (ie, it works much like \providecommand).
Encoding info is charset= followed by a charset, which if given sets how the strings should
be translated to the internal representation used by the engine, typically utf8, which is the
only value supported currently (default is no translations). Note charset is applied by
luatex and xetex when reading the file, not when the macro or string is used in the
document.
A list of font encodings which the strings are expected to work with can be given after
fontenc= (separated with spaces, if two or more) – recommended, but not mandatory,
although blocks without this key are not taken into account if you have requested
strings=encoded.
Blocks without a selector are read always if the key strings has been used. They provide
fallback values, and therefore must be the last blocks; they should be provided always if
possible and all strings should be defined somehow inside it; they can be the only blocks
(mainly LGC scripts using the LICR). Blocks without a selector can be activated explicitly
with strings=generic (no block is taken into account except those). With
strings=encoded, strings in those blocks are set as default (internally, ?). With
strings=encoded strings are protected, but they are correctly expanded in
\MakeUppercase and the like. If there is no key strings, string definitions are ignored, but
\SetCases are still honored (in a encoded way).
The hcategoryi is either captions, date or extras. You must stick to these three categories,
even if no error is raised when using other name.29 It may be empty, too, but in such a case
using \SetString is an error (but not \SetCase).

\StartBabelCommands{language}{captions}
[unicode, fontenc=TU EU1 EU2, charset=utf8]
\SetString{\chaptername}{utf8-string}

\StartBabelCommands{language}{captions}
\SetString{\chaptername}{ascii-maybe-LICR-string}

29
In future releases further categories may be added.

57
\EndBabelCommands

A real example is:

\StartBabelCommands{austrian}{date}
[unicode, fontenc=TU EU1 EU2, charset=utf8]
\SetString\monthiname{Jänner}

\StartBabelCommands{german,austrian}{date}
[unicode, fontenc=TU EU1 EU2, charset=utf8]
\SetString\monthiiiname{März}

\StartBabelCommands{austrian}{date}
\SetString\monthiname{J\"{a}nner}

\StartBabelCommands{german}{date}
\SetString\monthiname{Januar}

\StartBabelCommands{german,austrian}{date}
\SetString\monthiiname{Februar}
\SetString\monthiiiname{M\"{a}rz}
\SetString\monthivname{April}
\SetString\monthvname{Mai}
\SetString\monthviname{Juni}
\SetString\monthviiname{Juli}
\SetString\monthviiiname{August}
\SetString\monthixname{September}
\SetString\monthxname{Oktober}
\SetString\monthxiname{November}
\SetString\monthxiiname{Dezenber}
\SetString\today{\number\day.~%
\csname month\romannumeral\month name\endcsname\space
\number\year}

\StartBabelCommands{german,austrian}{captions}
\SetString\prefacename{Vorwort}
[etc.]

\EndBabelCommands

When used in ldf files, previous values of \hcategoryihlanguagei are overridden, which
means the old way to define strings still works and used by default (to be precise, is first set
to undefined and then strings are added). However, when used in the preamble or in a
package, new settings are added to the previous ones, if the language exists (in the babel
sense, ie, if \datehlanguagei exists).

\StartBabelCommands * {hlanguage-listi}{hcategoryi}[hselectori]
The starred version just forces strings to take a value – if not set as package option, then
the default for the engine is used. This is not done by default to prevent backward
incompatibilities, but if you are creating a new language this version is better. It’s up to the
maintainers of the current languages to decide if using it is appropriate.30

\EndBabelCommands Marks the end of the series of blocks.

\AfterBabelCommands {hcodei}
The code is delayed and executed at the global scope just after \EndBabelCommands.
30
This replaces in 3.9g a short-lived \UseStrings which has been removed because it did not work.

58
\SetString {hmacro-namei}{hstring i}
Adds hmacro-namei to the current category, and defines globally hlang-macro-namei to
hcodei (after applying the transformation corresponding to the current charset or defined
with the hook stringprocess).
Use this command to define strings, without including any “logic” if possible, which should
be a separated macro. See the example above for the date.

\SetStringLoop {hmacro-namei}{hstring-listi}
A convenient way to define several ordered names at once. For example, to define
\abmoniname, \abmoniiname, etc. (and similarly with abday):

\SetStringLoop{abmon#1name}{en,fb,mr,ab,my,jn,jl,ag,sp,oc,nv,dc}
\SetStringLoop{abday#1name}{lu,ma,mi,ju,vi,sa,do}

#1 is replaced by the roman numeral.

\SetCase [hmap-listi]{htoupper-codei}{htolower-codei}
Sets globally code to be executed at \MakeUppercase and \MakeLowercase. The code
would typically be things like \let\BB\bb and \uccode or \lccode (although for the
reasons explained above, changes in lc/uc codes may not work). A hmap-listi is a series of
macros using the internal format of \@uclclist (eg, \bb\BB\cc\CC). The mandatory
arguments take precedence over the optional one. This command, unlike \SetString, is
executed always (even without strings), and it is intended for minor readjustments only.
For example, as T1 is the default case mapping in LATEX, we can set for Turkish:

\StartBabelCommands{turkish}{}[ot1enc, fontenc=OT1]
\SetCase
{\uccode"10=`I\relax}
{\lccode`I="10\relax}

\StartBabelCommands{turkish}{}[unicode, fontenc=TU EU1 EU2, charset=utf8]


\SetCase
{\uccode`i=`İ\relax
\uccode`ı=`I\relax}
{\lccode`İ=`i\relax
\lccode`I=`ı\relax}

\StartBabelCommands{turkish}{}
\SetCase
{\uccode`i="9D\relax
\uccode"19=`I\relax}
{\lccode"9D=`i\relax
\lccode`I="19\relax}

\EndBabelCommands

(Note the mapping for OT1 is not complete.)

\SetHyphenMap {hto-lower-macrosi}
New 3.9g Case mapping serves in TEX for two unrelated purposes: case transforms
(upper/lower) and hyphenation. \SetCase handles the former, while hyphenation is
handled by \SetHyphenMap and controlled with the package option hyphenmap. So, even if
internally they are based on the same TEX primitive (\lccode), babel sets them separately.
There are three helper macros to be used inside \SetHyphenMap:
• \BabelLower{huccodei}{hlccodei} is similar to \lccode but it’s ignored if the char has
been set and saves the original lccode to restore it when switching the language (except
with hyphenmap=first).

59
• \BabelLowerMM{huccode-fromi}{huccode-toi}{hstepi}{hlccode-fromi} loops though the
given uppercase codes, using the step, and assigns them the lccode, which is also
increased (MM stands for many-to-many).

• \BabelLowerMO{huccode-fromi}{huccode-toi}{hstepi}{hlccodei} loops though the given


uppercase codes, using the step, and assigns them the lccode, which is fixed (MO stands
for many-to-one).

An example is (which is redundant, because these assignments are done by both luatex
and xetex):

\SetHyphenMap{\BabelLowerMM{"100}{"11F}{2}{"101}}

This macro is not intended to fix wrong mappings done by Unicode (which are the default
in both xetex and luatex) – if an assignment is wrong, fix it directly.

3.9 Executing code based on the selector

\IfBabelSelectorTF {hselectorsi}{htruei}{hfalsei}
New 3.67 Sometimes a different setup is desired depending on the selector used. Values
allowed in hselectorsi are select, other, foreign, other* (and also foreign* for the
tentative starred version), and it can consist of a comma-separated list. For example:

\IfBabelSelectorTF{other, other*}{A}{B}

is true with these two environment selectors.


Its natural place of use is in hooks or in \extrashlanguagei.

Part II
Source code
babel is being developed incrementally, which means parts of the code are under development and
therefore incomplete. Only documented features are considered complete. In other words, use babel
only as documented (except, of course, if you want to explore and test them – you can post
suggestions about multilingual issues to [email protected] on
http://tug.org/mailman/listinfo/kadingira).

4 Identification and loading of required files


Code documentation is still under revision.
The following description is no longer valid, because switch and plain have been merged into
babel.def.
The babel package after unpacking consists of the following files:
switch.def defines macros to set and switch languages.
babel.def defines the rest of macros. It has tow parts: a generic one and a second one only for
LaTeX.
babel.sty is the LATEX package, which set options and load language styles.
plain.def defines some LATEX macros required by babel.def and provides a few tools for Plain.
hyphen.cfg is the file to be used when generating the formats to load hyphenation patterns.
The babel installer extends docstrip with a few “pseudo-guards” to set “variables” used at installation
time. They are used with <@name@> at the appropiated places in the source code and shown below
with hhnameii. That brings a little bit of literate programming.

60
5 locale directory
A required component of babel is a set of ini files with basic definitions for about 200 languages.
They are distributed as a separate zip file, not packed as dtx. With them, babel will fully support
Unicode engines.
Most of them are essentially finished (except bugs and mistakes, of course). Some of them are still
incomplete (but they will be usable), and there are some omissions (eg, Latin and polytonic Greek,
and there are no geographic areas in Spanish). Hindi, French, Occitan and Breton will show a
warning related to dates. Not all include LICR variants.
This is a preliminary documentation.
ini files contain the actual data; tex files are currently just proxies to the corresponding ini files.
Most keys are self-explanatory.
charset the encoding used in the ini file.
version of the ini file
level “version” of the ini specification . which keys are available (they may grow in a compatible
way) and how they should be read.
encodings a descriptive list of font encondings.
[captions] section of captions in the file charset
[captions.licr] same, but in pure ASCII using the LICR
date.long fields are as in the CLDR, but the syntax is different. Anything inside brackets is a date
field (eg, MMMM for the month name) and anything outside is text. In addition, [ ] is a non
breakable space and [.] is an abbreviation dot.
Keys may be further qualified in a particular language with a suffix starting with a uppercase letter.
It can be just a letter (eg, babel.name.A, babel.name.B) or a name (eg, date.long.Nominative,
date.long.Formal, but no language is currently using the latter). Multi-letter qualifiers are forward
compatible in the sense they won’t conflict with new “global” keys (which start always with a
lowercase case). There is an exception, however: the section counters has been devised to have
arbitrary keys, so you can add lowercased keys if you want.

6 Tools
1 hhversion=3.83ii
2 hhdate=2022/11/30ii

Do not use the following macros in ldf files. They may change in the future. This applies mainly
to those recently added for replacing, trimming and looping. The older ones, like \bbl@afterfi, will
not change.
We define some basic macros which just make the code cleaner. \bbl@add is now used internally
instead of \addto because of the unpredictable behavior of the latter. Used in babel.def and in
babel.sty, which means in LATEX is executed twice, but we need them when defining options and
babel.def cannot be load until options have been defined. This does not hurt, but should be fixed
somehow.
3 hh∗Basic macrosii ≡
4 \bbl@trace{Basic macros}
5 \def\bbl@stripslash{\expandafter\@gobble\string}
6 \def\bbl@add#1#2{%
7 \bbl@ifunset{\bbl@stripslash#1}%
8 {\def#1{#2}}%
9 {\expandafter\def\expandafter#1\expandafter{#1#2}}}
10 \def\bbl@xin@{\@expandtwoargs\in@}
11 \def\bbl@carg#1#2{\expandafter#1\csname#2\endcsname}%
12 \def\bbl@ncarg#1#2#3{\expandafter#1\expandafter#2\csname#3\endcsname}%
13 \def\bbl@ccarg#1#2#3{%
14 \expandafter#1\csname#2\expandafter\endcsname\csname#3\endcsname}%
15 \def\bbl@csarg#1#2{\expandafter#1\csname bbl@#2\endcsname}%
16 \def\bbl@cs#1{\csname bbl@#1\endcsname}
17 \def\bbl@cl#1{\csname bbl@#1@\languagename\endcsname}
18 \def\bbl@loop#1#2#3{\bbl@@loop#1{#3}#2,\@nnil,}
19 \def\bbl@loopx#1#2{\expandafter\bbl@loop\expandafter#1\expandafter{#2}}
20 \def\bbl@@loop#1#2#3,{%
21 \ifx\@nnil#3\relax\else
22 \def#1{#3}#2\bbl@afterfi\bbl@@loop#1{#2}%

61
23 \fi}
24 \def\bbl@for#1#2#3{\bbl@loopx#1{#2}{\ifx#1\@empty\else#3\fi}}

\bbl@add@list This internal macro adds its second argument to a comma separated list in its first argument. When
the list is not defined yet (or empty), it will be initiated. It presumes expandable character strings.

25 \def\bbl@add@list#1#2{%
26 \edef#1{%
27 \bbl@ifunset{\bbl@stripslash#1}%
28 {}%
29 {\ifx#1\@empty\else#1,\fi}%
30 #2}}

\bbl@afterelse Because the code that is used in the handling of active characters may need to look ahead, we take
\bbl@afterfi extra care to ‘throw’ it over the \else and \fi parts of an \if-statement31 . These macros will break
if another \if...\fi statement appears in one of the arguments and it is not enclosed in braces.

31 \long\def\bbl@afterelse#1\else#2\fi{\fi#1}
32 \long\def\bbl@afterfi#1\fi{\fi#1}

\bbl@exp Now, just syntactical sugar, but it makes partial expansion of some code a lot more simple and
readable. Here \\ stands for \noexpand, \<..> for \noexpand applied to a built macro name (which
does not define the macro if undefined to \relax, because it is created locally), and \[..] for
one-level expansion (where .. is the macro name without the backslash). The result may be
followed by extra arguments, if necessary.

33 \def\bbl@exp#1{%
34 \begingroup
35 \let\\\noexpand
36 \let\<\bbl@exp@en
37 \let\[\bbl@exp@ue
38 \edef\bbl@exp@aux{\endgroup#1}%
39 \bbl@exp@aux}
40 \def\bbl@exp@en#1>{\expandafter\noexpand\csname#1\endcsname}%
41 \def\bbl@exp@ue#1]{%
42 \unexpanded\expandafter\expandafter\expandafter{\csname#1\endcsname}}%

\bbl@trim The following piece of code is stolen (with some changes) from keyval, by David Carlisle. It defines
two macros: \bbl@trim and \bbl@trim@def. The first one strips the leading and trailing spaces from
the second argument and then applies the first argument (a macro, \toks@ and the like). The second
one, as its name suggests, defines the first argument as the stripped second argument.

43 \def\bbl@tempa#1{%
44 \long\def\bbl@trim##1##2{%
45 \futurelet\bbl@trim@a\bbl@trim@c##2\@nil\@nil#1\@nil\relax{##1}}%
46 \def\bbl@trim@c{%
47 \ifx\bbl@trim@a\@sptoken
48 \expandafter\bbl@trim@b
49 \else
50 \expandafter\bbl@trim@b\expandafter#1%
51 \fi}%
52 \long\def\bbl@trim@b#1##1 \@nil{\bbl@trim@i##1}}
53 \bbl@tempa{ }
54 \long\def\bbl@trim@i#1\@nil#2\relax#3{#3{#1}}
55 \long\def\bbl@trim@def#1{\bbl@trim{\def#1}}

\bbl@ifunset To check if a macro is defined, we create a new macro, which does the same as \@ifundefined.
However, in an -tex engine, it is based on \ifcsname, which is more efficient, and does not waste
memory. Defined inside a group, to avoid \ifcsname being implicitly set to \relax by the \csname
test.

56 \begingroup
57 \gdef\bbl@ifunset#1{%
58 \expandafter\ifx\csname#1\endcsname\relax
59 \expandafter\@firstoftwo
31
This code is based on code presented in TUGboat vol. 12, no2, June 1991 in “An expansion Power Lemma” by
Sonja Maus.

62
60 \else
61 \expandafter\@secondoftwo
62 \fi}
63 \bbl@ifunset{ifcsname}%
64 {}%
65 {\gdef\bbl@ifunset#1{%
66 \ifcsname#1\endcsname
67 \expandafter\ifx\csname#1\endcsname\relax
68 \bbl@afterelse\expandafter\@firstoftwo
69 \else
70 \bbl@afterfi\expandafter\@secondoftwo
71 \fi
72 \else
73 \expandafter\@firstoftwo
74 \fi}}
75 \endgroup

\bbl@ifblank A tool from url, by Donald Arseneau, which tests if a string is empty or space. The companion macros
tests if a macro is defined with some ‘real’ value, ie, not \relax and not empty,
76 \def\bbl@ifblank#1{%
77 \bbl@ifblank@i#1\@nil\@nil\@secondoftwo\@firstoftwo\@nil}
78 \long\def\bbl@ifblank@i#1#2\@nil#3#4#5\@nil{#4}
79 \def\bbl@ifset#1#2#3{%
80 \bbl@ifunset{#1}{#3}{\bbl@exp{\\\bbl@ifblank{\@nameuse{#1}}}{#3}{#2}}}

For each element in the comma separated <key>=<value> list, execute <code> with #1 and #2 as the
key and the value of current item (trimmed). In addition, the item is passed verbatim as #3. With the
<key> alone, it passes \@empty (ie, the macro thus named, not an empty argument, which is what you
get with <key>= and no value).
81 \def\bbl@forkv#1#2{%
82 \def\bbl@kvcmd##1##2##3{#2}%
83 \bbl@kvnext#1,\@nil,}
84 \def\bbl@kvnext#1,{%
85 \ifx\@nil#1\relax\else
86 \bbl@ifblank{#1}{}{\bbl@forkv@eq#1=\@empty=\@nil{#1}}%
87 \expandafter\bbl@kvnext
88 \fi}
89 \def\bbl@forkv@eq#1=#2=#3\@nil#4{%
90 \bbl@trim@def\bbl@forkv@a{#1}%
91 \bbl@trim{\expandafter\bbl@kvcmd\expandafter{\bbl@forkv@a}}{#2}{#4}}
A for loop. Each item (trimmed), is #1. It cannot be nested (it’s doable, but we don’t need it).
92 \def\bbl@vforeach#1#2{%
93 \def\bbl@forcmd##1{#2}%
94 \bbl@fornext#1,\@nil,}
95 \def\bbl@fornext#1,{%
96 \ifx\@nil#1\relax\else
97 \bbl@ifblank{#1}{}{\bbl@trim\bbl@forcmd{#1}}%
98 \expandafter\bbl@fornext
99 \fi}
100 \def\bbl@foreach#1{\expandafter\bbl@vforeach\expandafter{#1}}

\bbl@replace Returns implicitly \toks@ with the modified string.


101 \def\bbl@replace#1#2#3{% in #1 -> repl #2 by #3
102 \toks@{}%
103 \def\bbl@replace@aux##1#2##2#2{%
104 \ifx\bbl@nil##2%
105 \toks@\expandafter{\the\toks@##1}%
106 \else
107 \toks@\expandafter{\the\toks@##1#3}%
108 \bbl@afterfi
109 \bbl@replace@aux##2#2%
110 \fi}%

63
111 \expandafter\bbl@replace@aux#1#2\bbl@nil#2%
112 \edef#1{\the\toks@}}
An extensison to the previous macro. It takes into account the parameters, and it is string based (ie, if
you replace elax by ho, then \relax becomes \rho). No checking is done at all, because it is not a
general purpose macro, and it is used by babel only when it works (an example where it does not
work is in \bbl@TG@@date, and also fails if there are macros with spaces, because they are
retokenized). It may change! (or even merged with \bbl@replace; I’m not sure ckecking the
replacement is really necessary or just paranoia).
113 \ifx\detokenize\@undefined\else % Unused macros if old Plain TeX
114 \bbl@exp{\def\\\bbl@parsedef##1\detokenize{macro:}}#2->#3\relax{%
115 \def\bbl@tempa{#1}%
116 \def\bbl@tempb{#2}%
117 \def\bbl@tempe{#3}}
118 \def\bbl@sreplace#1#2#3{%
119 \begingroup
120 \expandafter\bbl@parsedef\meaning#1\relax
121 \def\bbl@tempc{#2}%
122 \edef\bbl@tempc{\expandafter\strip@prefix\meaning\bbl@tempc}%
123 \def\bbl@tempd{#3}%
124 \edef\bbl@tempd{\expandafter\strip@prefix\meaning\bbl@tempd}%
125 \bbl@xin@{\bbl@tempc}{\bbl@tempe}% If not in macro, do nothing
126 \ifin@
127 \bbl@exp{\\\bbl@replace\\\bbl@tempe{\bbl@tempc}{\bbl@tempd}}%
128 \def\bbl@tempc{% Expanded an executed below as 'uplevel'
129 \\\makeatletter % "internal" macros with @ are assumed
130 \\\scantokens{%
131 \bbl@tempa\\\@namedef{\bbl@stripslash#1}\bbl@tempb{\bbl@tempe}}%
132 \catcode64=\the\catcode64\relax}% Restore @
133 \else
134 \let\bbl@tempc\@empty % Not \relax
135 \fi
136 \bbl@exp{% For the 'uplevel' assignments
137 \endgroup
138 \bbl@tempc}} % empty or expand to set #1 with changes
139 \fi

Two further tools. \bbl@ifsamestring first expand its arguments and then compare their expansion
(sanitized, so that the catcodes do not matter). \bbl@engine takes the following values: 0 is pdfTEX, 1
is luatex, and 2 is xetex. You may use the latter it in your language style if you want.
140 \def\bbl@ifsamestring#1#2{%
141 \begingroup
142 \protected@edef\bbl@tempb{#1}%
143 \edef\bbl@tempb{\expandafter\strip@prefix\meaning\bbl@tempb}%
144 \protected@edef\bbl@tempc{#2}%
145 \edef\bbl@tempc{\expandafter\strip@prefix\meaning\bbl@tempc}%
146 \ifx\bbl@tempb\bbl@tempc
147 \aftergroup\@firstoftwo
148 \else
149 \aftergroup\@secondoftwo
150 \fi
151 \endgroup}
152 \chardef\bbl@engine=%
153 \ifx\directlua\@undefined
154 \ifx\XeTeXinputencoding\@undefined
155 \z@
156 \else
157 \tw@
158 \fi
159 \else
160 \@ne
161 \fi
A somewhat hackish tool (hence its name) to avoid spurious spaces in some contexts.

64
162 \def\bbl@bsphack{%
163 \ifhmode
164 \hskip\z@skip
165 \def\bbl@esphack{\loop\ifdim\lastskip>\z@\unskip\repeat\unskip}%
166 \else
167 \let\bbl@esphack\@empty
168 \fi}

Another hackish tool, to apply case changes inside a protected macros. It’s based on the internal
\let’s made by \MakeUppercase and \MakeLowercase between things like \oe and \OE.

169 \def\bbl@cased{%
170 \ifx\oe\OE
171 \expandafter\in@\expandafter
172 {\expandafter\OE\expandafter}\expandafter{\oe}%
173 \ifin@
174 \bbl@afterelse\expandafter\MakeUppercase
175 \else
176 \bbl@afterfi\expandafter\MakeLowercase
177 \fi
178 \else
179 \expandafter\@firstofone
180 \fi}

An alternative to \IfFormatAtLeastTF for old versions. Temporary.

181 \ifx\IfFormatAtLeastTF\@undefined
182 \def\bbl@ifformatlater{\@ifl@t@r\fmtversion}
183 \else
184 \let\bbl@ifformatlater\IfFormatAtLeastTF
185 \fi

The following adds some code to \extras... both before and after, while avoiding doing it twice. It’s
somewhat convoluted, to deal with #’s. Used to deal with alph, Alph and frenchspacing when there
are already changes (with \babel@save).

186 \def\bbl@extras@wrap#1#2#3{% 1:in-test, 2:before, 3:after


187 \toks@\expandafter\expandafter\expandafter{%
188 \csname extras\languagename\endcsname}%
189 \bbl@exp{\\\in@{#1}{\the\toks@}}%
190 \ifin@\else
191 \@temptokena{#2}%
192 \edef\bbl@tempc{\the\@temptokena\the\toks@}%
193 \toks@\expandafter{\bbl@tempc#3}%
194 \expandafter\edef\csname extras\languagename\endcsname{\the\toks@}%
195 \fi}
196 hh/Basic macrosii

Some files identify themselves with a LATEX macro. The following code is placed before them to define
(and then undefine) if not in LATEX.

197 hh∗Make sure ProvidesFile is definedii ≡


198 \ifx\ProvidesFile\@undefined
199 \def\ProvidesFile#1[#2 #3 #4]{%
200 \wlog{File: #1 #4 #3 <#2>}%
201 \let\ProvidesFile\@undefined}
202 \fi
203 hh/Make sure ProvidesFile is definedii

6.1 Multiple languages


\language Plain TEX version 3.0 provides the primitive \language that is used to store the current language.
When used with a pre-3.0 version this function has to be implemented by allocating a counter. The
following block is used in switch.def and hyphen.cfg; the latter may seem redundant, but
remember babel doesn’t requires loading switch.def in the format.

204 hh∗Define core switching macrosii ≡


205 \ifx\language\@undefined

65
206 \csname newcount\endcsname\language
207 \fi
208 hh/Define core switching macrosii

\last@language Another counter is used to keep track of the allocated languages. TEX and LATEX reserves for this
purpose the count 19.

\addlanguage This macro was introduced for TEX < 2. Preserved for compatibility.

209 hh∗Define core switching macrosii ≡


210 \countdef\last@language=19
211 \def\addlanguage{\csname newlanguage\endcsname}
212 hh/Define core switching macrosii

Now we make sure all required files are loaded. When the command \AtBeginDocument doesn’t
exist we assume that we are dealing with a plain-based format. In that case the file plain.def is
needed (which also defines \AtBeginDocument, and therefore it is not loaded twice). We need the
first part when the format is created, and \orig@dump is used as a flag. Otherwise, we need to use the
second part, so \orig@dump is not defined (plain.def undefines it).
Check if the current version of switch.def has been previously loaded (mainly, hyphen.cfg). If not,
load it now. We cannot load babel.def here because we first need to declare and process the
package options.

6.2 The Package File (LATEX, babel.sty)


213 h∗packagei
214 \NeedsTeXFormat{LaTeX2e}[2005/12/01]
215 \ProvidesPackage{babel}[ hhdateii hhversionii The Babel package]
Start with some “private” debugging tool, and then define macros for errors.
216 \@ifpackagewith{babel}{debug}
217 {\providecommand\bbl@trace[1]{\message{^^J[ #1 ]}}%
218 \let\bbl@debug\@firstofone
219 \ifx\directlua\@undefined\else
220 \directlua{ Babel = Babel or {}
221 Babel.debug = true }%
222 \input{babel-debug.tex}%
223 \fi}
224 {\providecommand\bbl@trace[1]{}%
225 \let\bbl@debug\@gobble
226 \ifx\directlua\@undefined\else
227 \directlua{ Babel = Babel or {}
228 Babel.debug = false }%
229 \fi}
230 \def\bbl@error#1#2{%
231 \begingroup
232 \def\\{\MessageBreak}%
233 \PackageError{babel}{#1}{#2}%
234 \endgroup}
235 \def\bbl@warning#1{%
236 \begingroup
237 \def\\{\MessageBreak}%
238 \PackageWarning{babel}{#1}%
239 \endgroup}
240 \def\bbl@infowarn#1{%
241 \begingroup
242 \def\\{\MessageBreak}%
243 \PackageNote{babel}{#1}%
244 \endgroup}
245 \def\bbl@info#1{%
246 \begingroup
247 \def\\{\MessageBreak}%
248 \PackageInfo{babel}{#1}%
249 \endgroup}

66
This file also takes care of a number of compatibility issues with other packages an defines a few
aditional package options. Apart from all the language options below we also have a few options that
influence the behavior of language definition files.
Many of the following options don’t do anything themselves, they are just defined in order to make it
possible for babel and language definition files to check if one of them was specified by the user.
But first, include here the Basic macros defined above.
250 hhBasic macrosii
251 \@ifpackagewith{babel}{silent}
252 {\let\bbl@info\@gobble
253 \let\bbl@infowarn\@gobble
254 \let\bbl@warning\@gobble}
255 {}
256 %
257 \def\AfterBabelLanguage#1{%
258 \global\expandafter\bbl@add\csname#1.ldf-h@@k\endcsname}%

If the format created a list of loaded languages (in \bbl@languages), get the name of the 0-th to show
the actual language used. Also avaliable with base, because it just shows info.
259 \ifx\bbl@languages\@undefined\else
260 \begingroup
261 \catcode`\^^I=12
262 \@ifpackagewith{babel}{showlanguages}{%
263 \begingroup
264 \def\bbl@elt#1#2#3#4{\wlog{#2^^I#1^^I#3^^I#4}}%
265 \wlog{<*languages>}%
266 \bbl@languages
267 \wlog{</languages>}%
268 \endgroup}{}
269 \endgroup
270 \def\bbl@elt#1#2#3#4{%
271 \ifnum#2=\z@
272 \gdef\bbl@nulllanguage{#1}%
273 \def\bbl@elt##1##2##3##4{}%
274 \fi}%
275 \bbl@languages
276 \fi%

6.3 base
The first ‘real’ option to be processed is base, which set the hyphenation patterns then resets
[email protected] so that LATEXforgets about the first loading. After a subset of babel.def has been
loaded (the old switch.def) and \AfterBabelLanguage defined, it exits.
Now the base option. With it we can define (and load, with luatex) hyphenation patterns, even if we
are not interesed in the rest of babel.

277 \bbl@trace{Defining option 'base'}


278 \@ifpackagewith{babel}{base}{%
279 \let\bbl@onlyswitch\@empty
280 \let\bbl@provide@locale\relax
281 \input babel.def
282 \let\bbl@onlyswitch\@undefined
283 \ifx\directlua\@undefined
284 \DeclareOption*{\bbl@patterns{\CurrentOption}}%
285 \else
286 \input luababel.def
287 \DeclareOption*{\bbl@patterns@lua{\CurrentOption}}%
288 \fi
289 \DeclareOption{base}{}%
290 \DeclareOption{showlanguages}{}%
291 \ProcessOptions
292 \global\expandafter\let\csname [email protected]\endcsname\relax
293 \global\expandafter\let\csname [email protected]\endcsname\relax
294 \global\let\@ifl@ter@@\@ifl@ter
295 \def\@ifl@ter#1#2#3#4#5{\global\let\@ifl@ter\@ifl@ter@@}%
296 \endinput}{}%

67
6.4 key=value options and other general option
The following macros extract language modifiers, and only real package options are kept in the
option list. Modifiers are saved and assigned to \BabelModifiers at \bbl@load@language; when no
modifiers have been given, the former is \relax. How modifiers are handled are left to language
styles; they can use \in@, loop them with \@for or load keyval, for example.
297 \bbl@trace{key=value and another general options}
298 \bbl@csarg\let{tempa\expandafter}\csname [email protected]\endcsname
299 \def\bbl@tempb#1.#2{% Remove trailing dot
300 #1\ifx\@empty#2\else,\bbl@afterfi\bbl@tempb#2\fi}%
301 \def\bbl@tempd#1.#2\@nnil{% TODO. Refactor lists?
302 \ifx\@empty#2%
303 \edef\bbl@tempc{\ifx\bbl@tempc\@empty\else\bbl@tempc,\fi#1}%
304 \else
305 \in@{,provide=}{,#1}%
306 \ifin@
307 \edef\bbl@tempc{%
308 \ifx\bbl@tempc\@empty\else\bbl@tempc,\fi#1.\bbl@tempb#2}%
309 \else
310 \in@{=}{#1}%
311 \ifin@
312 \edef\bbl@tempc{\ifx\bbl@tempc\@empty\else\bbl@tempc,\fi#1.#2}%
313 \else
314 \edef\bbl@tempc{\ifx\bbl@tempc\@empty\else\bbl@tempc,\fi#1}%
315 \bbl@csarg\edef{mod@#1}{\bbl@tempb#2}%
316 \fi
317 \fi
318 \fi}
319 \let\bbl@tempc\@empty
320 \bbl@foreach\bbl@tempa{\bbl@tempd#1.\@empty\@nnil}
321 \expandafter\let\csname [email protected]\endcsname\bbl@tempc

The next option tells babel to leave shorthand characters active at the end of processing the package.
This is not the default as it can cause problems with other packages, but for those who want to use
the shorthand characters in the preamble of their documents this can help.
322 \DeclareOption{KeepShorthandsActive}{}
323 \DeclareOption{activeacute}{}
324 \DeclareOption{activegrave}{}
325 \DeclareOption{debug}{}
326 \DeclareOption{noconfigs}{}
327 \DeclareOption{showlanguages}{}
328 \DeclareOption{silent}{}
329 % \DeclareOption{mono}{}
330 \DeclareOption{shorthands=off}{\bbl@tempa shorthands=\bbl@tempa}
331 \chardef\bbl@iniflag\z@
332 \DeclareOption{provide=*}{\chardef\bbl@iniflag\@ne} % main -> +1
333 \DeclareOption{provide+=*}{\chardef\bbl@iniflag\tw@} % add = 2
334 \DeclareOption{provide*=*}{\chardef\bbl@iniflag\thr@@} % add + main
335 % A separate option
336 \let\bbl@autoload@options\@empty
337 \DeclareOption{provide@=*}{\def\bbl@autoload@options{import}}
338 % Don't use. Experimental. TODO.
339 \newif\ifbbl@single
340 \DeclareOption{selectors=off}{\bbl@singletrue}
341 hhMore package optionsii

Handling of package options is done in three passes. (I [JBL] am not very happy with the idea,
anyway.) The first one processes options which has been declared above or follow the syntax
<key>=<value>, the second one loads the requested languages, except the main one if set with the
key main, and the third one loads the latter. First, we “flag” valid keys with a nil value.
342 \let\bbl@opt@shorthands\@nnil
343 \let\bbl@opt@config\@nnil
344 \let\bbl@opt@main\@nnil
345 \let\bbl@opt@headfoot\@nnil

68
346 \let\bbl@opt@layout\@nnil
347 \let\bbl@opt@provide\@nnil

The following tool is defined temporarily to store the values of options.


348 \def\bbl@tempa#1=#2\bbl@tempa{%
349 \bbl@csarg\ifx{opt@#1}\@nnil
350 \bbl@csarg\edef{opt@#1}{#2}%
351 \else
352 \bbl@error
353 {Bad option '#1=#2'. Either you have misspelled the\\%
354 key or there is a previous setting of '#1'. Valid\\%
355 keys are, among others, 'shorthands', 'main', 'bidi',\\%
356 'strings', 'config', 'headfoot', 'safe', 'math'.}%
357 {See the manual for further details.}
358 \fi}
Now the option list is processed, taking into account only currently declared options (including those
declared with a =), and <key>=<value> options (the former take precedence). Unrecognized options
are saved in \bbl@language@opts, because they are language options.
359 \let\bbl@language@opts\@empty
360 \DeclareOption*{%
361 \bbl@xin@{\string=}{\CurrentOption}%
362 \ifin@
363 \expandafter\bbl@tempa\CurrentOption\bbl@tempa
364 \else
365 \bbl@add@list\bbl@language@opts{\CurrentOption}%
366 \fi}
Now we finish the first pass (and start over).
367 \ProcessOptions*

368 \ifx\bbl@opt@provide\@nnil
369 \let\bbl@opt@provide\@empty % %%% MOVE above
370 \else
371 \chardef\bbl@iniflag\@ne
372 \bbl@exp{\\\bbl@forkv{\@nameuse{@raw@[email protected]}}}{%
373 \in@{,provide,}{,#1,}%
374 \ifin@
375 \def\bbl@opt@provide{#2}%
376 \bbl@replace\bbl@opt@provide{;}{,}%
377 \fi}
378 \fi
379 %

6.5 Conditional loading of shorthands


If there is no shorthands=<chars>, the original babel macros are left untouched, but if there is, these
macros are wrapped (in babel.def) to define only those given.
A bit of optimization: if there is no shorthands=, then \bbl@ifshorthand is always true, and it is
always false if shorthands is empty. Also, some code makes sense only with shorthands=....
380 \bbl@trace{Conditional loading of shorthands}
381 \def\bbl@sh@string#1{%
382 \ifx#1\@empty\else
383 \ifx#1t\string~%
384 \else\ifx#1c\string,%
385 \else\string#1%
386 \fi\fi
387 \expandafter\bbl@sh@string
388 \fi}
389 \ifx\bbl@opt@shorthands\@nnil
390 \def\bbl@ifshorthand#1#2#3{#2}%
391 \else\ifx\bbl@opt@shorthands\@empty
392 \def\bbl@ifshorthand#1#2#3{#3}%
393 \else

69
The following macro tests if a shorthand is one of the allowed ones.

394 \def\bbl@ifshorthand#1{%
395 \bbl@xin@{\string#1}{\bbl@opt@shorthands}%
396 \ifin@
397 \expandafter\@firstoftwo
398 \else
399 \expandafter\@secondoftwo
400 \fi}

We make sure all chars in the string are ‘other’, with the help of an auxiliary macro defined above
(which also zaps spaces).

401 \edef\bbl@opt@shorthands{%
402 \expandafter\bbl@sh@string\bbl@opt@shorthands\@empty}%

The following is ignored with shorthands=off, since it is intended to take some aditional actions for
certain chars.

403 \bbl@ifshorthand{'}%
404 {\PassOptionsToPackage{activeacute}{babel}}{}
405 \bbl@ifshorthand{`}%
406 {\PassOptionsToPackage{activegrave}{babel}}{}
407 \fi\fi

With headfoot=lang we can set the language used in heads/foots. For example, in babel/3796 just
adds headfoot=english. It misuses \@resetactivechars but seems to work.

408 \ifx\bbl@opt@headfoot\@nnil\else
409 \g@addto@macro\@resetactivechars{%
410 \set@typeset@protect
411 \expandafter\select@language@x\expandafter{\bbl@opt@headfoot}%
412 \let\protect\noexpand}
413 \fi

For the option safe we use a different approach – \bbl@opt@safe says which macros are redefined (B
for bibs and R for refs). By default, both are currently set, but in a future release it will be set to none.

414 \ifx\bbl@opt@safe\@undefined
415 \def\bbl@opt@safe{BR}
416 % \let\bbl@opt@safe\@empty % Pending of \cite
417 \fi

For layout an auxiliary macro is provided, available for packages and language styles. Optimization:
if there is no layout, just do nothing.

418 \bbl@trace{Defining IfBabelLayout}


419 \ifx\bbl@opt@layout\@nnil
420 \newcommand\IfBabelLayout[3]{#3}%
421 \else
422 \newcommand\IfBabelLayout[1]{%
423 \@expandtwoargs\in@{.#1.}{.\bbl@opt@layout.}%
424 \ifin@
425 \expandafter\@firstoftwo
426 \else
427 \expandafter\@secondoftwo
428 \fi}
429 \fi
430 h/packagei
431 h∗corei

6.6 Interlude for Plain


Because of the way docstrip works, we need to insert some code for Plain here. However, the tools
provided by the babel installer for literate programming makes this section a short interlude,
because the actual code is below, tagged as Emulate LaTeX.

432 \ifx\ldf@quit\@undefined\else
433 \endinput\fi % Same line!
434 hhMake sure ProvidesFile is definedii

70
435 \ProvidesFile{babel.def}[ hhdateii hhversionii Babel common definitions]
436 \ifx\AtBeginDocument\@undefined % TODO. change test.
437 hhEmulate LaTeX ii
438 \fi

That is all for the moment. Now follows some common stuff, for both Plain and LATEX. After it, we will
resume the LATEX-only stuff.

439 h/corei
440 h∗package | corei

7 Multiple languages
This is not a separate file (switch.def) anymore.
Plain TEX version 3.0 provides the primitive \language that is used to store the current language.
When used with a pre-3.0 version this function has to be implemented by allocating a counter.

441 \def\bbl@version{ hhversionii}


442 \def\bbl@date{ hhdateii}
443 hhDefine core switching macrosii

\adddialect The macro \adddialect can be used to add the name of a dialect or variant language, for which an
already defined hyphenation table can be used.

444 \def\adddialect#1#2{%
445 \global\chardef#1#2\relax
446 \bbl@usehooks{adddialect}{{#1}{#2}}%
447 \begingroup
448 \count@#1\relax
449 \def\bbl@elt##1##2##3##4{%
450 \ifnum\count@=##2\relax
451 \edef\bbl@tempa{\expandafter\@gobbletwo\string#1}%
452 \bbl@info{Hyphen rules for '\expandafter\@gobble\bbl@tempa'
453 set to \expandafter\string\csname l@##1\endcsname\\%
454 (\string\language\the\count@). Reported}%
455 \def\bbl@elt####1####2####3####4{}%
456 \fi}%
457 \bbl@cs{languages}%
458 \endgroup}

\bbl@iflanguage executes code only if the language l@ exists. Otherwise raises an error.
The argument of \bbl@fixname has to be a macro name, as it may get “fixed” if casing (lc/uc) is
wrong. It’s an attempt to fix a long-standing bug when \foreignlanguage and the like appear in a
\MakeXXXcase. However, a lowercase form is not imposed to improve backward compatibility
(perhaps you defined a language named MYLANG, but unfortunately mixed case names cannot be
trapped). Note l@ is encapsulated, so that its case does not change.

459 \def\bbl@fixname#1{%
460 \begingroup
461 \def\bbl@tempe{l@}%
462 \edef\bbl@tempd{\noexpand\@ifundefined{\noexpand\bbl@tempe#1}}%
463 \bbl@tempd
464 {\lowercase\expandafter{\bbl@tempd}%
465 {\uppercase\expandafter{\bbl@tempd}%
466 \@empty
467 {\edef\bbl@tempd{\def\noexpand#1{#1}}%
468 \uppercase\expandafter{\bbl@tempd}}}%
469 {\edef\bbl@tempd{\def\noexpand#1{#1}}%
470 \lowercase\expandafter{\bbl@tempd}}}%
471 \@empty
472 \edef\bbl@tempd{\endgroup\def\noexpand#1{#1}}%
473 \bbl@tempd
474 \bbl@exp{\\\bbl@usehooks{languagename}{{\languagename}{#1}}}}
475 \def\bbl@iflanguage#1{%
476 \@ifundefined{l@#1}{\@nolanerr{#1}\@gobble}\@firstofone}

71
After a name has been ‘fixed’, the selectors will try to load the language. If even the fixed name is not
defined, will load it on the fly, either based on its name, or if activated, its BCP47 code.
We first need a couple of macros for a simple BCP 47 look up. It also makes sure, with \bbl@bcpcase,
casing is the correct one, so that sr-latn-ba becomes fr-Latn-BA. Note #4 may contain some \@empty’s,
but they are eventually removed. \bbl@bcplookup either returns the found ini or it is \relax.

477 \def\bbl@bcpcase#1#2#3#4\@@#5{%
478 \ifx\@empty#3%
479 \uppercase{\def#5{#1#2}}%
480 \else
481 \uppercase{\def#5{#1}}%
482 \lowercase{\edef#5{#5#2#3#4}}%
483 \fi}
484 \def\bbl@bcplookup#1-#2-#3-#4\@@{%
485 \let\bbl@bcp\relax
486 \lowercase{\def\bbl@tempa{#1}}%
487 \ifx\@empty#2%
488 \IfFileExists{babel-\[email protected]}{\let\bbl@bcp\bbl@tempa}{}%
489 \else\ifx\@empty#3%
490 \bbl@bcpcase#2\@empty\@empty\@@\bbl@tempb
491 \IfFileExists{babel-\bbl@tempa-\[email protected]}%
492 {\edef\bbl@bcp{\bbl@tempa-\bbl@tempb}}%
493 {}%
494 \ifx\bbl@bcp\relax
495 \IfFileExists{babel-\[email protected]}{\let\bbl@bcp\bbl@tempa}{}%
496 \fi
497 \else
498 \bbl@bcpcase#2\@empty\@empty\@@\bbl@tempb
499 \bbl@bcpcase#3\@empty\@empty\@@\bbl@tempc
500 \IfFileExists{babel-\bbl@tempa-\bbl@tempb-\[email protected]}%
501 {\edef\bbl@bcp{\bbl@tempa-\bbl@tempb-\bbl@tempc}}%
502 {}%
503 \ifx\bbl@bcp\relax
504 \IfFileExists{babel-\bbl@tempa-\[email protected]}%
505 {\edef\bbl@bcp{\bbl@tempa-\bbl@tempc}}%
506 {}%
507 \fi
508 \ifx\bbl@bcp\relax
509 \IfFileExists{babel-\bbl@tempa-\[email protected]}%
510 {\edef\bbl@bcp{\bbl@tempa-\bbl@tempc}}%
511 {}%
512 \fi
513 \ifx\bbl@bcp\relax
514 \IfFileExists{babel-\[email protected]}{\let\bbl@bcp\bbl@tempa}{}%
515 \fi
516 \fi\fi}
517 \let\bbl@initoload\relax
518 \def\bbl@provide@locale{%
519 \ifx\babelprovide\@undefined
520 \bbl@error{For a language to be defined on the fly 'base'\\%
521 is not enough, and the whole package must be\\%
522 loaded. Either delete the 'base' option or\\%
523 request the languages explicitly}%
524 {See the manual for further details.}%
525 \fi
526 \let\bbl@auxname\languagename % Still necessary. TODO
527 \bbl@ifunset{bbl@bcp@map@\languagename}{}% Move uplevel??
528 {\edef\languagename{\@nameuse{bbl@bcp@map@\languagename}}}%
529 \ifbbl@bcpallowed
530 \expandafter\ifx\csname date\languagename\endcsname\relax
531 \expandafter
532 \bbl@bcplookup\languagename-\@empty-\@empty-\@empty\@@
533 \ifx\bbl@bcp\relax\else % Returned by \bbl@bcplookup
534 \edef\languagename{\bbl@bcp@prefix\bbl@bcp}%

72
535 \edef\localename{\bbl@bcp@prefix\bbl@bcp}%
536 \expandafter\ifx\csname date\languagename\endcsname\relax
537 \let\bbl@initoload\bbl@bcp
538 \bbl@exp{\\\babelprovide[\bbl@autoload@bcpoptions]{\languagename}}%
539 \let\bbl@initoload\relax
540 \fi
541 \bbl@csarg\xdef{bcp@map@\bbl@bcp}{\localename}%
542 \fi
543 \fi
544 \fi
545 \expandafter\ifx\csname date\languagename\endcsname\relax
546 \IfFileExists{babel-\languagename.tex}%
547 {\bbl@exp{\\\babelprovide[\bbl@autoload@options]{\languagename}}}%
548 {}%
549 \fi}

\iflanguage Users might want to test (in a private package for instance) which language is currently active. For
this we provide a test macro, \iflanguage, that has three arguments. It checks whether the first
argument is a known language. If so, it compares the first argument with the value of \language.
Then, depending on the result of the comparison, it executes either the second or the third argument.

550 \def\iflanguage#1{%
551 \bbl@iflanguage{#1}{%
552 \ifnum\csname l@#1\endcsname=\language
553 \expandafter\@firstoftwo
554 \else
555 \expandafter\@secondoftwo
556 \fi}}

7.1 Selecting the language


\selectlanguage The macro \selectlanguage checks whether the language is already defined before it performs its
actual task, which is to update \language and activate language-specific definitions.

557 \let\bbl@select@type\z@
558 \edef\selectlanguage{%
559 \noexpand\protect
560 \expandafter\noexpand\csname selectlanguage \endcsname}

Because the command \selectlanguage could be used in a moving argument it expands to


\protect\selectlanguage . Therefore, we have to make sure that a macro \protect exists. If it
doesn’t it is \let to \relax.

561 \ifx\@undefined\protect\let\protect\relax\fi

The following definition is preserved for backwards compatibility (eg, arabi, koma). It is related to a
trick for 2.09, now discarded.

562 \let\xstring\string

Since version 3.5 babel writes entries to the auxiliary files in order to typeset table of contents etc. in
the correct language environment.

\bbl@pop@language But when the language change happens inside a group the end of the group doesn’t write anything to
the auxiliary files. Therefore we need TEX’s aftergroup mechanism to help us. The command
\aftergroup stores the token immediately following it to be executed when the current group is
closed. So we define a temporary control sequence \bbl@pop@language to be executed at the end of
the group. It calls \bbl@set@language with the name of the current language as its argument.

\bbl@language@stack The previous solution works for one level of nesting groups, but as soon as more levels are used it is
no longer adequate. For that case we need to keep track of the nested languages using a stack
mechanism. This stack is called \bbl@language@stack and initially empty.

563 \def\bbl@language@stack{}

When using a stack we need a mechanism to push an element on the stack and to retrieve the
information afterwards.

73
\bbl@push@language The stack is simply a list of languagenames, separated with a ‘+’ sign; the push function can be simple:
\bbl@pop@language
564 \def\bbl@push@language{%
565 \ifx\languagename\@undefined\else
566 \ifx\currentgrouplevel\@undefined
567 \xdef\bbl@language@stack{\languagename+\bbl@language@stack}%
568 \else
569 \ifnum\currentgrouplevel=\z@
570 \xdef\bbl@language@stack{\languagename+}%
571 \else
572 \xdef\bbl@language@stack{\languagename+\bbl@language@stack}%
573 \fi
574 \fi
575 \fi}
Retrieving information from the stack is a little bit less simple, as we need to remove the element
from the stack while storing it in the macro \languagename. For this we first define a helper
function.

\bbl@pop@lang This macro stores its first element (which is delimited by the ‘+’-sign) in \languagename and stores
the rest of the string in \bbl@language@stack.
576 \def\bbl@pop@lang#1+#2\@@{%
577 \edef\languagename{#1}%
578 \xdef\bbl@language@stack{#2}}

The reason for the somewhat weird arrangement of arguments to the helper function is the fact it is
called in the following way. This means that before \bbl@pop@lang is executed TEX first expands the
stack, stored in \bbl@language@stack. The result of that is that the argument string of
\bbl@pop@lang contains one or more language names, each followed by a ‘+’-sign (zero language
names won’t occur as this macro will only be called after something has been pushed on the stack).
579 \let\bbl@ifrestoring\@secondoftwo
580 \def\bbl@pop@language{%
581 \expandafter\bbl@pop@lang\bbl@language@stack\@@
582 \let\bbl@ifrestoring\@firstoftwo
583 \expandafter\bbl@set@language\expandafter{\languagename}%
584 \let\bbl@ifrestoring\@secondoftwo}
Once the name of the previous language is retrieved from the stack, it is fed to \bbl@set@language
to do the actual work of switching everything that needs switching.

An alternative way to identify languages (in the babel sense) with a numerical value is introduced in
3.30. This is one of the first steps for a new interface based on the concept of locale, which explains
the name of \localeid. This means \l@... will be reserved for hyphenation patterns (so that two
locales can share the same rules).
585 \chardef\localeid\z@
586 \def\bbl@id@last{0} % No real need for a new counter
587 \def\bbl@id@assign{%
588 \bbl@ifunset{bbl@id@@\languagename}%
589 {\count@\bbl@id@last\relax
590 \advance\count@\@ne
591 \bbl@csarg\chardef{id@@\languagename}\count@
592 \edef\bbl@id@last{\the\count@}%
593 \ifcase\bbl@engine\or
594 \directlua{
595 Babel = Babel or {}
596 Babel.locale_props = Babel.locale_props or {}
597 Babel.locale_props[\bbl@id@last] = {}
598 Babel.locale_props[\bbl@id@last].name = '\languagename'
599 }%
600 \fi}%
601 {}%
602 \chardef\localeid\bbl@cl{id@}}
The unprotected part of \selectlanguage.
603 \expandafter\def\csname selectlanguage \endcsname#1{%

74
604 \ifnum\bbl@hymapsel=\@cclv\let\bbl@hymapsel\tw@\fi
605 \bbl@push@language
606 \aftergroup\bbl@pop@language
607 \bbl@set@language{#1}}

\bbl@set@language The macro \bbl@set@language takes care of switching the language environment and of writing
entries on the auxiliary files. For historial reasons, language names can be either language of
\language. To catch either form a trick is used, but unfortunately as a side effect the catcodes of
letters in \languagename are messed up. This is a bug, but preserved for backwards compatibility.
The list of auxiliary files can be extended by redefining \BabelContentsFiles, but make sure they
are loaded inside a group (as aux, toc, lof, and lot do) or the last language of the document will
remain active afterwards.
We also write a command to change the current language in the auxiliary files.
\bbl@savelastskip is used to deal with skips before the write whatsit (as suggested by U Fischer).
Adapted from hyperref, but it might fail, so I’ll consider it a temporary hack, while I study other
options (the ideal, but very likely unfeasible except perhaps in luatex, is to avoid the \write
altogether when not needed).

608 \def\BabelContentsFiles{toc,lof,lot}
609 \def\bbl@set@language#1{% from selectlanguage, pop@
610 % The old buggy way. Preserved for compatibility.
611 \edef\languagename{%
612 \ifnum\escapechar=\expandafter`\string#1\@empty
613 \else\string#1\@empty\fi}%
614 \ifcat\relax\noexpand#1%
615 \expandafter\ifx\csname date\languagename\endcsname\relax
616 \edef\languagename{#1}%
617 \let\localename\languagename
618 \else
619 \bbl@info{Using '\string\language' instead of 'language' is\\%
620 deprecated. If what you want is to use a\\%
621 macro containing the actual locale, make\\%
622 sure it does not not match any language.\\%
623 Reported}%
624 \ifx\scantokens\@undefined
625 \def\localename{??}%
626 \else
627 \scantokens\expandafter{\expandafter
628 \def\expandafter\localename\expandafter{\languagename}}%
629 \fi
630 \fi
631 \else
632 \def\localename{#1}% This one has the correct catcodes
633 \fi
634 \select@language{\languagename}%
635 % write to auxs
636 \expandafter\ifx\csname date\languagename\endcsname\relax\else
637 \if@filesw
638 \ifx\babel@aux\@gobbletwo\else % Set if single in the first, redundant
639 \bbl@savelastskip
640 \protected@write\@auxout{}{\string\babel@aux{\bbl@auxname}{}}%
641 \bbl@restorelastskip
642 \fi
643 \bbl@usehooks{write}{}%
644 \fi
645 \fi}
646 %
647 \let\bbl@restorelastskip\relax
648 \let\bbl@savelastskip\relax
649 %
650 \newif\ifbbl@bcpallowed
651 \bbl@bcpallowedfalse
652 \def\select@language#1{% from set@, babel@aux
653 \ifx\bbl@selectorname\@empty

75
654 \def\bbl@selectorname{select}%
655 % set hymap
656 \fi
657 \ifnum\bbl@hymapsel=\@cclv\chardef\bbl@hymapsel4\relax\fi
658 % set name
659 \edef\languagename{#1}%
660 \bbl@fixname\languagename
661 % TODO. name@map must be here?
662 \bbl@provide@locale
663 \bbl@iflanguage\languagename{%
664 \let\bbl@select@type\z@
665 \expandafter\bbl@switch\expandafter{\languagename}}}
666 \def\babel@aux#1#2{%
667 \select@language{#1}%
668 \bbl@foreach\BabelContentsFiles{% \relax -> don't assume vertical mode
669 \@writefile{##1}{\babel@toc{#1}{#2}\relax}}}% TODO - plain?
670 \def\babel@toc#1#2{%
671 \select@language{#1}}

First, check if the user asks for a known language. If so, update the value of \language and call
\originalTeX to bring TEX in a certain pre-defined state.
The name of the language is stored in the control sequence \languagename.
Then we have to redefine \originalTeX to compensate for the things that have been activated. To
save memory space for the macro definition of \originalTeX, we construct the control sequence
name for the \noextrashlang i command at definition time by expanding the \csname primitive.
Now activate the language-specific definitions. This is done by constructing the names of three
macros by concatenating three words with the argument of \selectlanguage, and calling these
macros.
The switching of the values of \lefthyphenmin and \righthyphenmin is somewhat different. First
we save their current values, then we check if \hlang ihyphenmins is defined. If it is not, we set
default values (2 and 3), otherwise the values in \hlang ihyphenmins will be used.

672 \newif\ifbbl@usedategroup
673 \def\bbl@switch#1{% from select@, foreign@
674 % make sure there is info for the language if so requested
675 \bbl@ensureinfo{#1}%
676 % restore
677 \originalTeX
678 \expandafter\def\expandafter\originalTeX\expandafter{%
679 \csname noextras#1\endcsname
680 \let\originalTeX\@empty
681 \babel@beginsave}%
682 \bbl@usehooks{afterreset}{}%
683 \languageshorthands{none}%
684 % set the locale id
685 \bbl@id@assign
686 % switch captions, date
687 % No text is supposed to be added here, so we remove any
688 % spurious spaces.
689 \bbl@bsphack
690 \ifcase\bbl@select@type
691 \csname captions#1\endcsname\relax
692 \csname date#1\endcsname\relax
693 \else
694 \bbl@xin@{,captions,}{,\bbl@select@opts,}%
695 \ifin@
696 \csname captions#1\endcsname\relax
697 \fi
698 \bbl@xin@{,date,}{,\bbl@select@opts,}%
699 \ifin@ % if \foreign... within \<lang>date
700 \csname date#1\endcsname\relax
701 \fi
702 \fi
703 \bbl@esphack

76
704 % switch extras
705 \bbl@usehooks{beforeextras}{}%
706 \csname extras#1\endcsname\relax
707 \bbl@usehooks{afterextras}{}%
708 % > babel-ensure
709 % > babel-sh-<short>
710 % > babel-bidi
711 % > babel-fontspec
712 % hyphenation - case mapping
713 \ifcase\bbl@opt@hyphenmap\or
714 \def\BabelLower##1##2{\lccode##1=##2\relax}%
715 \ifnum\bbl@hymapsel>4\else
716 \csname\languagename @bbl@hyphenmap\endcsname
717 \fi
718 \chardef\bbl@opt@hyphenmap\z@
719 \else
720 \ifnum\bbl@hymapsel>\bbl@opt@hyphenmap\else
721 \csname\languagename @bbl@hyphenmap\endcsname
722 \fi
723 \fi
724 \let\bbl@hymapsel\@cclv
725 % hyphenation - select rules
726 \ifnum\csname l@\languagename\endcsname=\l@unhyphenated
727 \edef\bbl@tempa{u}%
728 \else
729 \edef\bbl@tempa{\bbl@cl{lnbrk}}%
730 \fi
731 % linebreaking - handle u, e, k (v in the future)
732 \bbl@xin@{/u}{/\bbl@tempa}%
733 \ifin@\else\bbl@xin@{/e}{/\bbl@tempa}\fi % elongated forms
734 \ifin@\else\bbl@xin@{/k}{/\bbl@tempa}\fi % only kashida
735 \ifin@\else\bbl@xin@{/p}{/\bbl@tempa}\fi % padding (eg, Tibetan)
736 \ifin@\else\bbl@xin@{/v}{/\bbl@tempa}\fi % variable font
737 \ifin@
738 % unhyphenated/kashida/elongated/padding = allow stretching
739 \language\l@unhyphenated
740 \babel@savevariable\emergencystretch
741 \emergencystretch\maxdimen
742 \babel@savevariable\hbadness
743 \hbadness\@M
744 \else
745 % other = select patterns
746 \bbl@patterns{#1}%
747 \fi
748 % hyphenation - mins
749 \babel@savevariable\lefthyphenmin
750 \babel@savevariable\righthyphenmin
751 \expandafter\ifx\csname #1hyphenmins\endcsname\relax
752 \set@hyphenmins\tw@\thr@@\relax
753 \else
754 \expandafter\expandafter\expandafter\set@hyphenmins
755 \csname #1hyphenmins\endcsname\relax
756 \fi
757 \let\bbl@selectorname\@empty}

otherlanguage (env.) The otherlanguage environment can be used as an alternative to using the \selectlanguage
declarative command. When you are typesetting a document which mixes left-to-right and
right-to-left typesetting you have to use this environment in order to let things work as you expect
them to.
The \ignorespaces command is necessary to hide the environment when it is entered in horizontal
mode.

758 \long\def\otherlanguage#1{%
759 \def\bbl@selectorname{other}%

77
760 \ifnum\bbl@hymapsel=\@cclv\let\bbl@hymapsel\thr@@\fi
761 \csname selectlanguage \endcsname{#1}%
762 \ignorespaces}

The \endotherlanguage part of the environment tries to hide itself when it is called in horizontal
mode.

763 \long\def\endotherlanguage{%
764 \global\@ignoretrue\ignorespaces}

otherlanguage* (env.) The otherlanguage environment is meant to be used when a large part of text from a different
language needs to be typeset, but without changing the translation of words such as ‘figure’. This
environment makes use of \foreign@language.

765 \expandafter\def\csname otherlanguage*\endcsname{%


766 \@ifnextchar[\bbl@otherlanguage@s{\bbl@otherlanguage@s[]}}
767 \def\bbl@otherlanguage@s[#1]#2{%
768 \def\bbl@selectorname{other*}%
769 \ifnum\bbl@hymapsel=\@cclv\chardef\bbl@hymapsel4\relax\fi
770 \def\bbl@select@opts{#1}%
771 \foreign@language{#2}}

At the end of the environment we need to switch off the extra definitions. The grouping mechanism
of the environment will take care of resetting the correct hyphenation rules and “extras”.

772 \expandafter\let\csname endotherlanguage*\endcsname\relax

\foreignlanguage The \foreignlanguage command is another substitute for the \selectlanguage command. This
command takes two arguments, the first argument is the name of the language to use for typesetting
the text specified in the second argument.
Unlike \selectlanguage this command doesn’t switch everything, it only switches the hyphenation
rules and the extra definitions for the language specified. It does this within a group and assumes the
\extrashlang i command doesn’t make any \global changes. The coding is very similar to part of
\selectlanguage.
\bbl@beforeforeign is a trick to fix a bug in bidi texts. \foreignlanguage is supposed to be a ‘text’
command, and therefore it must emit a \leavevmode, but it does not, and therefore the indent is
placed on the opposite margin. For backward compatibility, however, it is done only if a right-to-left
script is requested; otherwise, it is no-op.
(3.11) \foreignlanguage* is a temporary, experimental macro for a few lines with a different script
direction, while preserving the paragraph format (thank the braces around \par, things like
\hangindent are not reset). Do not use it in production, because its semantics and its syntax may
change (and very likely will, or even it could be removed altogether). Currently it enters in vmode
and then selects the language (which in turn sets the paragraph direction).
(3.11) Also experimental are the hook foreign and foreign*. With them you can redefine
\BabelText which by default does nothing. Its behavior is not well defined yet. So, use it in
horizontal mode only if you do not want surprises.
In other words, at the beginning of a paragraph \foreignlanguage enters into hmode with the
surrounding lang, and with \foreignlanguage* with the new lang.

773 \providecommand\bbl@beforeforeign{}
774 \edef\foreignlanguage{%
775 \noexpand\protect
776 \expandafter\noexpand\csname foreignlanguage \endcsname}
777 \expandafter\def\csname foreignlanguage \endcsname{%
778 \@ifstar\bbl@foreign@s\bbl@foreign@x}
779 \providecommand\bbl@foreign@x[3][]{%
780 \begingroup
781 \def\bbl@selectorname{foreign}%
782 \def\bbl@select@opts{#1}%
783 \let\BabelText\@firstofone
784 \bbl@beforeforeign
785 \foreign@language{#2}%
786 \bbl@usehooks{foreign}{}%
787 \BabelText{#3}% Now in horizontal mode!
788 \endgroup}
789 \def\bbl@foreign@s#1#2{% TODO - \shapemode, \@setpar, ?\@@par
790 \begingroup

78
791 {\par}%
792 \def\bbl@selectorname{foreign*}%
793 \let\bbl@select@opts\@empty
794 \let\BabelText\@firstofone
795 \foreign@language{#1}%
796 \bbl@usehooks{foreign*}{}%
797 \bbl@dirparastext
798 \BabelText{#2}% Still in vertical mode!
799 {\par}%
800 \endgroup}

\foreign@language This macro does the work for \foreignlanguage and the otherlanguage* environment. First we
need to store the name of the language and check that it is a known language. Then it just calls
bbl@switch.
801 \def\foreign@language#1{%
802 % set name
803 \edef\languagename{#1}%
804 \ifbbl@usedategroup
805 \bbl@add\bbl@select@opts{,date,}%
806 \bbl@usedategroupfalse
807 \fi
808 \bbl@fixname\languagename
809 % TODO. name@map here?
810 \bbl@provide@locale
811 \bbl@iflanguage\languagename{%
812 \let\bbl@select@type\@ne
813 \expandafter\bbl@switch\expandafter{\languagename}}}

The following macro executes conditionally some code based on the selector being used.
814 \def\IfBabelSelectorTF#1{%
815 \bbl@xin@{,\bbl@selectorname,}{,\zap@space#1 \@empty,}%
816 \ifin@
817 \expandafter\@firstoftwo
818 \else
819 \expandafter\@secondoftwo
820 \fi}

\bbl@patterns This macro selects the hyphenation patterns by changing the \language register. If special
hyphenation patterns are available specifically for the current font encoding, use them instead of the
default.
It also sets hyphenation exceptions, but only once, because they are global (here language \lccode’s
has been set, too). \bbl@hyphenation@ is set to relax until the very first \babelhyphenation, so do
nothing with this value. If the exceptions for a language (by its number, not its name, so that :ENC is
taken into account) has been set, then use \hyphenation with both global and language exceptions
and empty the latter to mark they must not be set again.
821 \let\bbl@hyphlist\@empty
822 \let\bbl@hyphenation@\relax
823 \let\bbl@pttnlist\@empty
824 \let\bbl@patterns@\relax
825 \let\bbl@hymapsel=\@cclv
826 \def\bbl@patterns#1{%
827 \language=\expandafter\ifx\csname l@#1:\f@encoding\endcsname\relax
828 \csname l@#1\endcsname
829 \edef\bbl@tempa{#1}%
830 \else
831 \csname l@#1:\f@encoding\endcsname
832 \edef\bbl@tempa{#1:\f@encoding}%
833 \fi
834 \@expandtwoargs\bbl@usehooks{patterns}{{#1}{\bbl@tempa}}%
835 % > luatex
836 \@ifundefined{bbl@hyphenation@}{}{% Can be \relax!
837 \begingroup
838 \bbl@xin@{,\number\language,}{,\bbl@hyphlist}%

79
839 \ifin@\else
840 \@expandtwoargs\bbl@usehooks{hyphenation}{{#1}{\bbl@tempa}}%
841 \hyphenation{%
842 \bbl@hyphenation@
843 \@ifundefined{bbl@hyphenation@#1}%
844 \@empty
845 {\space\csname bbl@hyphenation@#1\endcsname}}%
846 \xdef\bbl@hyphlist{\bbl@hyphlist\number\language,}%
847 \fi
848 \endgroup}}

hyphenrules (env.) The environment hyphenrules can be used to select just the hyphenation rules. This environment
does not change \languagename and when the hyphenation rules specified were not loaded it has no
effect. Note however, \lccode’s and font encodings are not set at all, so in most cases you should use
otherlanguage*.

849 \def\hyphenrules#1{%
850 \edef\bbl@tempf{#1}%
851 \bbl@fixname\bbl@tempf
852 \bbl@iflanguage\bbl@tempf{%
853 \expandafter\bbl@patterns\expandafter{\bbl@tempf}%
854 \ifx\languageshorthands\@undefined\else
855 \languageshorthands{none}%
856 \fi
857 \expandafter\ifx\csname\bbl@tempf hyphenmins\endcsname\relax
858 \set@hyphenmins\tw@\thr@@\relax
859 \else
860 \expandafter\expandafter\expandafter\set@hyphenmins
861 \csname\bbl@tempf hyphenmins\endcsname\relax
862 \fi}}
863 \let\endhyphenrules\@empty

\providehyphenmins The macro \providehyphenmins should be used in the language definition files to provide a default
setting for the hyphenation parameters \lefthyphenmin and \righthyphenmin. If the macro
\hlang ihyphenmins is already defined this command has no effect.

864 \def\providehyphenmins#1#2{%
865 \expandafter\ifx\csname #1hyphenmins\endcsname\relax
866 \@namedef{#1hyphenmins}{#2}%
867 \fi}

\set@hyphenmins This macro sets the values of \lefthyphenmin and \righthyphenmin. It expects two values as its
argument.

868 \def\set@hyphenmins#1#2{%
869 \lefthyphenmin#1\relax
870 \righthyphenmin#2\relax}

\ProvidesLanguage The identification code for each file is something that was introduced in LATEX 2ε . When the
command \ProvidesFile does not exist, a dummy definition is provided temporarily. For use in the
language definition file the command \ProvidesLanguage is defined by babel.
Depending on the format, ie, on if the former is defined, we use a similar definition or not.

871 \ifx\ProvidesFile\@undefined
872 \def\ProvidesLanguage#1[#2 #3 #4]{%
873 \wlog{Language: #1 #4 #3 <#2>}%
874 }
875 \else
876 \def\ProvidesLanguage#1{%
877 \begingroup
878 \catcode`\ 10 %
879 \@makeother\/%
880 \@ifnextchar[%]
881 {\@provideslanguage{#1}}{\@provideslanguage{#1}[]}}
882 \def\@provideslanguage#1[#2]{%
883 \wlog{Language: #1 #2}%

80
884 \expandafter\xdef\csname ver@#1.ldf\endcsname{#2}%
885 \endgroup}
886 \fi

\originalTeX The macro\originalTeX should be known to TEX at this moment. As it has to be expandable we \let
it to \@empty instead of \relax.
887 \ifx\originalTeX\@undefined\let\originalTeX\@empty\fi

Because this part of the code can be included in a format, we make sure that the macro which
initializes the save mechanism, \babel@beginsave, is not considered to be undefined.
888 \ifx\babel@beginsave\@undefined\let\babel@beginsave\relax\fi

A few macro names are reserved for future releases of babel, which will use the concept of ‘locale’:
889 \providecommand\setlocale{%
890 \bbl@error
891 {Not yet available}%
892 {Find an armchair, sit down and wait}}
893 \let\uselocale\setlocale
894 \let\locale\setlocale
895 \let\selectlocale\setlocale
896 \let\textlocale\setlocale
897 \let\textlanguage\setlocale
898 \let\languagetext\setlocale

7.2 Errors
\@nolanerr The babel package will signal an error when a documents tries to select a language that hasn’t been
\@nopatterns defined earlier. When a user selects a language for which no hyphenation patterns were loaded into
the format he will be given a warning about that fact. We revert to the patterns for \language=0 in
that case. In most formats that will be (US)english, but it might also be empty.

\@noopterr When the package was loaded without options not everything will work as expected. An error
message is issued in that case.
When the format knows about \PackageError it must be LATEX 2ε , so we can safely use its error
handling interface. Otherwise we’ll have to ‘keep it simple’.
Infos are not written to the console, but on the other hand many people think warnings are errors, so
a further message type is defined: an important info which is sent to the console.
899 \edef\bbl@nulllanguage{\string\language=0}
900 \def\bbl@nocaption{\protect\bbl@nocaption@i}
901 \def\bbl@nocaption@i#1#2{% 1: text to be printed 2: caption macro \langXname
902 \global\@namedef{#2}{\textbf{?#1?}}%
903 \@nameuse{#2}%
904 \edef\bbl@tempa{#1}%
905 \bbl@sreplace\bbl@tempa{name}{}%
906 \bbl@warning{%
907 \@backslashchar#1 not set for '\languagename'. Please,\\%
908 define it after the language has been loaded\\%
909 (typically in the preamble) with:\\%
910 \string\setlocalecaption{\languagename}{\bbl@tempa}{..}\\%
911 Feel free to contribute on github.com/latex3/babel.\\%
912 Reported}}
913 \def\bbl@tentative{\protect\bbl@tentative@i}
914 \def\bbl@tentative@i#1{%
915 \bbl@warning{%
916 Some functions for '#1' are tentative.\\%
917 They might not work as expected and their behavior\\%
918 could change in the future.\\%
919 Reported}}
920 \def\@nolanerr#1{%
921 \bbl@error
922 {You haven't defined the language '#1' yet.\\%
923 Perhaps you misspelled it or your installation\\%
924 is not complete}%

81
925 {Your command will be ignored, type <return> to proceed}}
926 \def\@nopatterns#1{%
927 \bbl@warning
928 {No hyphenation patterns were preloaded for\\%
929 the language '#1' into the format.\\%
930 Please, configure your TeX system to add them and\\%
931 rebuild the format. Now I will use the patterns\\%
932 preloaded for \bbl@nulllanguage\space instead}}
933 \let\bbl@usehooks\@gobbletwo
934 \ifx\bbl@onlyswitch\@empty\endinput\fi
935 % Here ended switch.def

Here ended the now discarded switch.def. Here also (currently) ends the base option.

936 \ifx\directlua\@undefined\else
937 \ifx\bbl@luapatterns\@undefined
938 \input luababel.def
939 \fi
940 \fi
941 hhBasic macrosii
942 \bbl@trace{Compatibility with language.def}
943 \ifx\bbl@languages\@undefined
944 \ifx\directlua\@undefined
945 \openin1 = language.def % TODO. Remove hardcoded number
946 \ifeof1
947 \closein1
948 \message{I couldn't find the file language.def}
949 \else
950 \closein1
951 \begingroup
952 \def\addlanguage#1#2#3#4#5{%
953 \expandafter\ifx\csname lang@#1\endcsname\relax\else
954 \global\expandafter\let\csname l@#1\expandafter\endcsname
955 \csname lang@#1\endcsname
956 \fi}%
957 \def\uselanguage#1{}%
958 \input language.def
959 \endgroup
960 \fi
961 \fi
962 \chardef\l@english\z@
963 \fi

\addto It takes two arguments, a hcontrol sequencei and TEX-code to be added to the hcontrol sequencei.
If the hcontrol sequencei has not been defined before it is defined now. The control sequence could
also expand to \relax, in which case a circular definition results. The net result is a stack overflow.
Note there is an inconsistency, because the assignment in the last branch is global.

964 \def\addto#1#2{%
965 \ifx#1\@undefined
966 \def#1{#2}%
967 \else
968 \ifx#1\relax
969 \def#1{#2}%
970 \else
971 {\toks@\expandafter{#1#2}%
972 \xdef#1{\the\toks@}}%
973 \fi
974 \fi}

The macro \initiate@active@char below takes all the necessary actions to make its argument a
shorthand character. The real work is performed once for each character. But first we define a little
tool.

975 \def\bbl@withactive#1#2{%
976 \begingroup

82
977 \lccode`~=`#2\relax
978 \lowercase{\endgroup#1~}}

\bbl@redefine To redefine a command, we save the old meaning of the macro. Then we redefine it to call the
original macro with the ‘sanitized’ argument. The reason why we do it this way is that we don’t want
to redefine the LATEX macros completely in case their definitions change (they have changed in the
past). A macro named \macro will be saved new control sequences named \org@macro.

979 \def\bbl@redefine#1{%
980 \edef\bbl@tempa{\bbl@stripslash#1}%
981 \expandafter\let\csname org@\bbl@tempa\endcsname#1%
982 \expandafter\def\csname\bbl@tempa\endcsname}
983 \@onlypreamble\bbl@redefine

\bbl@redefine@long This version of \babel@redefine can be used to redefine \long commands such as \ifthenelse.

984 \def\bbl@redefine@long#1{%
985 \edef\bbl@tempa{\bbl@stripslash#1}%
986 \expandafter\let\csname org@\bbl@tempa\endcsname#1%
987 \long\expandafter\def\csname\bbl@tempa\endcsname}
988 \@onlypreamble\bbl@redefine@long

\bbl@redefinerobust For commands that are redefined, but which might be robust we need a slightly more intelligent
macro. A robust command foo is defined to expand to \protect\foo . So it is necessary to check
whether \foo exists. The result is that the command that is being redefined is always robust
afterwards. Therefore all we need to do now is define \foo .

989 \def\bbl@redefinerobust#1{%
990 \edef\bbl@tempa{\bbl@stripslash#1}%
991 \bbl@ifunset{\bbl@tempa\space}%
992 {\expandafter\let\csname org@\bbl@tempa\endcsname#1%
993 \bbl@exp{\def\\#1{\\\protect\<\bbl@tempa\space>}}}%
994 {\bbl@exp{\let\<org@\bbl@tempa>\<\bbl@tempa\space>}}%
995 \@namedef{\bbl@tempa\space}}
996 \@onlypreamble\bbl@redefinerobust

7.3 Hooks
Admittedly, the current implementation is a somewhat simplistic and does very little to catch errors,
but it is meant for developers, after all. \bbl@usehooks is the commands used by babel to execute
hooks defined for an event.

997 \bbl@trace{Hooks}
998 \newcommand\AddBabelHook[3][]{%
999 \bbl@ifunset{bbl@hk@#2}{\EnableBabelHook{#2}}{}%
1000 \def\bbl@tempa##1,#3=##2,##3\@empty{\def\bbl@tempb{##2}}%
1001 \expandafter\bbl@tempa\bbl@evargs,#3=,\@empty
1002 \bbl@ifunset{bbl@ev@#2@#3@#1}%
1003 {\bbl@csarg\bbl@add{ev@#3@#1}{\bbl@elth{#2}}}%
1004 {\bbl@csarg\let{ev@#2@#3@#1}\relax}%
1005 \bbl@csarg\newcommand{ev@#2@#3@#1}[\bbl@tempb]}
1006 \newcommand\EnableBabelHook[1]{\bbl@csarg\let{hk@#1}\@firstofone}
1007 \newcommand\DisableBabelHook[1]{\bbl@csarg\let{hk@#1}\@gobble}
1008 \def\bbl@usehooks#1#2{%
1009 \ifx\UseHook\@undefined\else\UseHook{babel/*/#1}\fi
1010 \def\bbl@elth##1{%
1011 \bbl@cs{hk@##1}{\bbl@cs{ev@##1@#1@}#2}}%
1012 \bbl@cs{ev@#1@}%
1013 \ifx\languagename\@undefined\else % Test required for Plain (?)
1014 \ifx\UseHook\@undefined\else\UseHook{babel/\languagename/#1}\fi
1015 \def\bbl@elth##1{%
1016 \bbl@cs{hk@##1}{\bbl@cl{ev@##1@#1}#2}}%
1017 \bbl@cl{ev@#1}%
1018 \fi}

83
To ensure forward compatibility, arguments in hooks are set implicitly. So, if a further argument is
added in the future, there is no need to change the existing code. Note events intended for
hyphen.cfg are also loaded (just in case you need them for some reason).
1019 \def\bbl@evargs{,% <- don't delete this comma
1020 everylanguage=1,loadkernel=1,loadpatterns=1,loadexceptions=1,%
1021 adddialect=2,patterns=2,defaultcommands=0,encodedcommands=2,write=0,%
1022 beforeextras=0,afterextras=0,stopcommands=0,stringprocess=0,%
1023 hyphenation=2,initiateactive=3,afterreset=0,foreign=0,foreign*=0,%
1024 beforestart=0,languagename=2}
1025 \ifx\NewHook\@undefined\else
1026 \def\bbl@tempa#1=#2\@@{\NewHook{babel/#1}}
1027 \bbl@foreach\bbl@evargs{\bbl@tempa#1\@@}
1028 \fi

\babelensure The user command just parses the optional argument and creates a new macro named
\bbl@e@hlanguagei. We register a hook at the afterextras event which just executes this macro in a
“complete” selection (which, if undefined, is \relax and does nothing). This part is somewhat
involved because we have to make sure things are expanded the correct number of times.
The macro \bbl@e@hlanguagei contains \bbl@ensure{hincludei}{hexcludei}{hfontenci}, which in in
turn loops over the macros names in \bbl@captionslist, excluding (with the help of \in@) those in
the exclude list. If the fontenc is given (and not \relax), the \fontencoding is also added. Then we
loop over the include list, but if the macro already contains \foreignlanguage, nothing is done.
Note this macro (1) is not restricted to the preamble, and (2) changes are local.
1029 \bbl@trace{Defining babelensure}
1030 \newcommand\babelensure[2][]{%
1031 \AddBabelHook{babel-ensure}{afterextras}{%
1032 \ifcase\bbl@select@type
1033 \bbl@cl{e}%
1034 \fi}%
1035 \begingroup
1036 \let\bbl@ens@include\@empty
1037 \let\bbl@ens@exclude\@empty
1038 \def\bbl@ens@fontenc{\relax}%
1039 \def\bbl@tempb##1{%
1040 \ifx\@empty##1\else\noexpand##1\expandafter\bbl@tempb\fi}%
1041 \edef\bbl@tempa{\bbl@tempb#1\@empty}%
1042 \def\bbl@tempb##1=##2\@@{\@namedef{bbl@ens@##1}{##2}}%
1043 \bbl@foreach\bbl@tempa{\bbl@tempb##1\@@}%
1044 \def\bbl@tempc{\bbl@ensure}%
1045 \expandafter\bbl@add\expandafter\bbl@tempc\expandafter{%
1046 \expandafter{\bbl@ens@include}}%
1047 \expandafter\bbl@add\expandafter\bbl@tempc\expandafter{%
1048 \expandafter{\bbl@ens@exclude}}%
1049 \toks@\expandafter{\bbl@tempc}%
1050 \bbl@exp{%
1051 \endgroup
1052 \def\<bbl@e@#2>{\the\toks@{\bbl@ens@fontenc}}}}
1053 \def\bbl@ensure#1#2#3{% 1: include 2: exclude 3: fontenc
1054 \def\bbl@tempb##1{% elt for (excluding) \bbl@captionslist list
1055 \ifx##1\@undefined % 3.32 - Don't assume the macro exists
1056 \edef##1{\noexpand\bbl@nocaption
1057 {\bbl@stripslash##1}{\languagename\bbl@stripslash##1}}%
1058 \fi
1059 \ifx##1\@empty\else
1060 \in@{##1}{#2}%
1061 \ifin@\else
1062 \bbl@ifunset{bbl@ensure@\languagename}%
1063 {\bbl@exp{%
1064 \\\DeclareRobustCommand\<bbl@ensure@\languagename>[1]{%
1065 \\\foreignlanguage{\languagename}%
1066 {\ifx\relax#3\else
1067 \\\fontencoding{#3}\\\selectfont
1068 \fi

84
1069 ########1}}}}%
1070 {}%
1071 \toks@\expandafter{##1}%
1072 \edef##1{%
1073 \bbl@csarg\noexpand{ensure@\languagename}%
1074 {\the\toks@}}%
1075 \fi
1076 \expandafter\bbl@tempb
1077 \fi}%
1078 \expandafter\bbl@tempb\bbl@captionslist\today\@empty
1079 \def\bbl@tempa##1{% elt for include list
1080 \ifx##1\@empty\else
1081 \bbl@csarg\in@{ensure@\languagename\expandafter}\expandafter{##1}%
1082 \ifin@\else
1083 \bbl@tempb##1\@empty
1084 \fi
1085 \expandafter\bbl@tempa
1086 \fi}%
1087 \bbl@tempa#1\@empty}
1088 \def\bbl@captionslist{%
1089 \prefacename\refname\abstractname\bibname\chaptername\appendixname
1090 \contentsname\listfigurename\listtablename\indexname\figurename
1091 \tablename\partname\enclname\ccname\headtoname\pagename\seename
1092 \alsoname\proofname\glossaryname}

7.4 Setting up language files


\LdfInit \LdfInit macro takes two arguments. The first argument is the name of the language that will be
defined in the language definition file; the second argument is either a control sequence or a string
from which a control sequence should be constructed. The existence of the control sequence
indicates that the file has been processed before.
At the start of processing a language definition file we always check the category code of the at-sign.
We make sure that it is a ‘letter’ during the processing of the file. We also save its name as the last
called option, even if not loaded.
Another character that needs to have the correct category code during processing of language
definition files is the equals sign, ‘=’, because it is sometimes used in constructions with the \let
primitive. Therefore we store its current catcode and restore it later on.
Now we check whether we should perhaps stop the processing of this file. To do this we first need to
check whether the second argument that is passed to \LdfInit is a control sequence. We do that by
looking at the first token after passing #2 through string. When it is equal to \@backslashchar we
are dealing with a control sequence which we can compare with \@undefined.
If so, we call \ldf@quit to set the main language, restore the category code of the @-sign and call
\endinput
When #2 was not a control sequence we construct one and compare it with \relax.
Finally we check \originalTeX.
1093 \bbl@trace{Macros for setting language files up}
1094 \def\bbl@ldfinit{%
1095 \let\bbl@screset\@empty
1096 \let\BabelStrings\bbl@opt@string
1097 \let\BabelOptions\@empty
1098 \let\BabelLanguages\relax
1099 \ifx\originalTeX\@undefined
1100 \let\originalTeX\@empty
1101 \else
1102 \originalTeX
1103 \fi}
1104 \def\LdfInit#1#2{%
1105 \chardef\atcatcode=\catcode`\@
1106 \catcode`\@=11\relax
1107 \chardef\eqcatcode=\catcode`\=
1108 \catcode`\==12\relax
1109 \expandafter\if\expandafter\@backslashchar
1110 \expandafter\@car\string#2\@nil

85
1111 \ifx#2\@undefined\else
1112 \ldf@quit{#1}%
1113 \fi
1114 \else
1115 \expandafter\ifx\csname#2\endcsname\relax\else
1116 \ldf@quit{#1}%
1117 \fi
1118 \fi
1119 \bbl@ldfinit}

\ldf@quit This macro interrupts the processing of a language definition file.

1120 \def\ldf@quit#1{%
1121 \expandafter\main@language\expandafter{#1}%
1122 \catcode`\@=\atcatcode \let\atcatcode\relax
1123 \catcode`\==\eqcatcode \let\eqcatcode\relax
1124 \endinput}

\ldf@finish This macro takes one argument. It is the name of the language that was defined in the language
definition file.
We load the local configuration file if one is present, we set the main language (taking into account
that the argument might be a control sequence that needs to be expanded) and reset the category
code of the @-sign.

1125 \def\bbl@afterldf#1{% TODO. Merge into the next macro? Unused elsewhere
1126 \bbl@afterlang
1127 \let\bbl@afterlang\relax
1128 \let\BabelModifiers\relax
1129 \let\bbl@screset\relax}%
1130 \def\ldf@finish#1{%
1131 \loadlocalcfg{#1}%
1132 \bbl@afterldf{#1}%
1133 \expandafter\main@language\expandafter{#1}%
1134 \catcode`\@=\atcatcode \let\atcatcode\relax
1135 \catcode`\==\eqcatcode \let\eqcatcode\relax}

After the preamble of the document the commands \LdfInit, \ldf@quit and \ldf@finish are no
longer needed. Therefore they are turned into warning messages in LATEX.

1136 \@onlypreamble\LdfInit
1137 \@onlypreamble\ldf@quit
1138 \@onlypreamble\ldf@finish

\main@language This command should be used in the various language definition files. It stores its argument in
\bbl@main@language \bbl@main@language; to be used to switch to the correct language at the beginning of the document.

1139 \def\main@language#1{%
1140 \def\bbl@main@language{#1}%
1141 \let\languagename\bbl@main@language % TODO. Set localename
1142 \bbl@id@assign
1143 \bbl@patterns{\languagename}}

We also have to make sure that some code gets executed at the beginning of the document, either
when the aux file is read or, if it does not exist, when the \AtBeginDocument is executed. Languages
do not set \pagedir, so we set here for the whole document to the main \bodydir.

1144 \def\bbl@beforestart{%
1145 \def\@nolanerr##1{%
1146 \bbl@warning{Undefined language '##1' in aux.\\Reported}}%
1147 \bbl@usehooks{beforestart}{}%
1148 \global\let\bbl@beforestart\relax}
1149 \AtBeginDocument{%
1150 {\@nameuse{bbl@beforestart}}% Group!
1151 \if@filesw
1152 \providecommand\babel@aux[2]{}%
1153 \immediate\write\@mainaux{%
1154 \string\providecommand\string\babel@aux[2]{}}%

86
1155 \immediate\write\@mainaux{\string\@nameuse{bbl@beforestart}}%
1156 \fi
1157 \expandafter\selectlanguage\expandafter{\bbl@main@language}%
1158 \ifbbl@single % must go after the line above.
1159 \renewcommand\selectlanguage[1]{}%
1160 \renewcommand\foreignlanguage[2]{#2}%
1161 \global\let\babel@aux\@gobbletwo % Also as flag
1162 \fi
1163 \ifcase\bbl@engine\or\pagedir\bodydir\fi} % TODO - a better place

A bit of optimization. Select in heads/foots the language only if necessary.


1164 \def\select@language@x#1{%
1165 \ifcase\bbl@select@type
1166 \bbl@ifsamestring\languagename{#1}{}{\select@language{#1}}%
1167 \else
1168 \select@language{#1}%
1169 \fi}

7.5 Shorthands
\bbl@add@special The macro \bbl@add@special is used to add a new character (or single character control sequence)
to the macro \dospecials (and \@sanitize if LATEX is used). It is used only at one place, namely
when \initiate@active@char is called (which is ignored if the char has been made active before).
Because \@sanitize can be undefined, we put the definition inside a conditional.
Items are added to the lists without checking its existence or the original catcode. It does not hurt,
but should be fixed. It’s already done with \nfss@catcodes, added in 3.10.
1170 \bbl@trace{Shorhands}
1171 \def\bbl@add@special#1{% 1:a macro like \", \?, etc.
1172 \bbl@add\dospecials{\do#1}% test @sanitize = \relax, for back. compat.
1173 \bbl@ifunset{@sanitize}{}{\bbl@add\@sanitize{\@makeother#1}}%
1174 \ifx\nfss@catcodes\@undefined\else % TODO - same for above
1175 \begingroup
1176 \catcode`#1\active
1177 \nfss@catcodes
1178 \ifnum\catcode`#1=\active
1179 \endgroup
1180 \bbl@add\nfss@catcodes{\@makeother#1}%
1181 \else
1182 \endgroup
1183 \fi
1184 \fi}

\bbl@remove@special The companion of the former macro is \bbl@remove@special. It removes a character from the set
macros \dospecials and \@sanitize, but it is not used at all in the babel core.
1185 \def\bbl@remove@special#1{%
1186 \begingroup
1187 \def\x##1##2{\ifnum`#1=`##2\noexpand\@empty
1188 \else\noexpand##1\noexpand##2\fi}%
1189 \def\do{\x\do}%
1190 \def\@makeother{\x\@makeother}%
1191 \edef\x{\endgroup
1192 \def\noexpand\dospecials{\dospecials}%
1193 \expandafter\ifx\csname @sanitize\endcsname\relax\else
1194 \def\noexpand\@sanitize{\@sanitize}%
1195 \fi}%
1196 \x}

\initiate@active@char A language definition file can call this macro to make a character active. This macro takes one
argument, the character that is to be made active. When the character was already active this macro
does nothing. Otherwise, this macro defines the control sequence \normal@charhchari to expand to
the character in its ‘normal state’ and it defines the active character to expand to
\normal@charhchari by default (hchari being the character to be made active). Later its definition
can be changed to expand to \active@charhchari by calling \bbl@activate{hchari}.

87
For example, to make the double quote character active one could have \initiate@active@char{"}
in a language definition file. This defines " as \active@prefix "\active@char" (where the first " is
the character with its original catcode, when the shorthand is created, and \active@char" is a single
token). In protected contexts, it expands to \protect " or \noexpand " (ie, with the original ");
otherwise \active@char" is executed. This macro in turn expands to \normal@char" in “safe”
contexts (eg, \label), but \user@active" in normal “unsafe” ones. The latter search a definition in
the user, language and system levels, in this order, but if none is found, \normal@char" is used.
However, a deactivated shorthand (with \bbl@deactivate is defined as
\active@prefix "\normal@char".
The following macro is used to define shorthands in the three levels. It takes 4 arguments: the
(string’ed) character, \<level>@group, <level>@active and <next-level>@active (except in
system).

1197 \def\bbl@active@def#1#2#3#4{%
1198 \@namedef{#3#1}{%
1199 \expandafter\ifx\csname#2@sh@#1@\endcsname\relax
1200 \bbl@afterelse\bbl@sh@select#2#1{#3@arg#1}{#4#1}%
1201 \else
1202 \bbl@afterfi\csname#2@sh@#1@\endcsname
1203 \fi}%

When there is also no current-level shorthand with an argument we will check whether there is a
next-level defined shorthand for this active character.

1204 \long\@namedef{#3@arg#1}##1{%
1205 \expandafter\ifx\csname#2@sh@#1@\string##1@\endcsname\relax
1206 \bbl@afterelse\csname#4#1\endcsname##1%
1207 \else
1208 \bbl@afterfi\csname#2@sh@#1@\string##1@\endcsname
1209 \fi}}%

\initiate@active@char calls \@initiate@active@char with 3 arguments. All of them are the same
character with different catcodes: active, other (\string’ed) and the original one. This trick
simplifies the code a lot.

1210 \def\initiate@active@char#1{%
1211 \bbl@ifunset{active@char\string#1}%
1212 {\bbl@withactive
1213 {\expandafter\@initiate@active@char\expandafter}#1\string#1#1}%
1214 {}}

The very first thing to do is saving the original catcode and the original definition, even if not active,
which is possible (undefined characters require a special treatement to avoid making them \relax
and preserving some degree of protection).

1215 \def\@initiate@active@char#1#2#3{%
1216 \bbl@csarg\edef{oricat@#2}{\catcode`#2=\the\catcode`#2\relax}%
1217 \ifx#1\@undefined
1218 \bbl@csarg\def{oridef@#2}{\def#1{\active@prefix#1\@undefined}}%
1219 \else
1220 \bbl@csarg\let{oridef@@#2}#1%
1221 \bbl@csarg\edef{oridef@#2}{%
1222 \let\noexpand#1%
1223 \expandafter\noexpand\csname bbl@oridef@@#2\endcsname}%
1224 \fi

If the character is already active we provide the default expansion under this shorthand mechanism.
Otherwise we write a message in the transcript file, and define \normal@charhchari to expand to the
character in its default state. If the character is mathematically active when babel is loaded (for
example ') the normal expansion is somewhat different to avoid an infinite loop (but it does not
prevent the loop if the mathcode is set to "8000 a posteriori).

1225 \ifx#1#3\relax
1226 \expandafter\let\csname normal@char#2\endcsname#3%
1227 \else
1228 \bbl@info{Making #2 an active character}%
1229 \ifnum\mathcode`#2=\ifodd\bbl@engine"1000000 \else"8000 \fi
1230 \@namedef{normal@char#2}{%

88
1231 \textormath{#3}{\csname bbl@oridef@@#2\endcsname}}%
1232 \else
1233 \@namedef{normal@char#2}{#3}%
1234 \fi

To prevent problems with the loading of other packages after babel we reset the catcode of the
character to the original one at the end of the package and of each language file (except with
KeepShorthandsActive). It is re-activate again at \begin{document}. We also need to make sure that
the shorthands are active during the processing of the .aux file. Otherwise some citations may give
unexpected results in the printout when a shorthand was used in the optional argument of \bibitem
for example. Then we make it active (not strictly necessary, but done for backward compatibility).

1235 \bbl@restoreactive{#2}%
1236 \AtBeginDocument{%
1237 \catcode`#2\active
1238 \if@filesw
1239 \immediate\write\@mainaux{\catcode`\string#2\active}%
1240 \fi}%
1241 \expandafter\bbl@add@special\csname#2\endcsname
1242 \catcode`#2\active
1243 \fi

Now we have set \normal@charhchari, we must define \active@charhchari, to be executed when


the character is activated. We define the first level expansion of \active@charhchari to check the
status of the @safe@actives flag. If it is set to true we expand to the ‘normal’ version of this
character, otherwise we call \user@activehchari to start the search of a definition in the user,
language and system levels (or eventually normal@charhchari).

1244 \let\bbl@tempa\@firstoftwo
1245 \if\string^#2%
1246 \def\bbl@tempa{\noexpand\textormath}%
1247 \else
1248 \ifx\bbl@mathnormal\@undefined\else
1249 \let\bbl@tempa\bbl@mathnormal
1250 \fi
1251 \fi
1252 \expandafter\edef\csname active@char#2\endcsname{%
1253 \bbl@tempa
1254 {\noexpand\if@safe@actives
1255 \noexpand\expandafter
1256 \expandafter\noexpand\csname normal@char#2\endcsname
1257 \noexpand\else
1258 \noexpand\expandafter
1259 \expandafter\noexpand\csname bbl@doactive#2\endcsname
1260 \noexpand\fi}%
1261 {\expandafter\noexpand\csname normal@char#2\endcsname}}%
1262 \bbl@csarg\edef{doactive#2}{%
1263 \expandafter\noexpand\csname user@active#2\endcsname}%

We now define the default values which the shorthand is set to when activated or deactivated. It is
set to the deactivated form (globally), so that the character expands to

\active@prefix hchari \normal@charhchari

(where \active@charhchari is one control sequence!).

1264 \bbl@csarg\edef{active@#2}{%
1265 \noexpand\active@prefix\noexpand#1%
1266 \expandafter\noexpand\csname active@char#2\endcsname}%
1267 \bbl@csarg\edef{normal@#2}{%
1268 \noexpand\active@prefix\noexpand#1%
1269 \expandafter\noexpand\csname normal@char#2\endcsname}%
1270 \bbl@ncarg\let#1{bbl@normal@#2}%

The next level of the code checks whether a user has defined a shorthand for himself with this
character. First we check for a single character shorthand. If that doesn’t exist we check for a
shorthand with an argument.

89
1271 \bbl@active@def#2\user@group{user@active}{language@active}%
1272 \bbl@active@def#2\language@group{language@active}{system@active}%
1273 \bbl@active@def#2\system@group{system@active}{normal@char}%

In order to do the right thing when a shorthand with an argument is used by itself at the end of the
line we provide a definition for the case of an empty argument. For that case we let the shorthand
character expand to its non-active self. Also, When a shorthand combination such as '' ends up in a
heading TEX would see \protect'\protect'. To prevent this from happening a couple of shorthand
needs to be defined at user level.

1274 \expandafter\edef\csname\user@group @sh@#2@@\endcsname


1275 {\expandafter\noexpand\csname normal@char#2\endcsname}%
1276 \expandafter\edef\csname\user@group @sh@#2@\string\protect@\endcsname
1277 {\expandafter\noexpand\csname user@active#2\endcsname}%

Finally, a couple of special cases are taken care of. (1) If we are making the right quote (') active we
need to change \pr@m@s as well. Also, make sure that a single ' in math mode ‘does the right thing’.
(2) If we are using the caret (^) as a shorthand character special care should be taken to make sure
math still works. Therefore an extra level of expansion is introduced with a check for math mode on
the upper level.

1278 \if\string'#2%
1279 \let\prim@s\bbl@prim@s
1280 \let\active@math@prime#1%
1281 \fi
1282 \bbl@usehooks{initiateactive}{{#1}{#2}{#3}}}

The following package options control the behavior of shorthands in math mode.

1283 hh∗More package optionsii ≡


1284 \DeclareOption{math=active}{}
1285 \DeclareOption{math=normal}{\def\bbl@mathnormal{\noexpand\textormath}}
1286 hh/More package optionsii

Initiating a shorthand makes active the char. That is not strictly necessary but it is still done for
backward compatibility. So we need to restore the original catcode at the end of package and and the
end of the ldf.

1287 \@ifpackagewith{babel}{KeepShorthandsActive}%
1288 {\let\bbl@restoreactive\@gobble}%
1289 {\def\bbl@restoreactive#1{%
1290 \bbl@exp{%
1291 \\\AfterBabelLanguage\\\CurrentOption
1292 {\catcode`#1=\the\catcode`#1\relax}%
1293 \\\AtEndOfPackage
1294 {\catcode`#1=\the\catcode`#1\relax}}}%
1295 \AtEndOfPackage{\let\bbl@restoreactive\@gobble}}

\bbl@sh@select This command helps the shorthand supporting macros to select how to proceed. Note that this macro
needs to be expandable as do all the shorthand macros in order for them to work in expansion-only
environments such as the argument of \hyphenation.
This macro expects the name of a group of shorthands in its first argument and a shorthand
character in its second argument. It will expand to either \bbl@firstcs or \bbl@scndcs. Hence two
more arguments need to follow it.

1296 \def\bbl@sh@select#1#2{%
1297 \expandafter\ifx\csname#1@sh@#2@sel\endcsname\relax
1298 \bbl@afterelse\bbl@scndcs
1299 \else
1300 \bbl@afterfi\csname#1@sh@#2@sel\endcsname
1301 \fi}

\active@prefix The command \active@prefix which is used in the expansion of active characters has a function
similar to \OT1-cmd in that it \protects the active character whenever \protect is not
\@typeset@protect. The \@gobble is needed to remove a token such as \activechar: (when the
double colon was the active character to be dealt with). There are two definitions, depending of
\ifincsname is available. If there is, the expansion will be more robust.

1302 \begingroup

90
1303 \bbl@ifunset{ifincsname}% TODO. Ugly. Correct? Only Plain?
1304 {\gdef\active@prefix#1{%
1305 \ifx\protect\@typeset@protect
1306 \else
1307 \ifx\protect\@unexpandable@protect
1308 \noexpand#1%
1309 \else
1310 \protect#1%
1311 \fi
1312 \expandafter\@gobble
1313 \fi}}
1314 {\gdef\active@prefix#1{%
1315 \ifincsname
1316 \string#1%
1317 \expandafter\@gobble
1318 \else
1319 \ifx\protect\@typeset@protect
1320 \else
1321 \ifx\protect\@unexpandable@protect
1322 \noexpand#1%
1323 \else
1324 \protect#1%
1325 \fi
1326 \expandafter\expandafter\expandafter\@gobble
1327 \fi
1328 \fi}}
1329 \endgroup

\if@safe@actives In some circumstances it is necessary to be able to change the expansion of an active character on
the fly. For this purpose the switch @safe@actives is available. The setting of this switch should be
checked in the first level expansion of \active@charhchari.

1330 \newif\if@safe@actives
1331 \@safe@activesfalse

\bbl@restore@actives When the output routine kicks in while the active characters were made “safe” this must be undone
in the headers to prevent unexpected typeset results. For this situation we define a command to
make them “unsafe” again.

1332 \def\bbl@restore@actives{\if@safe@actives\@safe@activesfalse\fi}

\bbl@activate Both macros take one argument, like \initiate@active@char. The macro is used to change the
\bbl@deactivate definition of an active character to expand to \active@charhchari in the case of \bbl@activate, or
\normal@charhchari in the case of \bbl@deactivate.

1333 \chardef\bbl@activated\z@
1334 \def\bbl@activate#1{%
1335 \chardef\bbl@activated\@ne
1336 \bbl@withactive{\expandafter\let\expandafter}#1%
1337 \csname bbl@active@\string#1\endcsname}
1338 \def\bbl@deactivate#1{%
1339 \chardef\bbl@activated\tw@
1340 \bbl@withactive{\expandafter\let\expandafter}#1%
1341 \csname bbl@normal@\string#1\endcsname}

\bbl@firstcs These macros are used only as a trick when declaring shorthands.
\bbl@scndcs
1342 \def\bbl@firstcs#1#2{\csname#1\endcsname}
1343 \def\bbl@scndcs#1#2{\csname#2\endcsname}

\declare@shorthand The command \declare@shorthand is used to declare a shorthand on a certain level. It takes three
arguments:
1. a name for the collection of shorthands, i.e. ‘system’, or ‘dutch’;
2. the character (sequence) that makes up the shorthand, i.e. ~ or "a;
3. the code to be executed when the shorthand is encountered.

91
The auxiliary macro \babel@texpdf improves the interoperativity with hyperref and takes 4
arguments: (1) The TEX code in text mode, (2) the string for hyperref, (3) the TEX code in math mode,
and (4), which is currently ignored, but it’s meant for a string in math mode, like a minus sign instead
of an hyphen (currently hyperref doesn’t discriminate the mode). This macro may be used in ldf
files.

1344 \def\babel@texpdf#1#2#3#4{%
1345 \ifx\texorpdfstring\@undefined
1346 \textormath{#1}{#3}%
1347 \else
1348 \texorpdfstring{\textormath{#1}{#3}}{#2}%
1349 % \texorpdfstring{\textormath{#1}{#3}}{\textormath{#2}{#4}}%
1350 \fi}
1351 %
1352 \def\declare@shorthand#1#2{\@decl@short{#1}#2\@nil}
1353 \def\@decl@short#1#2#3\@nil#4{%
1354 \def\bbl@tempa{#3}%
1355 \ifx\bbl@tempa\@empty
1356 \expandafter\let\csname #1@sh@\string#2@sel\endcsname\bbl@scndcs
1357 \bbl@ifunset{#1@sh@\string#2@}{}%
1358 {\def\bbl@tempa{#4}%
1359 \expandafter\ifx\csname#1@sh@\string#2@\endcsname\bbl@tempa
1360 \else
1361 \bbl@info
1362 {Redefining #1 shorthand \string#2\\%
1363 in language \CurrentOption}%
1364 \fi}%
1365 \@namedef{#1@sh@\string#2@}{#4}%
1366 \else
1367 \expandafter\let\csname #1@sh@\string#2@sel\endcsname\bbl@firstcs
1368 \bbl@ifunset{#1@sh@\string#2@\string#3@}{}%
1369 {\def\bbl@tempa{#4}%
1370 \expandafter\ifx\csname#1@sh@\string#2@\string#3@\endcsname\bbl@tempa
1371 \else
1372 \bbl@info
1373 {Redefining #1 shorthand \string#2\string#3\\%
1374 in language \CurrentOption}%
1375 \fi}%
1376 \@namedef{#1@sh@\string#2@\string#3@}{#4}%
1377 \fi}

\textormath Some of the shorthands that will be declared by the language definition files have to be usable in
both text and mathmode. To achieve this the helper macro \textormath is provided.

1378 \def\textormath{%
1379 \ifmmode
1380 \expandafter\@secondoftwo
1381 \else
1382 \expandafter\@firstoftwo
1383 \fi}

\user@group The current concept of ‘shorthands’ supports three levels or groups of shorthands. For each level the
\language@group name of the level or group is stored in a macro. The default is to have a user group; use language
\system@group group ‘english’ and have a system group called ‘system’.

1384 \def\user@group{user}
1385 \def\language@group{english} % TODO. I don't like defaults
1386 \def\system@group{system}

\useshorthands This is the user level macro. It initializes and activates the character for use as a shorthand character
(ie, it’s active in the preamble). Languages can deactivate shorthands, so a starred version is also
provided which activates them always after the language has been switched.

1387 \def\useshorthands{%
1388 \@ifstar\bbl@usesh@s{\bbl@usesh@x{}}}
1389 \def\bbl@usesh@s#1{%

92
1390 \bbl@usesh@x
1391 {\AddBabelHook{babel-sh-\string#1}{afterextras}{\bbl@activate{#1}}}%
1392 {#1}}
1393 \def\bbl@usesh@x#1#2{%
1394 \bbl@ifshorthand{#2}%
1395 {\def\user@group{user}%
1396 \initiate@active@char{#2}%
1397 #1%
1398 \bbl@activate{#2}}%
1399 {\bbl@error
1400 {I can't declare a shorthand turned off (\string#2)}
1401 {Sorry, but you can't use shorthands which have been\\%
1402 turned off in the package options}}}

\defineshorthand Currently we only support two groups of user level shorthands, named internally user and
user@<lang> (language-dependent user shorthands). By default, only the first one is taken into
account, but if the former is also used (in the optional argument of \defineshorthand) a new level is
inserted for it (user@generic, done by \bbl@set@user@generic); we make also sure {} and
\protect are taken into account in this new top level.
1403 \def\user@language@group{user@\language@group}
1404 \def\bbl@set@user@generic#1#2{%
1405 \bbl@ifunset{user@generic@active#1}%
1406 {\bbl@active@def#1\user@language@group{user@active}{user@generic@active}%
1407 \bbl@active@def#1\user@group{user@generic@active}{language@active}%
1408 \expandafter\edef\csname#2@sh@#1@@\endcsname{%
1409 \expandafter\noexpand\csname normal@char#1\endcsname}%
1410 \expandafter\edef\csname#2@sh@#1@\string\protect@\endcsname{%
1411 \expandafter\noexpand\csname user@active#1\endcsname}}%
1412 \@empty}
1413 \newcommand\defineshorthand[3][user]{%
1414 \edef\bbl@tempa{\zap@space#1 \@empty}%
1415 \bbl@for\bbl@tempb\bbl@tempa{%
1416 \if*\expandafter\@car\bbl@tempb\@nil
1417 \edef\bbl@tempb{user@\expandafter\@gobble\bbl@tempb}%
1418 \@expandtwoargs
1419 \bbl@set@user@generic{\expandafter\string\@car#2\@nil}\bbl@tempb
1420 \fi
1421 \declare@shorthand{\bbl@tempb}{#2}{#3}}}

\languageshorthands A user level command to change the language from which shorthands are used. Unfortunately, babel
currently does not keep track of defined groups, and therefore there is no way to catch a possible
change in casing to fix it in the same way languages names are fixed. [TODO].
1422 \def\languageshorthands#1{\def\language@group{#1}}

\aliasshorthand First the new shorthand needs to be initialized. Then, we define the new shorthand in terms of the
original one, but note with \aliasshorthands{"}{/} is \active@prefix /\active@char/, so we
still need to let the lattest to \active@char".
1423 \def\aliasshorthand#1#2{%
1424 \bbl@ifshorthand{#2}%
1425 {\expandafter\ifx\csname active@char\string#2\endcsname\relax
1426 \ifx\document\@notprerr
1427 \@notshorthand{#2}%
1428 \else
1429 \initiate@active@char{#2}%
1430 \bbl@ccarg\let{active@char\string#2}{active@char\string#1}%
1431 \bbl@ccarg\let{normal@char\string#2}{normal@char\string#1}%
1432 \bbl@activate{#2}%
1433 \fi
1434 \fi}%
1435 {\bbl@error
1436 {Cannot declare a shorthand turned off (\string#2)}
1437 {Sorry, but you cannot use shorthands which have been\\%
1438 turned off in the package options}}}

93
\@notshorthand
1439 \def\@notshorthand#1{%
1440 \bbl@error{%
1441 The character '\string #1' should be made a shorthand character;\\%
1442 add the command \string\useshorthands\string{#1\string} to
1443 the preamble.\\%
1444 I will ignore your instruction}%
1445 {You may proceed, but expect unexpected results}}

\shorthandon The first level definition of these macros just passes the argument on to \bbl@switch@sh, adding
\shorthandoff \@nil at the end to denote the end of the list of characters.

1446 \newcommand*\shorthandon[1]{\bbl@switch@sh\@ne#1\@nnil}
1447 \DeclareRobustCommand*\shorthandoff{%
1448 \@ifstar{\bbl@shorthandoff\tw@}{\bbl@shorthandoff\z@}}
1449 \def\bbl@shorthandoff#1#2{\bbl@switch@sh#1#2\@nnil}

\bbl@switch@sh The macro \bbl@switch@sh takes the list of characters apart one by one and subsequently switches
the category code of the shorthand character according to the first argument of \bbl@switch@sh.
But before any of this switching takes place we make sure that the character we are dealing with is
known as a shorthand character. If it is, a macro such as \active@char" should exist.
Switching off and on is easy – we just set the category code to ‘other’ (12) and \active. With the
starred version, the original catcode and the original definition, saved in @initiate@active@char,
are restored.

1450 \def\bbl@switch@sh#1#2{%
1451 \ifx#2\@nnil\else
1452 \bbl@ifunset{bbl@active@\string#2}%
1453 {\bbl@error
1454 {I can't switch '\string#2' on or off--not a shorthand}%
1455 {This character is not a shorthand. Maybe you made\\%
1456 a typing mistake? I will ignore your instruction.}}%
1457 {\ifcase#1% off, on, off*
1458 \catcode`#212\relax
1459 \or
1460 \catcode`#2\active
1461 \bbl@ifunset{bbl@shdef@\string#2}%
1462 {}%
1463 {\bbl@withactive{\expandafter\let\expandafter}#2%
1464 \csname bbl@shdef@\string#2\endcsname
1465 \bbl@csarg\let{shdef@\string#2}\relax}%
1466 \ifcase\bbl@activated\or
1467 \bbl@activate{#2}%
1468 \else
1469 \bbl@deactivate{#2}%
1470 \fi
1471 \or
1472 \bbl@ifunset{bbl@shdef@\string#2}%
1473 {\bbl@withactive{\bbl@csarg\let{shdef@\string#2}}#2}%
1474 {}%
1475 \csname bbl@oricat@\string#2\endcsname
1476 \csname bbl@oridef@\string#2\endcsname
1477 \fi}%
1478 \bbl@afterfi\bbl@switch@sh#1%
1479 \fi}

Note the value is that at the expansion time; eg, in the preample shorhands are usually deactivated.

1480 \def\babelshorthand{\active@prefix\babelshorthand\bbl@putsh}
1481 \def\bbl@putsh#1{%
1482 \bbl@ifunset{bbl@active@\string#1}%
1483 {\bbl@putsh@i#1\@empty\@nnil}%
1484 {\csname bbl@active@\string#1\endcsname}}
1485 \def\bbl@putsh@i#1#2\@nnil{%
1486 \csname\language@group @sh@\string#1@%

94
1487 \ifx\@empty#2\else\string#2@\fi\endcsname}
1488 \ifx\bbl@opt@shorthands\@nnil\else
1489 \let\bbl@s@initiate@active@char\initiate@active@char
1490 \def\initiate@active@char#1{%
1491 \bbl@ifshorthand{#1}{\bbl@s@initiate@active@char{#1}}{}}
1492 \let\bbl@s@switch@sh\bbl@switch@sh
1493 \def\bbl@switch@sh#1#2{%
1494 \ifx#2\@nnil\else
1495 \bbl@afterfi
1496 \bbl@ifshorthand{#2}{\bbl@s@switch@sh#1{#2}}{\bbl@switch@sh#1}%
1497 \fi}
1498 \let\bbl@s@activate\bbl@activate
1499 \def\bbl@activate#1{%
1500 \bbl@ifshorthand{#1}{\bbl@s@activate{#1}}{}}
1501 \let\bbl@s@deactivate\bbl@deactivate
1502 \def\bbl@deactivate#1{%
1503 \bbl@ifshorthand{#1}{\bbl@s@deactivate{#1}}{}}
1504 \fi

You may want to test if a character is a shorthand. Note it does not test whether the shorthand is on
or off.

1505 \newcommand\ifbabelshorthand[3]{\bbl@ifunset{bbl@active@\string#1}{#3}{#2}}

\bbl@prim@s One of the internal macros that are involved in substituting \prime for each right quote in
\bbl@pr@m@s mathmode is \prim@s. This checks if the next character is a right quote. When the right quote is
active, the definition of this macro needs to be adapted to look also for an active right quote; the hat
could be active, too.

1506 \def\bbl@prim@s{%
1507 \prime\futurelet\@let@token\bbl@pr@m@s}
1508 \def\bbl@if@primes#1#2{%
1509 \ifx#1\@let@token
1510 \expandafter\@firstoftwo
1511 \else\ifx#2\@let@token
1512 \bbl@afterelse\expandafter\@firstoftwo
1513 \else
1514 \bbl@afterfi\expandafter\@secondoftwo
1515 \fi\fi}
1516 \begingroup
1517 \catcode`\^=7 \catcode`\*=\active \lccode`\*=`\^
1518 \catcode`\'=12 \catcode`\"=\active \lccode`\"=`\'
1519 \lowercase{%
1520 \gdef\bbl@pr@m@s{%
1521 \bbl@if@primes"'%
1522 \pr@@@s
1523 {\bbl@if@primes*^\pr@@@t\egroup}}}
1524 \endgroup

Usually the ~ is active and expands to \penalty\@M\ . When it is written to the .aux file it is written
expanded. To prevent that and to be able to use the character ~ as a start character for a shorthand,
it is redefined here as a one character shorthand on system level. The system declaration is in most
cases redundant (when ~ is still a non-break space), and in some cases is inconvenient (if ~ has been
redefined); however, for backward compatibility it is maintained (some existing documents may rely
on the babel value).

1525 \initiate@active@char{~}
1526 \declare@shorthand{system}{~}{\leavevmode\nobreak\ }
1527 \bbl@activate{~}

\OT1dqpos The position of the double quote character is different for the OT1 and T1 encodings. It will later be
\T1dqpos selected using the \f@encoding macro. Therefore we define two macros here to store the position of
the character in these encodings.

1528 \expandafter\def\csname OT1dqpos\endcsname{127}


1529 \expandafter\def\csname T1dqpos\endcsname{4}

95
When the macro \f@encoding is undefined (as it is in plain TEX) we define it here to expand to OT1

1530 \ifx\f@encoding\@undefined
1531 \def\f@encoding{OT1}
1532 \fi

7.6 Language attributes


Language attributes provide a means to give the user control over which features of the language
definition files he wants to enable.

\languageattribute The macro \languageattribute checks whether its arguments are valid and then activates the
selected language attribute. First check whether the language is known, and then process each
attribute in the list.

1533 \bbl@trace{Language attributes}


1534 \newcommand\languageattribute[2]{%
1535 \def\bbl@tempc{#1}%
1536 \bbl@fixname\bbl@tempc
1537 \bbl@iflanguage\bbl@tempc{%
1538 \bbl@vforeach{#2}{%

We want to make sure that each attribute is selected only once; therefore we store the already
selected attributes in \bbl@known@attribs. When that control sequence is not yet defined this
attribute is certainly not selected before.

1539 \ifx\bbl@known@attribs\@undefined
1540 \in@false
1541 \else
1542 \bbl@xin@{,\bbl@tempc-##1,}{,\bbl@known@attribs,}%
1543 \fi
1544 \ifin@
1545 \bbl@warning{%
1546 You have more than once selected the attribute '##1'\\%
1547 for language #1. Reported}%
1548 \else

When we end up here the attribute is not selected before. So, we add it to the list of selected
attributes and execute the associated TEX-code.

1549 \bbl@exp{%
1550 \\\bbl@add@list\\\bbl@known@attribs{\bbl@tempc-##1}}%
1551 \edef\bbl@tempa{\bbl@tempc-##1}%
1552 \expandafter\bbl@ifknown@ttrib\expandafter{\bbl@tempa}\bbl@attributes%
1553 {\csname\bbl@tempc @attr@##1\endcsname}%
1554 {\@attrerr{\bbl@tempc}{##1}}%
1555 \fi}}}
1556 \@onlypreamble\languageattribute

The error text to be issued when an unknown attribute is selected.

1557 \newcommand*{\@attrerr}[2]{%
1558 \bbl@error
1559 {The attribute #2 is unknown for language #1.}%
1560 {Your command will be ignored, type <return> to proceed}}

\bbl@declare@ttribute This command adds the new language/attribute combination to the list of known attributes.
Then it defines a control sequence to be executed when the attribute is used in a document. The
result of this should be that the macro \extras... for the current language is extended, otherwise
the attribute will not work as its code is removed from memory at \begin{document}.

1561 \def\bbl@declare@ttribute#1#2#3{%
1562 \bbl@xin@{,#2,}{,\BabelModifiers,}%
1563 \ifin@
1564 \AfterBabelLanguage{#1}{\languageattribute{#1}{#2}}%
1565 \fi
1566 \bbl@add@list\bbl@attributes{#1-#2}%
1567 \expandafter\def\csname#1@attr@#2\endcsname{#3}}

96
\bbl@ifattributeset This internal macro has 4 arguments. It can be used to interpret TEX code based on whether a certain
attribute was set. This command should appear inside the argument to \AtBeginDocument because
the attributes are set in the document preamble, after babel is loaded.
The first argument is the language, the second argument the attribute being checked, and the third
and fourth arguments are the true and false clauses.

1568 \def\bbl@ifattributeset#1#2#3#4{%
1569 \ifx\bbl@known@attribs\@undefined
1570 \in@false
1571 \else
1572 \bbl@xin@{,#1-#2,}{,\bbl@known@attribs,}%
1573 \fi
1574 \ifin@
1575 \bbl@afterelse#3%
1576 \else
1577 \bbl@afterfi#4%
1578 \fi}

\bbl@ifknown@ttrib An internal macro to check whether a given language/attribute is known. The macro takes 4
arguments, the language/attribute, the attribute list, the TEX-code to be executed when the attribute is
known and the TEX-code to be executed otherwise.
We first assume the attribute is unknown. Then we loop over the list of known attributes, trying to
find a match.

1579 \def\bbl@ifknown@ttrib#1#2{%
1580 \let\bbl@tempa\@secondoftwo
1581 \bbl@loopx\bbl@tempb{#2}{%
1582 \expandafter\in@\expandafter{\expandafter,\bbl@tempb,}{,#1,}%
1583 \ifin@
1584 \let\bbl@tempa\@firstoftwo
1585 \else
1586 \fi}%
1587 \bbl@tempa}

\bbl@clear@ttribs This macro removes all the attribute code from LATEX’s memory at \begin{document} time (if any is
present).

1588 \def\bbl@clear@ttribs{%
1589 \ifx\bbl@attributes\@undefined\else
1590 \bbl@loopx\bbl@tempa{\bbl@attributes}{%
1591 \expandafter\bbl@clear@ttrib\bbl@tempa.
1592 }%
1593 \let\bbl@attributes\@undefined
1594 \fi}
1595 \def\bbl@clear@ttrib#1-#2.{%
1596 \expandafter\let\csname#1@attr@#2\endcsname\@undefined}
1597 \AtBeginDocument{\bbl@clear@ttribs}

7.7 Support for saving macro definitions


To save the meaning of control sequences using \babel@save, we use temporary control sequences.
To save hash table entries for these control sequences, we don’t use the name of the control sequence
to be saved to construct the temporary name. Instead we simply use the value of a counter, which is
reset to zero each time we begin to save new values. This works well because we release the saved
meanings before we begin to save a new set of control sequence meanings (see \selectlanguage
and \originalTeX). Note undefined macros are not undefined any more when saved – they are
\relax’ed.

\babel@savecnt The initialization of a new save cycle: reset the counter to zero.
\babel@beginsave
1598 \bbl@trace{Macros for saving definitions}
1599 \def\babel@beginsave{\babel@savecnt\z@}

Before it’s forgotten, allocate the counter and initialize all.

1600 \newcount\babel@savecnt
1601 \babel@beginsave

97
\babel@save The macro \babel@savehcsnamei saves the current meaning of the control sequence hcsnamei to
\babel@savevariable \originalTeX32 . To do this, we let the current meaning to a temporary control sequence, the restore
commands are appended to \originalTeX and the counter is incremented. The macro
\babel@savevariablehvariablei saves the value of the variable. hvariablei can be anything allowed
after the \the primitive.

1602 \def\babel@save#1{%
1603 \expandafter\let\csname babel@\number\babel@savecnt\endcsname#1\relax
1604 \toks@\expandafter{\originalTeX\let#1=}%
1605 \bbl@exp{%
1606 \def\\\originalTeX{\the\toks@\<babel@\number\babel@savecnt>\relax}}%
1607 \advance\babel@savecnt\@ne}
1608 \def\babel@savevariable#1{%
1609 \toks@\expandafter{\originalTeX #1=}%
1610 \bbl@exp{\def\\\originalTeX{\the\toks@\the#1\relax}}}

\bbl@frenchspacing Some languages need to have \frenchspacing in effect. Others don’t want that. The command
\bbl@nonfrenchspacing \bbl@frenchspacing switches it on when it isn’t already in effect and \bbl@nonfrenchspacing
switches it off if necessary. A more refined way to switch the catcodes is done with ini files. Here an
auxiliary macro is defined, but the main part is in \babelprovide. This new method should be
ideally the default one.

1611 \def\bbl@frenchspacing{%
1612 \ifnum\the\sfcode`\.=\@m
1613 \let\bbl@nonfrenchspacing\relax
1614 \else
1615 \frenchspacing
1616 \let\bbl@nonfrenchspacing\nonfrenchspacing
1617 \fi}
1618 \let\bbl@nonfrenchspacing\nonfrenchspacing
1619 \let\bbl@elt\relax
1620 \edef\bbl@fs@chars{%
1621 \bbl@elt{\string.}\@m{3000}\bbl@elt{\string?}\@m{3000}%
1622 \bbl@elt{\string!}\@m{3000}\bbl@elt{\string:}\@m{2000}%
1623 \bbl@elt{\string;}\@m{1500}\bbl@elt{\string,}\@m{1250}}
1624 \def\bbl@pre@fs{%
1625 \def\bbl@elt##1##2##3{\sfcode`##1=\the\sfcode`##1\relax}%
1626 \edef\bbl@save@sfcodes{\bbl@fs@chars}}%
1627 \def\bbl@post@fs{%
1628 \bbl@save@sfcodes
1629 \edef\bbl@tempa{\bbl@cl{frspc}}%
1630 \edef\bbl@tempa{\expandafter\@car\bbl@tempa\@nil}%
1631 \if u\bbl@tempa % do nothing
1632 \else\if n\bbl@tempa % non french
1633 \def\bbl@elt##1##2##3{%
1634 \ifnum\sfcode`##1=##2\relax
1635 \babel@savevariable{\sfcode`##1}%
1636 \sfcode`##1=##3\relax
1637 \fi}%
1638 \bbl@fs@chars
1639 \else\if y\bbl@tempa % french
1640 \def\bbl@elt##1##2##3{%
1641 \ifnum\sfcode`##1=##3\relax
1642 \babel@savevariable{\sfcode`##1}%
1643 \sfcode`##1=##2\relax
1644 \fi}%
1645 \bbl@fs@chars
1646 \fi\fi\fi}

7.8 Short tags


\babeltags This macro is straightforward. After zapping spaces, we loop over the list and define the macros
\texthtag i and \htag i. Definitions are first expanded so that they don’t contain \csname but the
32
\originalTeX has to be expandable, i. e. you shouldn’t let it to \relax.

98
actual macro.

1647 \bbl@trace{Short tags}


1648 \def\babeltags#1{%
1649 \edef\bbl@tempa{\zap@space#1 \@empty}%
1650 \def\bbl@tempb##1=##2\@@{%
1651 \edef\bbl@tempc{%
1652 \noexpand\newcommand
1653 \expandafter\noexpand\csname ##1\endcsname{%
1654 \noexpand\protect
1655 \expandafter\noexpand\csname otherlanguage*\endcsname{##2}}
1656 \noexpand\newcommand
1657 \expandafter\noexpand\csname text##1\endcsname{%
1658 \noexpand\foreignlanguage{##2}}}
1659 \bbl@tempc}%
1660 \bbl@for\bbl@tempa\bbl@tempa{%
1661 \expandafter\bbl@tempb\bbl@tempa\@@}}

7.9 Hyphens
\babelhyphenation This macro saves hyphenation exceptions. Two macros are used to store them: \bbl@hyphenation@
for the global ones and \bbl@hyphenation<lang> for language ones. See \bbl@patterns above for
further details. We make sure there is a space between words when multiple commands are used.

1662 \bbl@trace{Hyphens}
1663 \@onlypreamble\babelhyphenation
1664 \AtEndOfPackage{%
1665 \newcommand\babelhyphenation[2][\@empty]{%
1666 \ifx\bbl@hyphenation@\relax
1667 \let\bbl@hyphenation@\@empty
1668 \fi
1669 \ifx\bbl@hyphlist\@empty\else
1670 \bbl@warning{%
1671 You must not intermingle \string\selectlanguage\space and\\%
1672 \string\babelhyphenation\space or some exceptions will not\\%
1673 be taken into account. Reported}%
1674 \fi
1675 \ifx\@empty#1%
1676 \protected@edef\bbl@hyphenation@{\bbl@hyphenation@\space#2}%
1677 \else
1678 \bbl@vforeach{#1}{%
1679 \def\bbl@tempa{##1}%
1680 \bbl@fixname\bbl@tempa
1681 \bbl@iflanguage\bbl@tempa{%
1682 \bbl@csarg\protected@edef{hyphenation@\bbl@tempa}{%
1683 \bbl@ifunset{bbl@hyphenation@\bbl@tempa}%
1684 {}%
1685 {\csname bbl@hyphenation@\bbl@tempa\endcsname\space}%
1686 #2}}}%
1687 \fi}}

\bbl@allowhyphens This macro makes hyphenation possible. Basically its definition is nothing more than \nobreak
\hskip 0pt plus 0pt33 .

1688 \def\bbl@allowhyphens{\ifvmode\else\nobreak\hskip\z@skip\fi}
1689 \def\bbl@t@one{T1}
1690 \def\allowhyphens{\ifx\cf@encoding\bbl@t@one\else\bbl@allowhyphens\fi}

\babelhyphen Macros to insert common hyphens. Note the space before @ in \babelhyphen. Instead of protecting it
with \DeclareRobustCommand, which could insert a \relax, we use the same procedure as
shorthands, with \active@prefix.

1691 \newcommand\babelnullhyphen{\char\hyphenchar\font}
1692 \def\babelhyphen{\active@prefix\babelhyphen\bbl@hyphen}

33
TEX begins and ends a word for hyphenation at a glue node. The penalty prevents a linebreak at this glue node.

99
1693 \def\bbl@hyphen{%
1694 \@ifstar{\bbl@hyphen@i @}{\bbl@hyphen@i\@empty}}
1695 \def\bbl@hyphen@i#1#2{%
1696 \bbl@ifunset{bbl@hy@#1#2\@empty}%
1697 {\csname bbl@#1usehyphen\endcsname{\discretionary{#2}{}{#2}}}%
1698 {\csname bbl@hy@#1#2\@empty\endcsname}}

The following two commands are used to wrap the “hyphen” and set the behavior of the rest of the
word – the version with a single @ is used when further hyphenation is allowed, while that with @@ if
no more hyphens are allowed. In both cases, if the hyphen is preceded by a positive space, breaking
after the hyphen is disallowed.
There should not be a discretionary after a hyphen at the beginning of a word, so it is prevented if
preceded by a skip. Unfortunately, this does handle cases like “(-suffix)”. \nobreak is always
preceded by \leavevmode, in case the shorthand starts a paragraph.

1699 \def\bbl@usehyphen#1{%
1700 \leavevmode
1701 \ifdim\lastskip>\z@\mbox{#1}\else\nobreak#1\fi
1702 \nobreak\hskip\z@skip}
1703 \def\bbl@@usehyphen#1{%
1704 \leavevmode\ifdim\lastskip>\z@\mbox{#1}\else#1\fi}

The following macro inserts the hyphen char.

1705 \def\bbl@hyphenchar{%
1706 \ifnum\hyphenchar\font=\m@ne
1707 \babelnullhyphen
1708 \else
1709 \char\hyphenchar\font
1710 \fi}

Finally, we define the hyphen “types”. Their names will not change, so you may use them in ldf’s.
After a space, the \mbox in \bbl@hy@nobreak is redundant.

1711 \def\bbl@hy@soft{\bbl@usehyphen{\discretionary{\bbl@hyphenchar}{}{}}}
1712 \def\bbl@hy@@soft{\bbl@@usehyphen{\discretionary{\bbl@hyphenchar}{}{}}}
1713 \def\bbl@hy@hard{\bbl@usehyphen\bbl@hyphenchar}
1714 \def\bbl@hy@@hard{\bbl@@usehyphen\bbl@hyphenchar}
1715 \def\bbl@hy@nobreak{\bbl@usehyphen{\mbox{\bbl@hyphenchar}}}
1716 \def\bbl@hy@@nobreak{\mbox{\bbl@hyphenchar}}
1717 \def\bbl@hy@repeat{%
1718 \bbl@usehyphen{%
1719 \discretionary{\bbl@hyphenchar}{\bbl@hyphenchar}{\bbl@hyphenchar}}}
1720 \def\bbl@hy@@repeat{%
1721 \bbl@@usehyphen{%
1722 \discretionary{\bbl@hyphenchar}{\bbl@hyphenchar}{\bbl@hyphenchar}}}
1723 \def\bbl@hy@empty{\hskip\z@skip}
1724 \def\bbl@hy@@empty{\discretionary{}{}{}}

\bbl@disc For some languages the macro \bbl@disc is used to ease the insertion of discretionaries for letters
that behave ‘abnormally’ at a breakpoint.

1725 \def\bbl@disc#1#2{\nobreak\discretionary{#2-}{}{#1}\bbl@allowhyphens}

7.10 Multiencoding strings


The aim following commands is to provide a commom interface for strings in several encodings.
They also contains several hooks which can be used by luatex and xetex. The code is organized here
with pseudo-guards, so we start with the basic commands.

Tools But first, a tool. It makes global a local variable. This is not the best solution, but it works.
1726 \bbl@trace{Multiencoding strings}
1727 \def\bbl@toglobal#1{\global\let#1#1}

The second one. We need to patch \@uclclist, but it is done once and only if \SetCase is used or if
strings are encoded. The code is far from satisfactory for several reasons, including the fact
\@uclclist is not a list any more. Therefore a package option is added to ignore it. Instead of

100
gobbling the macro getting the next two elements (usually \reserved@a), we pass it as argument to
\bbl@uclc. The parser is restarted inside \hlang i@bbl@uclc because we do not know how many
expansions are necessary (depends on whether strings are encoded). The last part is tricky – when
uppercasing, we have:

\let\bbl@tolower\@empty\bbl@toupper\@empty

and starts over (and similarly when lowercasing).

1728 \@ifpackagewith{babel}{nocase}%
1729 {\let\bbl@patchuclc\relax}%
1730 {\def\bbl@patchuclc{%
1731 \global\let\bbl@patchuclc\relax
1732 \g@addto@macro\@uclclist{\reserved@b{\reserved@b\bbl@uclc}}%
1733 \gdef\bbl@uclc##1{%
1734 \let\bbl@encoded\bbl@encoded@uclc
1735 \bbl@ifunset{\languagename @bbl@uclc}% and resumes it
1736 {##1}%
1737 {\let\bbl@tempa##1\relax % Used by LANG@bbl@uclc
1738 \csname\languagename @bbl@uclc\endcsname}%
1739 {\bbl@tolower\@empty}{\bbl@toupper\@empty}}%
1740 \gdef\bbl@tolower{\csname\languagename @bbl@lc\endcsname}%
1741 \gdef\bbl@toupper{\csname\languagename @bbl@uc\endcsname}}}
1742 % A temporary hack, for testing purposes:
1743 \def\BabelRestoreCase{%
1744 \DeclareRobustCommand{\MakeUppercase}[1]{{%
1745 \def\reserved@a####1####2{\let####1####2\reserved@a}%
1746 \def\i{I}\def\j{J}%
1747 \expandafter\reserved@a\@uclclist\reserved@b{\reserved@b\@gobble}%
1748 \let\UTF@two@octets@noexpand\@empty
1749 \let\UTF@three@octets@noexpand\@empty
1750 \let\UTF@four@octets@noexpand\@empty
1751 \protected@edef\reserved@a{\uppercase{##1}}%
1752 \reserved@a
1753 }}%
1754 \DeclareRobustCommand{\MakeLowercase}[1]{{%
1755 \def\reserved@a####1####2{\let####2####1\reserved@a}%
1756 \expandafter\reserved@a\@uclclist\reserved@b{\reserved@b\@gobble}%
1757 \let\UTF@two@octets@noexpand\@empty
1758 \let\UTF@three@octets@noexpand\@empty
1759 \let\UTF@four@octets@noexpand\@empty
1760 \protected@edef\reserved@a{\lowercase{##1}}%
1761 \reserved@a}}}

1762 hh∗More package optionsii ≡


1763 \DeclareOption{nocase}{}
1764 hh/More package optionsii

The following package options control the behavior of \SetString.

1765 hh∗More package optionsii ≡


1766 \let\bbl@opt@strings\@nnil % accept strings=value
1767 \DeclareOption{strings}{\def\bbl@opt@strings{\BabelStringsDefault}}
1768 \DeclareOption{strings=encoded}{\let\bbl@opt@strings\relax}
1769 \def\BabelStringsDefault{generic}
1770 hh/More package optionsii

Main command This is the main command. With the first use it is redefined to omit the basic
setup in subsequent blocks. We make sure strings contain actual letters in the range 128-255, not
active characters.

1771 \@onlypreamble\StartBabelCommands
1772 \def\StartBabelCommands{%
1773 \begingroup
1774 \@tempcnta="7F

101
1775 \def\bbl@tempa{%
1776 \ifnum\@tempcnta>"FF\else
1777 \catcode\@tempcnta=11
1778 \advance\@tempcnta\@ne
1779 \expandafter\bbl@tempa
1780 \fi}%
1781 \bbl@tempa
1782 hhMacros local to BabelCommandsii
1783 \def\bbl@provstring##1##2{%
1784 \providecommand##1{##2}%
1785 \bbl@toglobal##1}%
1786 \global\let\bbl@scafter\@empty
1787 \let\StartBabelCommands\bbl@startcmds
1788 \ifx\BabelLanguages\relax
1789 \let\BabelLanguages\CurrentOption
1790 \fi
1791 \begingroup
1792 \let\bbl@screset\@nnil % local flag - disable 1st stopcommands
1793 \StartBabelCommands}
1794 \def\bbl@startcmds{%
1795 \ifx\bbl@screset\@nnil\else
1796 \bbl@usehooks{stopcommands}{}%
1797 \fi
1798 \endgroup
1799 \begingroup
1800 \@ifstar
1801 {\ifx\bbl@opt@strings\@nnil
1802 \let\bbl@opt@strings\BabelStringsDefault
1803 \fi
1804 \bbl@startcmds@i}%
1805 \bbl@startcmds@i}
1806 \def\bbl@startcmds@i#1#2{%
1807 \edef\bbl@L{\zap@space#1 \@empty}%
1808 \edef\bbl@G{\zap@space#2 \@empty}%
1809 \bbl@startcmds@ii}
1810 \let\bbl@startcommands\StartBabelCommands

Parse the encoding info to get the label, input, and font parts.
Select the behavior of \SetString. Thre are two main cases, depending of if there is an optional
argument: without it and strings=encoded, strings are defined always; otherwise, they are set only
if they are still undefined (ie, fallback values). With labelled blocks and strings=encoded, define the
strings, but with another value, define strings only if the current label or font encoding is the value
of strings; otherwise (ie, no strings or a block whose label is not in strings=) do nothing.
We presume the current block is not loaded, and therefore set (above) a couple of default values to
gobble the arguments. Then, these macros are redefined if necessary according to several
parameters.

1811 \newcommand\bbl@startcmds@ii[1][\@empty]{%
1812 \let\SetString\@gobbletwo
1813 \let\bbl@stringdef\@gobbletwo
1814 \let\AfterBabelCommands\@gobble
1815 \ifx\@empty#1%
1816 \def\bbl@sc@label{generic}%
1817 \def\bbl@encstring##1##2{%
1818 \ProvideTextCommandDefault##1{##2}%
1819 \bbl@toglobal##1%
1820 \expandafter\bbl@toglobal\csname\string?\string##1\endcsname}%
1821 \let\bbl@sctest\in@true
1822 \else
1823 \let\bbl@sc@charset\space % <- zapped below
1824 \let\bbl@sc@fontenc\space % <- " "
1825 \def\bbl@tempa##1=##2\@nil{%
1826 \bbl@csarg\edef{sc@\zap@space##1 \@empty}{##2 }}%
1827 \bbl@vforeach{label=#1}{\bbl@tempa##1\@nil}%

102
1828 \def\bbl@tempa##1 ##2{% space -> comma
1829 ##1%
1830 \ifx\@empty##2\else\ifx,##1,\else,\fi\bbl@afterfi\bbl@tempa##2\fi}%
1831 \edef\bbl@sc@fontenc{\expandafter\bbl@tempa\bbl@sc@fontenc\@empty}%
1832 \edef\bbl@sc@label{\expandafter\zap@space\bbl@sc@label\@empty}%
1833 \edef\bbl@sc@charset{\expandafter\zap@space\bbl@sc@charset\@empty}%
1834 \def\bbl@encstring##1##2{%
1835 \bbl@foreach\bbl@sc@fontenc{%
1836 \bbl@ifunset{T@####1}%
1837 {}%
1838 {\ProvideTextCommand##1{####1}{##2}%
1839 \bbl@toglobal##1%
1840 \expandafter
1841 \bbl@toglobal\csname####1\string##1\endcsname}}}%
1842 \def\bbl@sctest{%
1843 \bbl@xin@{,\bbl@opt@strings,}{,\bbl@sc@label,\bbl@sc@fontenc,}}%
1844 \fi
1845 \ifx\bbl@opt@strings\@nnil % ie, no strings key -> defaults
1846 \else\ifx\bbl@opt@strings\relax % ie, strings=encoded
1847 \let\AfterBabelCommands\bbl@aftercmds
1848 \let\SetString\bbl@setstring
1849 \let\bbl@stringdef\bbl@encstring
1850 \else % ie, strings=value
1851 \bbl@sctest
1852 \ifin@
1853 \let\AfterBabelCommands\bbl@aftercmds
1854 \let\SetString\bbl@setstring
1855 \let\bbl@stringdef\bbl@provstring
1856 \fi\fi\fi
1857 \bbl@scswitch
1858 \ifx\bbl@G\@empty
1859 \def\SetString##1##2{%
1860 \bbl@error{Missing group for string \string##1}%
1861 {You must assign strings to some category, typically\\%
1862 captions or extras, but you set none}}%
1863 \fi
1864 \ifx\@empty#1%
1865 \bbl@usehooks{defaultcommands}{}%
1866 \else
1867 \@expandtwoargs
1868 \bbl@usehooks{encodedcommands}{{\bbl@sc@charset}{\bbl@sc@fontenc}}%
1869 \fi}

There are two versions of \bbl@scswitch. The first version is used when ldfs are read, and it makes
sure \hgroupihlanguagei is reset, but only once (\bbl@screset is used to keep track of this). The
second version is used in the preamble and packages loaded after babel and does nothing.
The macro \bbl@forlang loops \bbl@L but its body is executed only if the value is in
\BabelLanguages (inside babel) or \datehlanguagei is defined (after babel has been loaded). There
are also two version of \bbl@forlang. The first one skips the current iteration if the language is not
in \BabelLanguages (used in ldfs), and the second one skips undefined languages (after babel has
been loaded) .

1870 \def\bbl@forlang#1#2{%
1871 \bbl@for#1\bbl@L{%
1872 \bbl@xin@{,#1,}{,\BabelLanguages,}%
1873 \ifin@#2\relax\fi}}
1874 \def\bbl@scswitch{%
1875 \bbl@forlang\bbl@tempa{%
1876 \ifx\bbl@G\@empty\else
1877 \ifx\SetString\@gobbletwo\else
1878 \edef\bbl@GL{\bbl@G\bbl@tempa}%
1879 \bbl@xin@{,\bbl@GL,}{,\bbl@screset,}%
1880 \ifin@\else
1881 \global\expandafter\let\csname\bbl@GL\endcsname\@undefined

103
1882 \xdef\bbl@screset{\bbl@screset,\bbl@GL}%
1883 \fi
1884 \fi
1885 \fi}}
1886 \AtEndOfPackage{%
1887 \def\bbl@forlang#1#2{\bbl@for#1\bbl@L{\bbl@ifunset{date#1}{}{#2}}}%
1888 \let\bbl@scswitch\relax}
1889 \@onlypreamble\EndBabelCommands
1890 \def\EndBabelCommands{%
1891 \bbl@usehooks{stopcommands}{}%
1892 \endgroup
1893 \endgroup
1894 \bbl@scafter}
1895 \let\bbl@endcommands\EndBabelCommands

Now we define commands to be used inside \StartBabelCommands.

Strings The following macro is the actual definition of \SetString when it is “active”
First save the “switcher”. Create it if undefined. Strings are defined only if undefined (ie, like
\providescommmand). With the event stringprocess you can preprocess the string by manipulating
the value of \BabelString. If there are several hooks assigned to this event, preprocessing is done in
the same order as defined. Finally, the string is set.

1896 \def\bbl@setstring#1#2{% eg, \prefacename{<string>}


1897 \bbl@forlang\bbl@tempa{%
1898 \edef\bbl@LC{\bbl@tempa\bbl@stripslash#1}%
1899 \bbl@ifunset{\bbl@LC}% eg, \germanchaptername
1900 {\bbl@exp{%
1901 \global\\\bbl@add\<\bbl@G\bbl@tempa>{\\\bbl@scset\\#1\<\bbl@LC>}}}%
1902 {}%
1903 \def\BabelString{#2}%
1904 \bbl@usehooks{stringprocess}{}%
1905 \expandafter\bbl@stringdef
1906 \csname\bbl@LC\expandafter\endcsname\expandafter{\BabelString}}}

Now, some addtional stuff to be used when encoded strings are used. Captions then include
\bbl@encoded for string to be expanded in case transformations. It is \relax by default, but in
\MakeUppercase and \MakeLowercase its value is a modified expandable \@changed@cmd.

1907 \ifx\bbl@opt@strings\relax
1908 \def\bbl@scset#1#2{\def#1{\bbl@encoded#2}}
1909 \bbl@patchuclc
1910 \let\bbl@encoded\relax
1911 \def\bbl@encoded@uclc#1{%
1912 \@inmathwarn#1%
1913 \expandafter\ifx\csname\cf@encoding\string#1\endcsname\relax
1914 \expandafter\ifx\csname ?\string#1\endcsname\relax
1915 \TextSymbolUnavailable#1%
1916 \else
1917 \csname ?\string#1\endcsname
1918 \fi
1919 \else
1920 \csname\cf@encoding\string#1\endcsname
1921 \fi}
1922 \else
1923 \def\bbl@scset#1#2{\def#1{#2}}
1924 \fi

Define \SetStringLoop, which is actually set inside \StartBabelCommands. The current definition is
somewhat complicated because we need a count, but \count@ is not under our control (remember
\SetString may call hooks). Instead of defining a dedicated count, we just “pre-expand” its value.

1925 hh∗Macros local to BabelCommandsii ≡


1926 \def\SetStringLoop##1##2{%
1927 \def\bbl@templ####1{\expandafter\noexpand\csname##1\endcsname}%
1928 \count@\z@

104
1929 \bbl@loop\bbl@tempa{##2}{% empty items and spaces are ok
1930 \advance\count@\@ne
1931 \toks@\expandafter{\bbl@tempa}%
1932 \bbl@exp{%
1933 \\\SetString\bbl@templ{\romannumeral\count@}{\the\toks@}%
1934 \count@=\the\count@\relax}}}%
1935 hh/Macros local to BabelCommandsii

Delaying code Now the definition of \AfterBabelCommands when it is activated.


1936 \def\bbl@aftercmds#1{%
1937 \toks@\expandafter{\bbl@scafter#1}%
1938 \xdef\bbl@scafter{\the\toks@}}

Case mapping The command \SetCase provides a way to change the behavior of
\MakeUppercase and \MakeLowercase. \bbl@tempa is set by the patched \@uclclist to the parsing
command.

1939 hh∗Macros local to BabelCommandsii ≡


1940 \newcommand\SetCase[3][]{%
1941 \bbl@patchuclc
1942 \bbl@forlang\bbl@tempa{%
1943 \bbl@carg\bbl@encstring{\bbl@tempa @bbl@uclc}{\bbl@tempa##1}%
1944 \bbl@carg\bbl@encstring{\bbl@tempa @bbl@uc}{##2}%
1945 \bbl@carg\bbl@encstring{\bbl@tempa @bbl@lc}{##3}}}%
1946 hh/Macros local to BabelCommandsii

Macros to deal with case mapping for hyphenation. To decide if the document is monolingual or
multilingual, we make a rough guess – just see if there is a comma in the languages list, built in the
first pass of the package options.

1947 hh∗Macros local to BabelCommandsii ≡


1948 \newcommand\SetHyphenMap[1]{%
1949 \bbl@forlang\bbl@tempa{%
1950 \expandafter\bbl@stringdef
1951 \csname\bbl@tempa @bbl@hyphenmap\endcsname{##1}}}%
1952 hh/Macros local to BabelCommandsii

There are 3 helper macros which do most of the work for you.

1953 \newcommand\BabelLower[2]{% one to one.


1954 \ifnum\lccode#1=#2\else
1955 \babel@savevariable{\lccode#1}%
1956 \lccode#1=#2\relax
1957 \fi}
1958 \newcommand\BabelLowerMM[4]{% many-to-many
1959 \@tempcnta=#1\relax
1960 \@tempcntb=#4\relax
1961 \def\bbl@tempa{%
1962 \ifnum\@tempcnta>#2\else
1963 \@expandtwoargs\BabelLower{\the\@tempcnta}{\the\@tempcntb}%
1964 \advance\@tempcnta#3\relax
1965 \advance\@tempcntb#3\relax
1966 \expandafter\bbl@tempa
1967 \fi}%
1968 \bbl@tempa}
1969 \newcommand\BabelLowerMO[4]{% many-to-one
1970 \@tempcnta=#1\relax
1971 \def\bbl@tempa{%
1972 \ifnum\@tempcnta>#2\else
1973 \@expandtwoargs\BabelLower{\the\@tempcnta}{#4}%
1974 \advance\@tempcnta#3
1975 \expandafter\bbl@tempa
1976 \fi}%
1977 \bbl@tempa}

105
The following package options control the behavior of hyphenation mapping.

1978 hh∗More package optionsii ≡


1979 \DeclareOption{hyphenmap=off}{\chardef\bbl@opt@hyphenmap\z@}
1980 \DeclareOption{hyphenmap=first}{\chardef\bbl@opt@hyphenmap\@ne}
1981 \DeclareOption{hyphenmap=select}{\chardef\bbl@opt@hyphenmap\tw@}
1982 \DeclareOption{hyphenmap=other}{\chardef\bbl@opt@hyphenmap\thr@@}
1983 \DeclareOption{hyphenmap=other*}{\chardef\bbl@opt@hyphenmap4\relax}
1984 hh/More package optionsii

Initial setup to provide a default behavior if hypenmap is not set.

1985 \AtEndOfPackage{%
1986 \ifx\bbl@opt@hyphenmap\@undefined
1987 \bbl@xin@{,}{\bbl@language@opts}%
1988 \chardef\bbl@opt@hyphenmap\ifin@4\else\@ne\fi
1989 \fi}

This sections ends with a general tool for resetting the caption names with a unique interface. With
the old way, which mixes the switcher and the string, we convert it to the new one, which separates
these two steps.

1990 \newcommand\setlocalecaption{% TODO. Catch typos.


1991 \@ifstar\bbl@setcaption@s\bbl@setcaption@x}
1992 \def\bbl@setcaption@x#1#2#3{% language caption-name string
1993 \bbl@trim@def\bbl@tempa{#2}%
1994 \bbl@xin@{.template}{\bbl@tempa}%
1995 \ifin@
1996 \bbl@ini@captions@template{#3}{#1}%
1997 \else
1998 \edef\bbl@tempd{%
1999 \expandafter\expandafter\expandafter
2000 \strip@prefix\expandafter\meaning\csname captions#1\endcsname}%
2001 \bbl@xin@
2002 {\expandafter\string\csname #2name\endcsname}%
2003 {\bbl@tempd}%
2004 \ifin@ % Renew caption
2005 \bbl@xin@{\string\bbl@scset}{\bbl@tempd}%
2006 \ifin@
2007 \bbl@exp{%
2008 \\\bbl@ifsamestring{\bbl@tempa}{\languagename}%
2009 {\\\bbl@scset\<#2name>\<#1#2name>}%
2010 {}}%
2011 \else % Old way converts to new way
2012 \bbl@ifunset{#1#2name}%
2013 {\bbl@exp{%
2014 \\\bbl@add\<captions#1>{\def\<#2name>{\<#1#2name>}}%
2015 \\\bbl@ifsamestring{\bbl@tempa}{\languagename}%
2016 {\def\<#2name>{\<#1#2name>}}%
2017 {}}}%
2018 {}%
2019 \fi
2020 \else
2021 \bbl@xin@{\string\bbl@scset}{\bbl@tempd}% New
2022 \ifin@ % New way
2023 \bbl@exp{%
2024 \\\bbl@add\<captions#1>{\\\bbl@scset\<#2name>\<#1#2name>}%
2025 \\\bbl@ifsamestring{\bbl@tempa}{\languagename}%
2026 {\\\bbl@scset\<#2name>\<#1#2name>}%
2027 {}}%
2028 \else % Old way, but defined in the new way
2029 \bbl@exp{%
2030 \\\bbl@add\<captions#1>{\def\<#2name>{\<#1#2name>}}%
2031 \\\bbl@ifsamestring{\bbl@tempa}{\languagename}%
2032 {\def\<#2name>{\<#1#2name>}}%
2033 {}}%

106
2034 \fi%
2035 \fi
2036 \@namedef{#1#2name}{#3}%
2037 \toks@\expandafter{\bbl@captionslist}%
2038 \bbl@exp{\\\in@{\<#2name>}{\the\toks@}}%
2039 \ifin@\else
2040 \bbl@exp{\\\bbl@add\\\bbl@captionslist{\<#2name>}}%
2041 \bbl@toglobal\bbl@captionslist
2042 \fi
2043 \fi}
2044 % \def\bbl@setcaption@s#1#2#3{} % TODO. Not yet implemented (w/o 'name')

7.11 Macros common to a number of languages


\set@low@box The following macro is used to lower quotes to the same level as the comma. It prepares its
argument in box register 0.

2045 \bbl@trace{Macros related to glyphs}


2046 \def\set@low@box#1{\setbox\tw@\hbox{,}\setbox\z@\hbox{#1}%
2047 \dimen\z@\ht\z@ \advance\dimen\z@ -\ht\tw@%
2048 \setbox\z@\hbox{\lower\dimen\z@ \box\z@}\ht\z@\ht\tw@ \dp\z@\dp\tw@}

\save@sf@q The macro \save@sf@q is used to save and reset the current space factor.

2049 \def\save@sf@q#1{\leavevmode
2050 \begingroup
2051 \edef\@SF{\spacefactor\the\spacefactor}#1\@SF
2052 \endgroup}

7.12 Making glyphs available


This section makes a number of glyphs available that either do not exist in the OT1 encoding and
have to be ‘faked’, or that are not accessible through T1enc.def.

7.12.1 Quotation marks


\quotedblbase In the T1 encoding the opening double quote at the baseline is available as a separate character,
accessible via \quotedblbase. In the OT1 encoding it is not available, therefore we make it available
by lowering the normal open quote character to the baseline.

2053 \ProvideTextCommand{\quotedblbase}{OT1}{%
2054 \save@sf@q{\set@low@box{\textquotedblright\/}%
2055 \box\z@\kern-.04em\bbl@allowhyphens}}

Make sure that when an encoding other than OT1 or T1 is used this glyph can still be typeset.

2056 \ProvideTextCommandDefault{\quotedblbase}{%
2057 \UseTextSymbol{OT1}{\quotedblbase}}

\quotesinglbase We also need the single quote character at the baseline.

2058 \ProvideTextCommand{\quotesinglbase}{OT1}{%
2059 \save@sf@q{\set@low@box{\textquoteright\/}%
2060 \box\z@\kern-.04em\bbl@allowhyphens}}

Make sure that when an encoding other than OT1 or T1 is used this glyph can still be typeset.

2061 \ProvideTextCommandDefault{\quotesinglbase}{%
2062 \UseTextSymbol{OT1}{\quotesinglbase}}

\guillemetleft The guillemet characters are not available in OT1 encoding. They are faked. (Wrong names with o
\guillemetright preserved for compatibility.)

2063 \ProvideTextCommand{\guillemetleft}{OT1}{%
2064 \ifmmode
2065 \ll
2066 \else
2067 \save@sf@q{\nobreak
2068 \raise.2ex\hbox{$\scriptscriptstyle\ll$}\bbl@allowhyphens}%

107
2069 \fi}
2070 \ProvideTextCommand{\guillemetright}{OT1}{%
2071 \ifmmode
2072 \gg
2073 \else
2074 \save@sf@q{\nobreak
2075 \raise.2ex\hbox{$\scriptscriptstyle\gg$}\bbl@allowhyphens}%
2076 \fi}
2077 \ProvideTextCommand{\guillemotleft}{OT1}{%
2078 \ifmmode
2079 \ll
2080 \else
2081 \save@sf@q{\nobreak
2082 \raise.2ex\hbox{$\scriptscriptstyle\ll$}\bbl@allowhyphens}%
2083 \fi}
2084 \ProvideTextCommand{\guillemotright}{OT1}{%
2085 \ifmmode
2086 \gg
2087 \else
2088 \save@sf@q{\nobreak
2089 \raise.2ex\hbox{$\scriptscriptstyle\gg$}\bbl@allowhyphens}%
2090 \fi}

Make sure that when an encoding other than OT1 or T1 is used these glyphs can still be typeset.

2091 \ProvideTextCommandDefault{\guillemetleft}{%
2092 \UseTextSymbol{OT1}{\guillemetleft}}
2093 \ProvideTextCommandDefault{\guillemetright}{%
2094 \UseTextSymbol{OT1}{\guillemetright}}
2095 \ProvideTextCommandDefault{\guillemotleft}{%
2096 \UseTextSymbol{OT1}{\guillemotleft}}
2097 \ProvideTextCommandDefault{\guillemotright}{%
2098 \UseTextSymbol{OT1}{\guillemotright}}

\guilsinglleft The single guillemets are not available in OT1 encoding. They are faked.
\guilsinglright
2099 \ProvideTextCommand{\guilsinglleft}{OT1}{%
2100 \ifmmode
2101 <%
2102 \else
2103 \save@sf@q{\nobreak
2104 \raise.2ex\hbox{$\scriptscriptstyle<$}\bbl@allowhyphens}%
2105 \fi}
2106 \ProvideTextCommand{\guilsinglright}{OT1}{%
2107 \ifmmode
2108 >%
2109 \else
2110 \save@sf@q{\nobreak
2111 \raise.2ex\hbox{$\scriptscriptstyle>$}\bbl@allowhyphens}%
2112 \fi}

Make sure that when an encoding other than OT1 or T1 is used these glyphs can still be typeset.

2113 \ProvideTextCommandDefault{\guilsinglleft}{%
2114 \UseTextSymbol{OT1}{\guilsinglleft}}
2115 \ProvideTextCommandDefault{\guilsinglright}{%
2116 \UseTextSymbol{OT1}{\guilsinglright}}

7.12.2 Letters
\ij The dutch language uses the letter ‘ij’. It is available in T1 encoded fonts, but not in the OT1 encoded
\IJ fonts. Therefore we fake it for the OT1 encoding.

2117 \DeclareTextCommand{\ij}{OT1}{%
2118 i\kern-0.02em\bbl@allowhyphens j}
2119 \DeclareTextCommand{\IJ}{OT1}{%
2120 I\kern-0.02em\bbl@allowhyphens J}

108
2121 \DeclareTextCommand{\ij}{T1}{\char188}
2122 \DeclareTextCommand{\IJ}{T1}{\char156}

Make sure that when an encoding other than OT1 or T1 is used these glyphs can still be typeset.
2123 \ProvideTextCommandDefault{\ij}{%
2124 \UseTextSymbol{OT1}{\ij}}
2125 \ProvideTextCommandDefault{\IJ}{%
2126 \UseTextSymbol{OT1}{\IJ}}

\dj The croatian language needs the letters \dj and \DJ; they are available in the T1 encoding, but not in
\DJ the OT1 encoding by default.
Some code to construct these glyphs for the OT1 encoding was made available to me by Stipčević
Mario, ([email protected]).
2127 \def\crrtic@{\hrule height0.1ex width0.3em}
2128 \def\crttic@{\hrule height0.1ex width0.33em}
2129 \def\ddj@{%
2130 \setbox0\hbox{d}\dimen@=\ht0
2131 \advance\dimen@1ex
2132 \[email protected]\dimen@
2133 \dimen@ii\expandafter\rem@pt\the\fontdimen\@ne\font\dimen@
2134 \advance\[email protected]
2135 \leavevmode\rlap{\raise\dimen@\hbox{\kern\dimen@ii\vbox{\crrtic@}}}}
2136 \def\DDJ@{%
2137 \setbox0\hbox{D}\dimen@=.55\ht0
2138 \dimen@ii\expandafter\rem@pt\the\fontdimen\@ne\font\dimen@
2139 \advance\[email protected] % correction for the dash position
2140 \advance\[email protected]\fontdimen7\font % correction for cmtt font
2141 \dimen\thr@@\expandafter\rem@pt\the\fontdimen7\font\dimen@
2142 \leavevmode\rlap{\raise\dimen@\hbox{\kern\dimen@ii\vbox{\crttic@}}}}
2143 %
2144 \DeclareTextCommand{\dj}{OT1}{\ddj@ d}
2145 \DeclareTextCommand{\DJ}{OT1}{\DDJ@ D}

Make sure that when an encoding other than OT1 or T1 is used these glyphs can still be typeset.
2146 \ProvideTextCommandDefault{\dj}{%
2147 \UseTextSymbol{OT1}{\dj}}
2148 \ProvideTextCommandDefault{\DJ}{%
2149 \UseTextSymbol{OT1}{\DJ}}

\SS For the T1 encoding \SS is defined and selects a specific glyph from the font, but for other encodings
it is not available. Therefore we make it available here.
2150 \DeclareTextCommand{\SS}{OT1}{SS}
2151 \ProvideTextCommandDefault{\SS}{\UseTextSymbol{OT1}{\SS}}

7.12.3 Shorthands for quotation marks


Shorthands are provided for a number of different quotation marks, which make them usable both
outside and inside mathmode. They are defined with \ProvideTextCommandDefault, but this is very
likely not required because their definitions are based on encoding-dependent macros.

\glq The ‘german’ single quotes.


\grq
2152 \ProvideTextCommandDefault{\glq}{%
2153 \textormath{\quotesinglbase}{\mbox{\quotesinglbase}}}
The definition of \grq depends on the fontencoding. With T1 encoding no extra kerning is needed.
2154 \ProvideTextCommand{\grq}{T1}{%
2155 \textormath{\kern\z@\textquoteleft}{\mbox{\textquoteleft}}}
2156 \ProvideTextCommand{\grq}{TU}{%
2157 \textormath{\textquoteleft}{\mbox{\textquoteleft}}}
2158 \ProvideTextCommand{\grq}{OT1}{%
2159 \save@sf@q{\kern-.0125em
2160 \textormath{\textquoteleft}{\mbox{\textquoteleft}}%
2161 \kern.07em\relax}}
2162 \ProvideTextCommandDefault{\grq}{\UseTextSymbol{OT1}\grq}

109
\glqq The ‘german’ double quotes.
\grqq
2163 \ProvideTextCommandDefault{\glqq}{%
2164 \textormath{\quotedblbase}{\mbox{\quotedblbase}}}

The definition of \grqq depends on the fontencoding. With T1 encoding no extra kerning is needed.

2165 \ProvideTextCommand{\grqq}{T1}{%
2166 \textormath{\textquotedblleft}{\mbox{\textquotedblleft}}}
2167 \ProvideTextCommand{\grqq}{TU}{%
2168 \textormath{\textquotedblleft}{\mbox{\textquotedblleft}}}
2169 \ProvideTextCommand{\grqq}{OT1}{%
2170 \save@sf@q{\kern-.07em
2171 \textormath{\textquotedblleft}{\mbox{\textquotedblleft}}%
2172 \kern.07em\relax}}
2173 \ProvideTextCommandDefault{\grqq}{\UseTextSymbol{OT1}\grqq}

\flq The ‘french’ single guillemets.


\frq
2174 \ProvideTextCommandDefault{\flq}{%
2175 \textormath{\guilsinglleft}{\mbox{\guilsinglleft}}}
2176 \ProvideTextCommandDefault{\frq}{%
2177 \textormath{\guilsinglright}{\mbox{\guilsinglright}}}

\flqq The ‘french’ double guillemets.


\frqq
2178 \ProvideTextCommandDefault{\flqq}{%
2179 \textormath{\guillemetleft}{\mbox{\guillemetleft}}}
2180 \ProvideTextCommandDefault{\frqq}{%
2181 \textormath{\guillemetright}{\mbox{\guillemetright}}}

7.12.4 Umlauts and tremas


The command \" needs to have a different effect for different languages. For German for instance,
the ‘umlaut’ should be positioned lower than the default position for placing it over the letters a, o, u,
A, O and U. When placed over an e, i, E or I it can retain its normal position. For Dutch the same
glyph is always placed in the lower position.

\umlauthigh To be able to provide both positions of \" we provide two commands to switch the positioning, the
\umlautlow default will be \umlauthigh (the normal positioning).

2182 \def\umlauthigh{%
2183 \def\bbl@umlauta##1{\leavevmode\bgroup%
2184 \accent\csname\f@encoding dqpos\endcsname
2185 ##1\bbl@allowhyphens\egroup}%
2186 \let\bbl@umlaute\bbl@umlauta}
2187 \def\umlautlow{%
2188 \def\bbl@umlauta{\protect\lower@umlaut}}
2189 \def\umlautelow{%
2190 \def\bbl@umlaute{\protect\lower@umlaut}}
2191 \umlauthigh

\lower@umlaut The command \lower@umlaut is used to position the \" closer to the letter.
We want the umlaut character lowered, nearer to the letter. To do this we need an extra hdimeni
register.

2192 \expandafter\ifx\csname U@D\endcsname\relax


2193 \csname newdimen\endcsname\U@D
2194 \fi

The following code fools TEX’s make_accent procedure about the current x-height of the font to force
another placement of the umlaut character. First we have to save the current x-height of the font,
because we’ll change this font dimension and this is always done globally.
Then we compute the new x-height in such a way that the umlaut character is lowered to the base
character. The value of .45ex depends on the METAFONT parameters with which the fonts were
built. (Just try out, which value will look best.) If the new x-height is too low, it is not changed. Finally
we call the \accent primitive, reset the old x-height and insert the base character in the argument.

2195 \def\lower@umlaut#1{%

110
2196 \leavevmode\bgroup
2197 \U@D 1ex%
2198 {\setbox\z@\hbox{%
2199 \char\csname\f@encoding dqpos\endcsname}%
2200 \dimen@ -.45ex\advance\dimen@\ht\z@
2201 \ifdim 1ex<\dimen@ \fontdimen5\font\dimen@ \fi}%
2202 \accent\csname\f@encoding dqpos\endcsname
2203 \fontdimen5\font\U@D #1%
2204 \egroup}

For all vowels we declare \" to be a composite command which uses \bbl@umlauta or \bbl@umlaute
to position the umlaut character. We need to be sure that these definitions override the ones that are
provided when the package fontenc with option OT1 is used. Therefore these declarations are
postponed until the beginning of the document. Note these definitions only apply to some languages,
but babel sets them for all languages – you may want to redefine \bbl@umlauta and/or \bbl@umlaute
for a language in the corresponding ldf (using the babel switching mechanism, of course).
2205 \AtBeginDocument{%
2206 \DeclareTextCompositeCommand{\"}{OT1}{a}{\bbl@umlauta{a}}%
2207 \DeclareTextCompositeCommand{\"}{OT1}{e}{\bbl@umlaute{e}}%
2208 \DeclareTextCompositeCommand{\"}{OT1}{i}{\bbl@umlaute{\i}}%
2209 \DeclareTextCompositeCommand{\"}{OT1}{\i}{\bbl@umlaute{\i}}%
2210 \DeclareTextCompositeCommand{\"}{OT1}{o}{\bbl@umlauta{o}}%
2211 \DeclareTextCompositeCommand{\"}{OT1}{u}{\bbl@umlauta{u}}%
2212 \DeclareTextCompositeCommand{\"}{OT1}{A}{\bbl@umlauta{A}}%
2213 \DeclareTextCompositeCommand{\"}{OT1}{E}{\bbl@umlaute{E}}%
2214 \DeclareTextCompositeCommand{\"}{OT1}{I}{\bbl@umlaute{I}}%
2215 \DeclareTextCompositeCommand{\"}{OT1}{O}{\bbl@umlauta{O}}%
2216 \DeclareTextCompositeCommand{\"}{OT1}{U}{\bbl@umlauta{U}}}
Finally, make sure the default hyphenrules are defined (even if empty). For internal use, another
empty \language is defined. Currently used in Amharic.
2217 \ifx\l@english\@undefined
2218 \chardef\l@english\z@
2219 \fi
2220 % The following is used to cancel rules in ini files (see Amharic).
2221 \ifx\l@unhyphenated\@undefined
2222 \newlanguage\l@unhyphenated
2223 \fi

7.13 Layout
Layout is mainly intended to set bidi documents, but there is at least a tool useful in general.
2224 \bbl@trace{Bidi layout}
2225 \providecommand\IfBabelLayout[3]{#3}%
2226 \newcommand\BabelPatchSection[1]{%
2227 \@ifundefined{#1}{}{%
2228 \bbl@exp{\let\<bbl@ss@#1>\<#1>}%
2229 \@namedef{#1}{%
2230 \@ifstar{\bbl@presec@s{#1}}%
2231 {\@dblarg{\bbl@presec@x{#1}}}}}}
2232 \def\bbl@presec@x#1[#2]#3{%
2233 \bbl@exp{%
2234 \\\select@language@x{\bbl@main@language}%
2235 \\\bbl@cs{sspre@#1}%
2236 \\\bbl@cs{ss@#1}%
2237 [\\\foreignlanguage{\languagename}{\unexpanded{#2}}]%
2238 {\\\foreignlanguage{\languagename}{\unexpanded{#3}}}%
2239 \\\select@language@x{\languagename}}}
2240 \def\bbl@presec@s#1#2{%
2241 \bbl@exp{%
2242 \\\select@language@x{\bbl@main@language}%
2243 \\\bbl@cs{sspre@#1}%
2244 \\\bbl@cs{ss@#1}*%

111
2245 {\\\foreignlanguage{\languagename}{\unexpanded{#2}}}%
2246 \\\select@language@x{\languagename}}}
2247 \IfBabelLayout{sectioning}%
2248 {\BabelPatchSection{part}%
2249 \BabelPatchSection{chapter}%
2250 \BabelPatchSection{section}%
2251 \BabelPatchSection{subsection}%
2252 \BabelPatchSection{subsubsection}%
2253 \BabelPatchSection{paragraph}%
2254 \BabelPatchSection{subparagraph}%
2255 \def\babel@toc#1{%
2256 \select@language@x{\bbl@main@language}}}{}
2257 \IfBabelLayout{captions}%
2258 {\BabelPatchSection{caption}}{}

7.14 Load engine specific macros


Some macros are not defined in all engines, so, after loading the files define them if necessary to
raise an error.

2259 \bbl@trace{Input engine specific macros}


2260 \ifcase\bbl@engine
2261 \input txtbabel.def
2262 \or
2263 \input luababel.def
2264 \or
2265 \input xebabel.def
2266 \fi
2267 \providecommand\babelfont{%
2268 \bbl@error
2269 {This macro is available only in LuaLaTeX and XeLaTeX.}%
2270 {Consider switching to these engines.}}
2271 \providecommand\babelprehyphenation{%
2272 \bbl@error
2273 {This macro is available only in LuaLaTeX.}%
2274 {Consider switching to that engine.}}
2275 \ifx\babelposthyphenation\@undefined
2276 \let\babelposthyphenation\babelprehyphenation
2277 \let\babelpatterns\babelprehyphenation
2278 \let\babelcharproperty\babelprehyphenation
2279 \fi

7.15 Creating and modifying languages


\babelprovide is a general purpose tool for creating and modifying languages. It creates the
language infrastructure, and loads, if requested, an ini file. It may be used in conjunction to
previouly loaded ldf files.

2280 \bbl@trace{Creating languages and reading ini files}


2281 \let\bbl@extend@ini\@gobble
2282 \newcommand\babelprovide[2][]{%
2283 \let\bbl@savelangname\languagename
2284 \edef\bbl@savelocaleid{\the\localeid}%
2285 % Set name and locale id
2286 \edef\languagename{#2}%
2287 \bbl@id@assign
2288 % Initialize keys
2289 \bbl@vforeach{captions,date,import,main,script,language,%
2290 hyphenrules,linebreaking,justification,mapfont,maparabic,%
2291 mapdigits,intraspace,intrapenalty,onchar,transforms,alph,%
2292 Alph,labels,labels*,calendar,date}%
2293 {\bbl@csarg\let{KVP@##1}\@nnil}%
2294 \global\let\bbl@release@transforms\@empty
2295 \let\bbl@calendars\@empty

112
2296 \global\let\bbl@inidata\@empty
2297 \global\let\bbl@extend@ini\@gobble
2298 \gdef\bbl@key@list{;}%
2299 \bbl@forkv{#1}{%
2300 \in@{/}{##1}%
2301 \ifin@
2302 \global\let\bbl@extend@ini\bbl@extend@ini@aux
2303 \bbl@renewinikey##1\@@{##2}%
2304 \else
2305 \bbl@csarg\ifx{KVP@##1}\@nnil\else
2306 \bbl@error
2307 {Unknown key '##1' in \string\babelprovide}%
2308 {See the manual for valid keys}%
2309 \fi
2310 \bbl@csarg\def{KVP@##1}{##2}%
2311 \fi}%
2312 \chardef\bbl@howloaded=% 0:none; 1:ldf without ini; 2:ini
2313 \bbl@ifunset{date#2}\z@{\bbl@ifunset{bbl@llevel@#2}\@ne\tw@}%
2314 % == init ==
2315 \ifx\bbl@screset\@undefined
2316 \bbl@ldfinit
2317 \fi
2318 % == date (as option) ==
2319 % \ifx\bbl@KVP@date\@nnil\else
2320 % \fi
2321 % ==
2322 \let\bbl@lbkflag\relax % \@empty = do setup linebreak
2323 \ifcase\bbl@howloaded
2324 \let\bbl@lbkflag\@empty % new
2325 \else
2326 \ifx\bbl@KVP@hyphenrules\@nnil\else
2327 \let\bbl@lbkflag\@empty
2328 \fi
2329 \ifx\bbl@KVP@import\@nnil\else
2330 \let\bbl@lbkflag\@empty
2331 \fi
2332 \fi
2333 % == import, captions ==
2334 \ifx\bbl@KVP@import\@nnil\else
2335 \bbl@exp{\\\bbl@ifblank{\bbl@KVP@import}}%
2336 {\ifx\bbl@initoload\relax
2337 \begingroup
2338 \def\BabelBeforeIni##1##2{\gdef\bbl@KVP@import{##1}\endinput}%
2339 \bbl@input@texini{#2}%
2340 \endgroup
2341 \else
2342 \xdef\bbl@KVP@import{\bbl@initoload}%
2343 \fi}%
2344 {}%
2345 \let\bbl@KVP@date\@empty
2346 \fi
2347 \ifx\bbl@KVP@captions\@nnil
2348 \let\bbl@KVP@captions\bbl@KVP@import
2349 \fi
2350 % ==
2351 \ifx\bbl@KVP@transforms\@nnil\else
2352 \bbl@replace\bbl@KVP@transforms{ }{,}%
2353 \fi
2354 % == Load ini ==
2355 \ifcase\bbl@howloaded
2356 \bbl@provide@new{#2}%
2357 \else
2358 \bbl@ifblank{#1}%

113
2359 {}% With \bbl@load@basic below
2360 {\bbl@provide@renew{#2}}%
2361 \fi
2362 % Post tasks
2363 % ----------
2364 % == subsequent calls after the first provide for a locale ==
2365 \ifx\bbl@inidata\@empty\else
2366 \bbl@extend@ini{#2}%
2367 \fi
2368 % == ensure captions ==
2369 \ifx\bbl@KVP@captions\@nnil\else
2370 \bbl@ifunset{bbl@extracaps@#2}%
2371 {\bbl@exp{\\\babelensure[exclude=\\\today]{#2}}}%
2372 {\bbl@exp{\\\babelensure[exclude=\\\today,
2373 include=\[bbl@extracaps@#2]}]{#2}}%
2374 \bbl@ifunset{bbl@ensure@\languagename}%
2375 {\bbl@exp{%
2376 \\\DeclareRobustCommand\<bbl@ensure@\languagename>[1]{%
2377 \\\foreignlanguage{\languagename}%
2378 {####1}}}}%
2379 {}%
2380 \bbl@exp{%
2381 \\\bbl@toglobal\<bbl@ensure@\languagename>%
2382 \\\bbl@toglobal\<bbl@ensure@\languagename\space>}%
2383 \fi
2384 % ==
2385 % At this point all parameters are defined if 'import'. Now we
2386 % execute some code depending on them. But what about if nothing was
2387 % imported? We just set the basic parameters, but still loading the
2388 % whole ini file.
2389 \bbl@load@basic{#2}%
2390 % == script, language ==
2391 % Override the values from ini or defines them
2392 \ifx\bbl@KVP@script\@nnil\else
2393 \bbl@csarg\edef{sname@#2}{\bbl@KVP@script}%
2394 \fi
2395 \ifx\bbl@KVP@language\@nnil\else
2396 \bbl@csarg\edef{lname@#2}{\bbl@KVP@language}%
2397 \fi
2398 \ifcase\bbl@engine\or
2399 \bbl@ifunset{bbl@chrng@\languagename}{}%
2400 {\directlua{
2401 Babel.set_chranges_b('\bbl@cl{sbcp}', '\bbl@cl{chrng}') }}%
2402 \fi
2403 % == onchar ==
2404 \ifx\bbl@KVP@onchar\@nnil\else
2405 \bbl@luahyphenate
2406 \bbl@exp{%
2407 \\\AddToHook{env/document/before}{{\\\select@language{#2}{}}}}%
2408 \directlua{
2409 if Babel.locale_mapped == nil then
2410 Babel.locale_mapped = true
2411 Babel.linebreaking.add_before(Babel.locale_map)
2412 Babel.loc_to_scr = {}
2413 Babel.chr_to_loc = Babel.chr_to_loc or {}
2414 end
2415 Babel.locale_props[\the\localeid].letters = false
2416 }%
2417 \bbl@xin@{ letters }{ \bbl@KVP@onchar\space}%
2418 \ifin@
2419 \directlua{
2420 Babel.locale_props[\the\localeid].letters = true
2421 }%

114
2422 \fi
2423 \bbl@xin@{ ids }{ \bbl@KVP@onchar\space}%
2424 \ifin@
2425 \ifx\bbl@starthyphens\@undefined % Needed if no explicit selection
2426 \AddBabelHook{babel-onchar}{beforestart}{{\bbl@starthyphens}}%
2427 \fi
2428 \bbl@exp{\\\bbl@add\\\bbl@starthyphens
2429 {\\\bbl@patterns@lua{\languagename}}}%
2430 % TODO - error/warning if no script
2431 \directlua{
2432 if Babel.script_blocks['\bbl@cl{sbcp}'] then
2433 Babel.loc_to_scr[\the\localeid] =
2434 Babel.script_blocks['\bbl@cl{sbcp}']
2435 Babel.locale_props[\the\localeid].lc = \the\localeid\space
2436 Babel.locale_props[\the\localeid].lg = \the\@nameuse{l@\languagename}\space
2437 end
2438 }%
2439 \fi
2440 \bbl@xin@{ fonts }{ \bbl@KVP@onchar\space}%
2441 \ifin@
2442 \bbl@ifunset{bbl@lsys@\languagename}{\bbl@provide@lsys{\languagename}}{}%
2443 \bbl@ifunset{bbl@wdir@\languagename}{\bbl@provide@dirs{\languagename}}{}%
2444 \directlua{
2445 if Babel.script_blocks['\bbl@cl{sbcp}'] then
2446 Babel.loc_to_scr[\the\localeid] =
2447 Babel.script_blocks['\bbl@cl{sbcp}']
2448 end}%
2449 \ifx\bbl@mapselect\@undefined % TODO. almost the same as mapfont
2450 \AtBeginDocument{%
2451 \bbl@patchfont{{\bbl@mapselect}}%
2452 {\selectfont}}%
2453 \def\bbl@mapselect{%
2454 \let\bbl@mapselect\relax
2455 \edef\bbl@prefontid{\fontid\font}}%
2456 \def\bbl@mapdir##1{%
2457 {\def\languagename{##1}%
2458 \let\bbl@ifrestoring\@firstoftwo % To avoid font warning
2459 \bbl@switchfont
2460 \ifnum\fontid\font>\z@ % A hack, for the pgf nullfont hack
2461 \directlua{
2462 Babel.locale_props[\the\csname bbl@id@@##1\endcsname]%
2463 ['/\bbl@prefontid'] = \fontid\font\space}%
2464 \fi}}%
2465 \fi
2466 \bbl@exp{\\\bbl@add\\\bbl@mapselect{\\\bbl@mapdir{\languagename}}}%
2467 \fi
2468 % TODO - catch non-valid values
2469 \fi
2470 % == mapfont ==
2471 % For bidi texts, to switch the font based on direction
2472 \ifx\bbl@KVP@mapfont\@nnil\else
2473 \bbl@ifsamestring{\bbl@KVP@mapfont}{direction}{}%
2474 {\bbl@error{Option '\bbl@KVP@mapfont' unknown for\\%
2475 mapfont. Use 'direction'.%
2476 {See the manual for details.}}}%
2477 \bbl@ifunset{bbl@lsys@\languagename}{\bbl@provide@lsys{\languagename}}{}%
2478 \bbl@ifunset{bbl@wdir@\languagename}{\bbl@provide@dirs{\languagename}}{}%
2479 \ifx\bbl@mapselect\@undefined % TODO. See onchar.
2480 \AtBeginDocument{%
2481 \bbl@patchfont{{\bbl@mapselect}}%
2482 {\selectfont}}%
2483 \def\bbl@mapselect{%
2484 \let\bbl@mapselect\relax

115
2485 \edef\bbl@prefontid{\fontid\font}}%
2486 \def\bbl@mapdir##1{%
2487 {\def\languagename{##1}%
2488 \let\bbl@ifrestoring\@firstoftwo % avoid font warning
2489 \bbl@switchfont
2490 \directlua{Babel.fontmap
2491 [\the\csname bbl@wdir@##1\endcsname]%
2492 [\bbl@prefontid]=\fontid\font}}}%
2493 \fi
2494 \bbl@exp{\\\bbl@add\\\bbl@mapselect{\\\bbl@mapdir{\languagename}}}%
2495 \fi
2496 % == Line breaking: intraspace, intrapenalty ==
2497 % For CJK, East Asian, Southeast Asian, if interspace in ini
2498 \ifx\bbl@KVP@intraspace\@nnil\else % We can override the ini or set
2499 \bbl@csarg\edef{intsp@#2}{\bbl@KVP@intraspace}%
2500 \fi
2501 \bbl@provide@intraspace
2502 % == Line breaking: CJK quotes ==
2503 \ifcase\bbl@engine\or
2504 \bbl@xin@{/c}{/\bbl@cl{lnbrk}}%
2505 \ifin@
2506 \bbl@ifunset{bbl@quote@\languagename}{}%
2507 {\directlua{
2508 Babel.locale_props[\the\localeid].cjk_quotes = {}
2509 local cs = 'op'
2510 for c in string.utfvalues(%
2511 [[\csname bbl@quote@\languagename\endcsname]]) do
2512 if Babel.cjk_characters[c].c == 'qu' then
2513 Babel.locale_props[\the\localeid].cjk_quotes[c] = cs
2514 end
2515 cs = ( cs == 'op') and 'cl' or 'op'
2516 end
2517 }}%
2518 \fi
2519 \fi
2520 % == Line breaking: justification ==
2521 \ifx\bbl@KVP@justification\@nnil\else
2522 \let\bbl@KVP@linebreaking\bbl@KVP@justification
2523 \fi
2524 \ifx\bbl@KVP@linebreaking\@nnil\else
2525 \bbl@xin@{,\bbl@KVP@linebreaking,}%
2526 {,elongated,kashida,cjk,padding,unhyphenated,}%
2527 \ifin@
2528 \bbl@csarg\xdef
2529 {lnbrk@\languagename}{\expandafter\@car\bbl@KVP@linebreaking\@nil}%
2530 \fi
2531 \fi
2532 \bbl@xin@{/e}{/\bbl@cl{lnbrk}}%
2533 \ifin@\else\bbl@xin@{/k}{/\bbl@cl{lnbrk}}\fi
2534 \ifin@\bbl@arabicjust\fi
2535 \bbl@xin@{/p}{/\bbl@cl{lnbrk}}%
2536 \ifin@\AtBeginDocument{\@nameuse{bbl@tibetanjust}}\fi
2537 % == Line breaking: hyphenate.other.(locale|script) ==
2538 \ifx\bbl@lbkflag\@empty
2539 \bbl@ifunset{bbl@hyotl@\languagename}{}%
2540 {\bbl@csarg\bbl@replace{hyotl@\languagename}{ }{,}%
2541 \bbl@startcommands*{\languagename}{}%
2542 \bbl@csarg\bbl@foreach{hyotl@\languagename}{%
2543 \ifcase\bbl@engine
2544 \ifnum##1<257
2545 \SetHyphenMap{\BabelLower{##1}{##1}}%
2546 \fi
2547 \else

116
2548 \SetHyphenMap{\BabelLower{##1}{##1}}%
2549 \fi}%
2550 \bbl@endcommands}%
2551 \bbl@ifunset{bbl@hyots@\languagename}{}%
2552 {\bbl@csarg\bbl@replace{hyots@\languagename}{ }{,}%
2553 \bbl@csarg\bbl@foreach{hyots@\languagename}{%
2554 \ifcase\bbl@engine
2555 \ifnum##1<257
2556 \global\lccode##1=##1\relax
2557 \fi
2558 \else
2559 \global\lccode##1=##1\relax
2560 \fi}}%
2561 \fi
2562 % == Counters: maparabic ==
2563 % Native digits, if provided in ini (TeX level, xe and lua)
2564 \ifcase\bbl@engine\else
2565 \bbl@ifunset{bbl@dgnat@\languagename}{}%
2566 {\expandafter\ifx\csname bbl@dgnat@\languagename\endcsname\@empty\else
2567 \expandafter\expandafter\expandafter
2568 \bbl@setdigits\csname bbl@dgnat@\languagename\endcsname
2569 \ifx\bbl@KVP@maparabic\@nnil\else
2570 \ifx\bbl@latinarabic\@undefined
2571 \expandafter\let\expandafter\@arabic
2572 \csname bbl@counter@\languagename\endcsname
2573 \else % ie, if layout=counters, which redefines \@arabic
2574 \expandafter\let\expandafter\bbl@latinarabic
2575 \csname bbl@counter@\languagename\endcsname
2576 \fi
2577 \fi
2578 \fi}%
2579 \fi
2580 % == Counters: mapdigits ==
2581 % Native digits (lua level).
2582 \ifodd\bbl@engine
2583 \ifx\bbl@KVP@mapdigits\@nnil\else
2584 \bbl@ifunset{bbl@dgnat@\languagename}{}%
2585 {\RequirePackage{luatexbase}%
2586 \bbl@activate@preotf
2587 \directlua{
2588 Babel = Babel or {} %%% -> presets in luababel
2589 Babel.digits_mapped = true
2590 Babel.digits = Babel.digits or {}
2591 Babel.digits[\the\localeid] =
2592 table.pack(string.utfvalue('\bbl@cl{dgnat}'))
2593 if not Babel.numbers then
2594 function Babel.numbers(head)
2595 local LOCALE = Babel.attr_locale
2596 local GLYPH = node.id'glyph'
2597 local inmath = false
2598 for item in node.traverse(head) do
2599 if not inmath and item.id == GLYPH then
2600 local temp = node.get_attribute(item, LOCALE)
2601 if Babel.digits[temp] then
2602 local chr = item.char
2603 if chr > 47 and chr < 58 then
2604 item.char = Babel.digits[temp][chr-47]
2605 end
2606 end
2607 elseif item.id == node.id'math' then
2608 inmath = (item.subtype == 0)
2609 end
2610 end

117
2611 return head
2612 end
2613 end
2614 }}%
2615 \fi
2616 \fi
2617 % == Counters: alph, Alph ==
2618 % What if extras<lang> contains a \babel@save\@alph? It won't be
2619 % restored correctly when exiting the language, so we ignore
2620 % this change with the \bbl@alph@saved trick.
2621 \ifx\bbl@KVP@alph\@nnil\else
2622 \bbl@extras@wrap{\\\bbl@alph@saved}%
2623 {\let\bbl@alph@saved\@alph}%
2624 {\let\@alph\bbl@alph@saved
2625 \babel@save\@alph}%
2626 \bbl@exp{%
2627 \\\bbl@add\<extras\languagename>{%
2628 \let\\\@alph\<bbl@cntr@\bbl@KVP@alph @\languagename>}}%
2629 \fi
2630 \ifx\bbl@KVP@Alph\@nnil\else
2631 \bbl@extras@wrap{\\\bbl@Alph@saved}%
2632 {\let\bbl@Alph@saved\@Alph}%
2633 {\let\@Alph\bbl@Alph@saved
2634 \babel@save\@Alph}%
2635 \bbl@exp{%
2636 \\\bbl@add\<extras\languagename>{%
2637 \let\\\@Alph\<bbl@cntr@\bbl@KVP@Alph @\languagename>}}%
2638 \fi
2639 % == Calendars ==
2640 \ifx\bbl@KVP@calendar\@nnil
2641 \edef\bbl@KVP@calendar{\bbl@cl{calpr}}%
2642 \fi
2643 \def\bbl@tempe##1 ##2\@@{% Get first calendar
2644 \def\bbl@tempa{##1}}%
2645 \bbl@exp{\\\bbl@tempe\bbl@KVP@calendar\space\\\@@}%
2646 \def\bbl@tempe##1.##2.##3\@@{%
2647 \def\bbl@tempc{##1}%
2648 \def\bbl@tempb{##2}}%
2649 \expandafter\bbl@tempe\bbl@tempa..\@@
2650 \bbl@csarg\edef{calpr@\languagename}{%
2651 \ifx\bbl@tempc\@empty\else
2652 calendar=\bbl@tempc
2653 \fi
2654 \ifx\bbl@tempb\@empty\else
2655 ,variant=\bbl@tempb
2656 \fi}%
2657 % == require.babel in ini ==
2658 % To load or reaload the babel-*.tex, if require.babel in ini
2659 \ifx\bbl@beforestart\relax\else % But not in doc aux or body
2660 \bbl@ifunset{bbl@rqtex@\languagename}{}%
2661 {\expandafter\ifx\csname bbl@rqtex@\languagename\endcsname\@empty\else
2662 \let\BabelBeforeIni\@gobbletwo
2663 \chardef\atcatcode=\catcode`\@
2664 \catcode`\@=11\relax
2665 \bbl@input@texini{\bbl@cs{rqtex@\languagename}}%
2666 \catcode`\@=\atcatcode
2667 \let\atcatcode\relax
2668 \global\bbl@csarg\let{rqtex@\languagename}\relax
2669 \fi}%
2670 \bbl@foreach\bbl@calendars{%
2671 \bbl@ifunset{bbl@ca@##1}{%
2672 \chardef\atcatcode=\catcode`\@
2673 \catcode`\@=11\relax

118
2674 \InputIfFileExists{babel-ca-##1.tex}{}{}%
2675 \catcode`\@=\atcatcode
2676 \let\atcatcode\relax}%
2677 {}}%
2678 \fi
2679 % == frenchspacing ==
2680 \ifcase\bbl@howloaded\in@true\else\in@false\fi
2681 \ifin@\else\bbl@xin@{typography/frenchspacing}{\bbl@key@list}\fi
2682 \ifin@
2683 \bbl@extras@wrap{\\\bbl@pre@fs}%
2684 {\bbl@pre@fs}%
2685 {\bbl@post@fs}%
2686 \fi
2687 % == transforms ==
2688 \ifodd\bbl@engine
2689 \ifx\bbl@KVP@transforms\@nnil\else
2690 \def\bbl@elt##1##2##3{%
2691 \in@{$transforms.}{$##1}%
2692 \ifin@
2693 \def\bbl@tempa{##1}%
2694 \bbl@replace\bbl@tempa{transforms.}{}%
2695 \bbl@carg\bbl@transforms{babel\bbl@tempa}{##2}{##3}%
2696 \fi}%
2697 \csname bbl@inidata@\languagename\endcsname
2698 \bbl@release@transforms\relax % \relax closes the last item.
2699 \fi
2700 \fi
2701 % == main ==
2702 \ifx\bbl@KVP@main\@nnil % Restore only if not 'main'
2703 \let\languagename\bbl@savelangname
2704 \chardef\localeid\bbl@savelocaleid\relax
2705 \fi}

Depending on whether or not the language exists (based on \date<language>), we define two
macros. Remember \bbl@startcommands opens a group.

2706 \def\bbl@provide@new#1{%
2707 \@namedef{date#1}{}% marks lang exists - required by \StartBabelCommands
2708 \@namedef{extras#1}{}%
2709 \@namedef{noextras#1}{}%
2710 \bbl@startcommands*{#1}{captions}%
2711 \ifx\bbl@KVP@captions\@nnil % and also if import, implicit
2712 \def\bbl@tempb##1{% elt for \bbl@captionslist
2713 \ifx##1\@empty\else
2714 \bbl@exp{%
2715 \\\SetString\\##1{%
2716 \\\bbl@nocaption{\bbl@stripslash##1}{#1\bbl@stripslash##1}}}%
2717 \expandafter\bbl@tempb
2718 \fi}%
2719 \expandafter\bbl@tempb\bbl@captionslist\@empty
2720 \else
2721 \ifx\bbl@initoload\relax
2722 \bbl@read@ini{\bbl@KVP@captions}2% % Here letters cat = 11
2723 \else
2724 \bbl@read@ini{\bbl@initoload}2% % Same
2725 \fi
2726 \fi
2727 \StartBabelCommands*{#1}{date}%
2728 \ifx\bbl@KVP@date\@nnil
2729 \bbl@exp{%
2730 \\\SetString\\\today{\\\bbl@nocaption{today}{#1today}}}%
2731 \else
2732 \bbl@savetoday
2733 \bbl@savedate

119
2734 \fi
2735 \bbl@endcommands
2736 \bbl@load@basic{#1}%
2737 % == hyphenmins == (only if new)
2738 \bbl@exp{%
2739 \gdef\<#1hyphenmins>{%
2740 {\bbl@ifunset{bbl@lfthm@#1}{2}{\bbl@cs{lfthm@#1}}}%
2741 {\bbl@ifunset{bbl@rgthm@#1}{3}{\bbl@cs{rgthm@#1}}}}}%
2742 % == hyphenrules (also in renew) ==
2743 \bbl@provide@hyphens{#1}%
2744 \ifx\bbl@KVP@main\@nnil\else
2745 \expandafter\main@language\expandafter{#1}%
2746 \fi}
2747 %
2748 \def\bbl@provide@renew#1{%
2749 \ifx\bbl@KVP@captions\@nnil\else
2750 \StartBabelCommands*{#1}{captions}%
2751 \bbl@read@ini{\bbl@KVP@captions}2% % Here all letters cat = 11
2752 \EndBabelCommands
2753 \fi
2754 \ifx\bbl@KVP@date\@nnil\else
2755 \StartBabelCommands*{#1}{date}%
2756 \bbl@savetoday
2757 \bbl@savedate
2758 \EndBabelCommands
2759 \fi
2760 % == hyphenrules (also in new) ==
2761 \ifx\bbl@lbkflag\@empty
2762 \bbl@provide@hyphens{#1}%
2763 \fi}
Load the basic parameters (ids, typography, counters, and a few more), while captions and dates are
left out. But it may happen some data has been loaded before automatically, so we first discard the
saved values. (TODO. But preserving previous values would be useful.)
2764 \def\bbl@load@basic#1{%
2765 \ifcase\bbl@howloaded\or\or
2766 \ifcase\csname bbl@llevel@\languagename\endcsname
2767 \bbl@csarg\let{lname@\languagename}\relax
2768 \fi
2769 \fi
2770 \bbl@ifunset{bbl@lname@#1}%
2771 {\def\BabelBeforeIni##1##2{%
2772 \begingroup
2773 \let\bbl@ini@captions@aux\@gobbletwo
2774 \def\bbl@inidate ####1.####2.####3.####4\relax ####5####6{}%
2775 \bbl@read@ini{##1}1%
2776 \ifx\bbl@initoload\relax\endinput\fi
2777 \endgroup}%
2778 \begingroup % boxed, to avoid extra spaces:
2779 \ifx\bbl@initoload\relax
2780 \bbl@input@texini{#1}%
2781 \else
2782 \setbox\z@\hbox{\BabelBeforeIni{\bbl@initoload}{}}%
2783 \fi
2784 \endgroup}%
2785 {}}
The hyphenrules option is handled with an auxiliary macro.
2786 \def\bbl@provide@hyphens#1{%
2787 \let\bbl@tempa\relax
2788 \ifx\bbl@KVP@hyphenrules\@nnil\else
2789 \bbl@replace\bbl@KVP@hyphenrules{ }{,}%
2790 \bbl@foreach\bbl@KVP@hyphenrules{%
2791 \ifx\bbl@tempa\relax % if not yet found

120
2792 \bbl@ifsamestring{##1}{+}%
2793 {{\bbl@exp{\\\addlanguage\<l@##1>}}}%
2794 {}%
2795 \bbl@ifunset{l@##1}%
2796 {}%
2797 {\bbl@exp{\let\bbl@tempa\<l@##1>}}%
2798 \fi}%
2799 \fi
2800 \ifx\bbl@tempa\relax % if no opt or no language in opt found
2801 \ifx\bbl@KVP@import\@nnil
2802 \ifx\bbl@initoload\relax\else
2803 \bbl@exp{% and hyphenrules is not empty
2804 \\\bbl@ifblank{\bbl@cs{hyphr@#1}}%
2805 {}%
2806 {\let\\\bbl@tempa\<l@\bbl@cl{hyphr}>}}%
2807 \fi
2808 \else % if importing
2809 \bbl@exp{% and hyphenrules is not empty
2810 \\\bbl@ifblank{\bbl@cs{hyphr@#1}}%
2811 {}%
2812 {\let\\\bbl@tempa\<l@\bbl@cl{hyphr}>}}%
2813 \fi
2814 \fi
2815 \bbl@ifunset{bbl@tempa}% ie, relax or undefined
2816 {\bbl@ifunset{l@#1}% no hyphenrules found - fallback
2817 {\bbl@exp{\\\adddialect\<l@#1>\language}}%
2818 {}}% so, l@<lang> is ok - nothing to do
2819 {\bbl@exp{\\\adddialect\<l@#1>\bbl@tempa}}}% found in opt list or ini
The reader of babel-...tex files. We reset temporarily some catcodes.
2820 \def\bbl@input@texini#1{%
2821 \bbl@bsphack
2822 \bbl@exp{%
2823 \catcode`\\\%=14 \catcode`\\\\=0
2824 \catcode`\\\{=1 \catcode`\\\}=2
2825 \lowercase{\\\InputIfFileExists{babel-#1.tex}{}{}}%
2826 \catcode`\\\%=\the\catcode`\%\relax
2827 \catcode`\\\\=\the\catcode`\\\relax
2828 \catcode`\\\{=\the\catcode`\{\relax
2829 \catcode`\\\}=\the\catcode`\}\relax}%
2830 \bbl@esphack}
The following macros read and store ini files (but don’t process them). For each line, there are 3
possible actions: ignore if starts with ;, switch section if starts with [, and store otherwise. There are
used in the first step of \bbl@read@ini.
2831 \def\bbl@iniline#1\bbl@iniline{%
2832 \@ifnextchar[\bbl@inisect{\@ifnextchar;\bbl@iniskip\bbl@inistore}#1\@@}% ]
2833 \def\bbl@inisect[#1]#2\@@{\def\bbl@section{#1}}
2834 \def\bbl@iniskip#1\@@{}% if starts with ;
2835 \def\bbl@inistore#1=#2\@@{% full (default)
2836 \bbl@trim@def\bbl@tempa{#1}%
2837 \bbl@trim\toks@{#2}%
2838 \bbl@xin@{;\bbl@section/\bbl@tempa;}{\bbl@key@list}%
2839 \ifin@\else
2840 \bbl@xin@{,identification/include.}%
2841 {,\bbl@section/\bbl@tempa}%
2842 \ifin@\edef\bbl@required@inis{\the\toks@}\fi
2843 \bbl@exp{%
2844 \\\g@addto@macro\\\bbl@inidata{%
2845 \\\bbl@elt{\bbl@section}{\bbl@tempa}{\the\toks@}}}%
2846 \fi}
2847 \def\bbl@inistore@min#1=#2\@@{% minimal (maybe set in \bbl@read@ini)
2848 \bbl@trim@def\bbl@tempa{#1}%
2849 \bbl@trim\toks@{#2}%

121
2850 \bbl@xin@{.identification.}{.\bbl@section.}%
2851 \ifin@
2852 \bbl@exp{\\\g@addto@macro\\\bbl@inidata{%
2853 \\\bbl@elt{identification}{\bbl@tempa}{\the\toks@}}}%
2854 \fi}

Now, the ‘main loop’, which **must be executed inside a group**. At this point, \bbl@inidata may
contain data declared in \babelprovide, with ‘slashed’ keys. There are 3 steps: first read the ini file
and store it; then traverse the stored values, and process some groups if required (date, captions,
labels, counters); finally, ‘export’ some values by defining global macros (identification, typography,
characters, numbers). The second argument is 0 when called to read the minimal data for fonts; with
\babelprovide it’s either 1 or 2.

2855 \def\bbl@loop@ini{%
2856 \loop
2857 \if T\ifeof\bbl@readstream F\fi T\relax % Trick, because inside \loop
2858 \endlinechar\m@ne
2859 \read\bbl@readstream to \bbl@line
2860 \endlinechar`\^^M
2861 \ifx\bbl@line\@empty\else
2862 \expandafter\bbl@iniline\bbl@line\bbl@iniline
2863 \fi
2864 \repeat}
2865 \ifx\bbl@readstream\@undefined
2866 \csname newread\endcsname\bbl@readstream
2867 \fi
2868 \def\bbl@read@ini#1#2{%
2869 \global\let\bbl@extend@ini\@gobble
2870 \openin\bbl@readstream=babel-#1.ini
2871 \ifeof\bbl@readstream
2872 \bbl@error
2873 {There is no ini file for the requested language\\%
2874 (#1: \languagename). Perhaps you misspelled it or your\\%
2875 installation is not complete.}%
2876 {Fix the name or reinstall babel.}%
2877 \else
2878 % == Store ini data in \bbl@inidata ==
2879 \catcode`\[=12 \catcode`\]=12 \catcode`\==12 \catcode`\&=12
2880 \catcode`\;=12 \catcode`\|=12 \catcode`\%=14 \catcode`\-=12
2881 \bbl@info{Importing
2882 \ifcase#2font and identification \or basic \fi
2883 data for \languagename\\%
2884 from babel-#1.ini. Reported}%
2885 \ifnum#2=\z@
2886 \global\let\bbl@inidata\@empty
2887 \let\bbl@inistore\bbl@inistore@min % Remember it's local
2888 \fi
2889 \def\bbl@section{identification}%
2890 \let\bbl@required@inis\@empty
2891 \bbl@exp{\\\bbl@inistore tag.ini=#1\\\@@}%
2892 \bbl@inistore load.level=#2\@@
2893 \bbl@loop@ini
2894 \ifx\bbl@required@inis\@empty\else
2895 \bbl@replace\bbl@required@inis{ }{,}%
2896 \bbl@foreach\bbl@required@inis{%
2897 \openin\bbl@readstream=##1.ini
2898 \bbl@loop@ini}%
2899 \fi
2900 % == Process stored data ==
2901 \bbl@csarg\xdef{lini@\languagename}{#1}%
2902 \bbl@read@ini@aux
2903 % == 'Export' data ==
2904 \bbl@ini@exports{#2}%
2905 \global\bbl@csarg\let{inidata@\languagename}\bbl@inidata

122
2906 \global\let\bbl@inidata\@empty
2907 \bbl@exp{\\\bbl@add@list\\\bbl@ini@loaded{\languagename}}%
2908 \bbl@toglobal\bbl@ini@loaded
2909 \fi}
2910 \def\bbl@read@ini@aux{%
2911 \let\bbl@savestrings\@empty
2912 \let\bbl@savetoday\@empty
2913 \let\bbl@savedate\@empty
2914 \def\bbl@elt##1##2##3{%
2915 \def\bbl@section{##1}%
2916 \in@{=date.}{=##1}% Find a better place
2917 \ifin@
2918 \bbl@ifunset{bbl@inikv@##1}%
2919 {\bbl@ini@calendar{##1}}%
2920 {}%
2921 \fi
2922 \in@{=identification/extension.}{=##1/##2}%
2923 \ifin@
2924 \bbl@ini@extension{##2}%
2925 \fi
2926 \bbl@ifunset{bbl@inikv@##1}{}%
2927 {\csname bbl@inikv@##1\endcsname{##2}{##3}}}%
2928 \bbl@inidata}
A variant to be used when the ini file has been already loaded, because it’s not the first
\babelprovide for this language.
2929 \def\bbl@extend@ini@aux#1{%
2930 \bbl@startcommands*{#1}{captions}%
2931 % Activate captions/... and modify exports
2932 \bbl@csarg\def{[email protected]}##1##2{%
2933 \setlocalecaption{#1}{##1}{##2}}%
2934 \def\bbl@inikv@captions##1##2{%
2935 \bbl@ini@captions@aux{##1}{##2}}%
2936 \def\bbl@stringdef##1##2{\gdef##1{##2}}%
2937 \def\bbl@exportkey##1##2##3{%
2938 \bbl@ifunset{bbl@@kv@##2}{}%
2939 {\expandafter\ifx\csname bbl@@kv@##2\endcsname\@empty\else
2940 \bbl@exp{\global\let\<bbl@##1@\languagename>\<bbl@@kv@##2>}%
2941 \fi}}%
2942 % As with \bbl@read@ini, but with some changes
2943 \bbl@read@ini@aux
2944 \bbl@ini@exports\tw@
2945 % Update inidata@lang by pretending the ini is read.
2946 \def\bbl@elt##1##2##3{%
2947 \def\bbl@section{##1}%
2948 \bbl@iniline##2=##3\bbl@iniline}%
2949 \csname bbl@inidata@#1\endcsname
2950 \global\bbl@csarg\let{inidata@#1}\bbl@inidata
2951 \StartBabelCommands*{#1}{date}% And from the import stuff
2952 \def\bbl@stringdef##1##2{\gdef##1{##2}}%
2953 \bbl@savetoday
2954 \bbl@savedate
2955 \bbl@endcommands}
A somewhat hackish tool to handle calendar sections. TODO. To be improved.
2956 \def\bbl@ini@calendar#1{%
2957 \lowercase{\def\bbl@tempa{=#1=}}%
2958 \bbl@replace\bbl@tempa{=date.gregorian}{}%
2959 \bbl@replace\bbl@tempa{=date.}{}%
2960 \in@{.licr=}{#1=}%
2961 \ifin@
2962 \ifcase\bbl@engine
2963 \bbl@replace\bbl@tempa{.licr=}{}%
2964 \else

123
2965 \let\bbl@tempa\relax
2966 \fi
2967 \fi
2968 \ifx\bbl@tempa\relax\else
2969 \bbl@replace\bbl@tempa{=}{}%
2970 \ifx\bbl@tempa\@empty\else
2971 \xdef\bbl@calendars{\bbl@calendars,\bbl@tempa}%
2972 \fi
2973 \bbl@exp{%
2974 \def\<bbl@inikv@#1>####1####2{%
2975 \\\bbl@inidate####1...\relax{####2}{\bbl@tempa}}}%
2976 \fi}
A key with a slash in \babelprovide replaces the value in the ini file (which is ignored altogether).
The mechanism is simple (but suboptimal): add the data to the ini one (at this point the ini file has
not yet been read), and define a dummy macro. When the ini file is read, just skip the
corresponding key and reset the macro (in \bbl@inistore above).
2977 \def\bbl@renewinikey#1/#2\@@#3{%
2978 \edef\bbl@tempa{\zap@space #1 \@empty}% section
2979 \edef\bbl@tempb{\zap@space #2 \@empty}% key
2980 \bbl@trim\toks@{#3}% value
2981 \bbl@exp{%
2982 \edef\\\bbl@key@list{\bbl@key@list \bbl@tempa/\bbl@tempb;}%
2983 \\\g@addto@macro\\\bbl@inidata{%
2984 \\\bbl@elt{\bbl@tempa}{\bbl@tempb}{\the\toks@}}}}%
The previous assignments are local, so we need to export them. If the value is empty, we can provide
a default value.
2985 \def\bbl@exportkey#1#2#3{%
2986 \bbl@ifunset{bbl@@kv@#2}%
2987 {\bbl@csarg\gdef{#1@\languagename}{#3}}%
2988 {\expandafter\ifx\csname bbl@@kv@#2\endcsname\@empty
2989 \bbl@csarg\gdef{#1@\languagename}{#3}%
2990 \else
2991 \bbl@exp{\global\let\<bbl@#1@\languagename>\<bbl@@kv@#2>}%
2992 \fi}}
Key-value pairs are treated differently depending on the section in the ini file. The following macros
are the readers for identification and typography. Note \bbl@ini@exports is called always (via
\bbl@inisec), while \bbl@after@ini must be called explicitly after \bbl@read@ini if necessary.
2993 \def\bbl@iniwarning#1{%
2994 \bbl@ifunset{bbl@@[email protected]#1}{}%
2995 {\bbl@warning{%
2996 From babel-\bbl@cs{lini@\languagename}.ini:\\%
2997 \bbl@cs{@[email protected]#1}\\%
2998 Reported }}}
2999 %
3000 \let\bbl@release@transforms\@empty

BCP 47 extensions are separated by a single letter (eg, latin-x-medieval. The following macro
handles this special case to create correctly the correspondig info.
3001 \def\bbl@ini@extension#1{%
3002 \def\bbl@tempa{#1}%
3003 \bbl@replace\bbl@tempa{extension.}{}%
3004 \bbl@replace\bbl@tempa{.tag.bcp47}{}%
3005 \bbl@ifunset{bbl@info@#1}%
3006 {\bbl@csarg\xdef{info@#1}{ext/\bbl@tempa}%
3007 \bbl@exp{%
3008 \\\g@addto@macro\\\bbl@moreinfo{%
3009 \\\bbl@exportkey{ext/\bbl@tempa}{identification.#1}{}}}}%
3010 {}}
3011 \let\bbl@moreinfo\@empty
3012 %
3013 \def\bbl@ini@exports#1{%

124
3014 % Identification always exported
3015 \bbl@iniwarning{}%
3016 \ifcase\bbl@engine
3017 \bbl@iniwarning{.pdflatex}%
3018 \or
3019 \bbl@iniwarning{.lualatex}%
3020 \or
3021 \bbl@iniwarning{.xelatex}%
3022 \fi%
3023 \bbl@exportkey{llevel}{identification.load.level}{}%
3024 \bbl@exportkey{elname}{identification.name.english}{}%
3025 \bbl@exp{\\\bbl@exportkey{lname}{identification.name.opentype}%
3026 {\csname bbl@elname@\languagename\endcsname}}%
3027 \bbl@exportkey{tbcp}{identification.tag.bcp47}{}%
3028 \bbl@exportkey{lbcp}{identification.language.tag.bcp47}{}%
3029 \bbl@exportkey{lotf}{identification.tag.opentype}{dflt}%
3030 \bbl@exportkey{esname}{identification.script.name}{}%
3031 \bbl@exp{\\\bbl@exportkey{sname}{identification.script.name.opentype}%
3032 {\csname bbl@esname@\languagename\endcsname}}%
3033 \bbl@exportkey{sbcp}{identification.script.tag.bcp47}{}%
3034 \bbl@exportkey{sotf}{identification.script.tag.opentype}{DFLT}%
3035 \bbl@exportkey{rbcp}{identification.region.tag.bcp47}{}%
3036 \bbl@exportkey{vbcp}{identification.variant.tag.bcp47}{}%
3037 \bbl@moreinfo
3038 % Also maps bcp47 -> languagename
3039 \ifbbl@bcptoname
3040 \bbl@csarg\xdef{bcp@map@\bbl@cl{tbcp}}{\languagename}%
3041 \fi
3042 % Conditional
3043 \ifnum#1>\z@ % 0 = only info, 1, 2 = basic, (re)new
3044 \bbl@exportkey{calpr}{date.calendar.preferred}{}%
3045 \bbl@exportkey{lnbrk}{typography.linebreaking}{h}%
3046 \bbl@exportkey{hyphr}{typography.hyphenrules}{}%
3047 \bbl@exportkey{lfthm}{typography.lefthyphenmin}{2}%
3048 \bbl@exportkey{rgthm}{typography.righthyphenmin}{3}%
3049 \bbl@exportkey{prehc}{typography.prehyphenchar}{}%
3050 \bbl@exportkey{hyotl}{typography.hyphenate.other.locale}{}%
3051 \bbl@exportkey{hyots}{typography.hyphenate.other.script}{}%
3052 \bbl@exportkey{intsp}{typography.intraspace}{}%
3053 \bbl@exportkey{frspc}{typography.frenchspacing}{u}%
3054 \bbl@exportkey{chrng}{characters.ranges}{}%
3055 \bbl@exportkey{quote}{characters.delimiters.quotes}{}%
3056 \bbl@exportkey{dgnat}{numbers.digits.native}{}%
3057 \ifnum#1=\tw@ % only (re)new
3058 \bbl@exportkey{rqtex}{identification.require.babel}{}%
3059 \bbl@toglobal\bbl@savetoday
3060 \bbl@toglobal\bbl@savedate
3061 \bbl@savestrings
3062 \fi
3063 \fi}

A shared handler for key=val lines to be stored in \bbl@@kv@<section>.<key>.

3064 \def\bbl@inikv#1#2{% key=value


3065 \toks@{#2}% This hides #'s from ini values
3066 \bbl@csarg\edef{@kv@\bbl@section.#1}{\the\toks@}}

By default, the following sections are just read. Actions are taken later.

3067 \let\bbl@inikv@identification\bbl@inikv
3068 \let\bbl@inikv@date\bbl@inikv
3069 \let\bbl@inikv@typography\bbl@inikv
3070 \let\bbl@inikv@characters\bbl@inikv
3071 \let\bbl@inikv@numbers\bbl@inikv

Additive numerals require an additional definition. When .1 is found, two macros are defined – the

125
basic one, without .1 called by \localenumeral, and another one preserving the trailing .1 for the
‘units’.

3072 \def\bbl@inikv@counters#1#2{%
3073 \bbl@ifsamestring{#1}{digits}%
3074 {\bbl@error{The counter name 'digits' is reserved for mapping\\%
3075 decimal digits}%
3076 {Use another name.}}%
3077 {}%
3078 \def\bbl@tempc{#1}%
3079 \bbl@trim@def{\bbl@tempb*}{#2}%
3080 \in@{.1$}{#1$}%
3081 \ifin@
3082 \bbl@replace\bbl@tempc{.1}{}%
3083 \bbl@csarg\protected@xdef{cntr@\bbl@tempc @\languagename}{%
3084 \noexpand\bbl@alphnumeral{\bbl@tempc}}%
3085 \fi
3086 \in@{.F.}{#1}%
3087 \ifin@\else\in@{.S.}{#1}\fi
3088 \ifin@
3089 \bbl@csarg\protected@xdef{cntr@#1@\languagename}{\bbl@tempb*}%
3090 \else
3091 \toks@{}% Required by \bbl@buildifcase, which returns \bbl@tempa
3092 \expandafter\bbl@buildifcase\bbl@tempb* \\ % Space after \\
3093 \bbl@csarg{\global\expandafter\let}{cntr@#1@\languagename}\bbl@tempa
3094 \fi}

Now captions and captions.licr, depending on the engine. And below also for dates. They rely on
a few auxiliary macros. It is expected the ini file provides the complete set in Unicode and LICR, in
that order.

3095 \ifcase\bbl@engine
3096 \bbl@csarg\def{[email protected]}#1#2{%
3097 \bbl@ini@captions@aux{#1}{#2}}
3098 \else
3099 \def\bbl@inikv@captions#1#2{%
3100 \bbl@ini@captions@aux{#1}{#2}}
3101 \fi

The auxiliary macro for captions define \<caption>name.

3102 \def\bbl@ini@captions@template#1#2{% string language tempa=capt-name


3103 \bbl@replace\bbl@tempa{.template}{}%
3104 \def\bbl@toreplace{#1{}}%
3105 \bbl@replace\bbl@toreplace{[ ]}{\nobreakspace{}}%
3106 \bbl@replace\bbl@toreplace{[[}{\csname}%
3107 \bbl@replace\bbl@toreplace{[}{\csname the}%
3108 \bbl@replace\bbl@toreplace{]]}{name\endcsname{}}%
3109 \bbl@replace\bbl@toreplace{]}{\endcsname{}}%
3110 \bbl@xin@{,\bbl@tempa,}{,chapter,appendix,part,}%
3111 \ifin@
3112 \@nameuse{bbl@patch\bbl@tempa}%
3113 \global\bbl@csarg\let{\bbl@tempa fmt@#2}\bbl@toreplace
3114 \fi
3115 \bbl@xin@{,\bbl@tempa,}{,figure,table,}%
3116 \ifin@
3117 \global\bbl@csarg\let{\bbl@tempa fmt@#2}\bbl@toreplace
3118 \bbl@exp{\gdef\<fnum@\bbl@tempa>{%
3119 \\\bbl@ifunset{bbl@\bbl@tempa fmt@\\\languagename}%
3120 {\[fnum@\bbl@tempa]}%
3121 {\\\@nameuse{bbl@\bbl@tempa fmt@\\\languagename}}}}%
3122 \fi}
3123 \def\bbl@ini@captions@aux#1#2{%
3124 \bbl@trim@def\bbl@tempa{#1}%
3125 \bbl@xin@{.template}{\bbl@tempa}%
3126 \ifin@

126
3127 \bbl@ini@captions@template{#2}\languagename
3128 \else
3129 \bbl@ifblank{#2}%
3130 {\bbl@exp{%
3131 \toks@{\\\bbl@nocaption{\bbl@tempa}{\languagename\bbl@tempa name}}}}%
3132 {\bbl@trim\toks@{#2}}%
3133 \bbl@exp{%
3134 \\\bbl@add\\\bbl@savestrings{%
3135 \\\SetString\<\bbl@tempa name>{\the\toks@}}}%
3136 \toks@\expandafter{\bbl@captionslist}%
3137 \bbl@exp{\\\in@{\<\bbl@tempa name>}{\the\toks@}}%
3138 \ifin@\else
3139 \bbl@exp{%
3140 \\\bbl@add\<bbl@extracaps@\languagename>{\<\bbl@tempa name>}%
3141 \\\bbl@toglobal\<bbl@extracaps@\languagename>}%
3142 \fi
3143 \fi}

Labels. Captions must contain just strings, no format at all, so there is new group in ini files.

3144 \def\bbl@list@the{%
3145 part,chapter,section,subsection,subsubsection,paragraph,%
3146 subparagraph,enumi,enumii,enumiii,enumiv,equation,figure,%
3147 table,page,footnote,mpfootnote,mpfn}
3148 \def\bbl@map@cnt#1{% #1:roman,etc, // #2:enumi,etc
3149 \bbl@ifunset{bbl@map@#1@\languagename}%
3150 {\@nameuse{#1}}%
3151 {\@nameuse{bbl@map@#1@\languagename}}}
3152 \def\bbl@inikv@labels#1#2{%
3153 \in@{.map}{#1}%
3154 \ifin@
3155 \ifx\bbl@KVP@labels\@nnil\else
3156 \bbl@xin@{ map }{ \bbl@KVP@labels\space}%
3157 \ifin@
3158 \def\bbl@tempc{#1}%
3159 \bbl@replace\bbl@tempc{.map}{}%
3160 \in@{,#2,}{,arabic,roman,Roman,alph,Alph,fnsymbol,}%
3161 \bbl@exp{%
3162 \gdef\<bbl@map@\bbl@tempc @\languagename>%
3163 {\ifin@\<#2>\else\\\localecounter{#2}\fi}}%
3164 \bbl@foreach\bbl@list@the{%
3165 \bbl@ifunset{the##1}{}%
3166 {\bbl@exp{\let\\\bbl@tempd\<the##1>}%
3167 \bbl@exp{%
3168 \\\bbl@sreplace\<the##1>%
3169 {\<\bbl@tempc>{##1}}{\\\bbl@map@cnt{\bbl@tempc}{##1}}%
3170 \\\bbl@sreplace\<the##1>%
3171 {\<\@empty @\bbl@tempc>\<c@##1>}{\\\bbl@map@cnt{\bbl@tempc}{##1}}}%
3172 \expandafter\ifx\csname the##1\endcsname\bbl@tempd\else
3173 \toks@\expandafter\expandafter\expandafter{%
3174 \csname the##1\endcsname}%
3175 \expandafter\xdef\csname the##1\endcsname{{\the\toks@}}%
3176 \fi}}%
3177 \fi
3178 \fi
3179 %
3180 \else
3181 %
3182 % The following code is still under study. You can test it and make
3183 % suggestions. Eg, enumerate.2 = ([enumi]).([enumii]). It's
3184 % language dependent.
3185 \in@{enumerate.}{#1}%
3186 \ifin@
3187 \def\bbl@tempa{#1}%

127
3188 \bbl@replace\bbl@tempa{enumerate.}{}%
3189 \def\bbl@toreplace{#2}%
3190 \bbl@replace\bbl@toreplace{[ ]}{\nobreakspace{}}%
3191 \bbl@replace\bbl@toreplace{[}{\csname the}%
3192 \bbl@replace\bbl@toreplace{]}{\endcsname{}}%
3193 \toks@\expandafter{\bbl@toreplace}%
3194 % TODO. Execute only once:
3195 \bbl@exp{%
3196 \\\bbl@add\<extras\languagename>{%
3197 \\\babel@save\<labelenum\romannumeral\bbl@tempa>%
3198 \def\<labelenum\romannumeral\bbl@tempa>{\the\toks@}}%
3199 \\\bbl@toglobal\<extras\languagename>}%
3200 \fi
3201 \fi}
To show correctly some captions in a few languages, we need to patch some internal macros, because
the order is hardcoded. For example, in Japanese the chapter number is surrounded by two string,
while in Hungarian is placed after. These replacement works in many classes, but not all. Actually,
the following lines are somewhat tentative.
3202 \def\bbl@chaptype{chapter}
3203 \ifx\@makechapterhead\@undefined
3204 \let\bbl@patchchapter\relax
3205 \else\ifx\thechapter\@undefined
3206 \let\bbl@patchchapter\relax
3207 \else\ifx\ps@headings\@undefined
3208 \let\bbl@patchchapter\relax
3209 \else
3210 \def\bbl@patchchapter{%
3211 \global\let\bbl@patchchapter\relax
3212 \gdef\bbl@chfmt{%
3213 \bbl@ifunset{bbl@\bbl@chaptype fmt@\languagename}%
3214 {\@chapapp\space\thechapter}
3215 {\@nameuse{bbl@\bbl@chaptype fmt@\languagename}}}
3216 \bbl@add\appendix{\def\bbl@chaptype{appendix}}% Not harmful, I hope
3217 \bbl@sreplace\ps@headings{\@chapapp\ \thechapter}{\bbl@chfmt}%
3218 \bbl@sreplace\chaptermark{\@chapapp\ \thechapter}{\bbl@chfmt}%
3219 \bbl@sreplace\@makechapterhead{\@chapapp\space\thechapter}{\bbl@chfmt}%
3220 \bbl@toglobal\appendix
3221 \bbl@toglobal\ps@headings
3222 \bbl@toglobal\chaptermark
3223 \bbl@toglobal\@makechapterhead}
3224 \let\bbl@patchappendix\bbl@patchchapter
3225 \fi\fi\fi
3226 \ifx\@part\@undefined
3227 \let\bbl@patchpart\relax
3228 \else
3229 \def\bbl@patchpart{%
3230 \global\let\bbl@patchpart\relax
3231 \gdef\bbl@partformat{%
3232 \bbl@ifunset{bbl@partfmt@\languagename}%
3233 {\partname\nobreakspace\thepart}
3234 {\@nameuse{bbl@partfmt@\languagename}}}
3235 \bbl@sreplace\@part{\partname\nobreakspace\thepart}{\bbl@partformat}%
3236 \bbl@toglobal\@part}
3237 \fi

Date. Arguments (year, month, day) are not protected, on purpose. In \today, arguments are always
gregorian, and therefore always converted with other calendars. TODO. Document
3238 \let\bbl@calendar\@empty
3239 \DeclareRobustCommand\localedate[1][]{\bbl@localedate{#1}}
3240 \def\bbl@localedate#1#2#3#4{%
3241 \begingroup
3242 \edef\bbl@they{#2}%
3243 \edef\bbl@them{#3}%

128
3244 \edef\bbl@thed{#4}%
3245 \edef\bbl@tempe{%
3246 \bbl@ifunset{bbl@calpr@\languagename}{}{\bbl@cl{calpr}},%
3247 #1}%
3248 \bbl@replace\bbl@tempe{ }{}%
3249 \bbl@replace\bbl@tempe{CONVERT}{convert=}% Hackish
3250 \bbl@replace\bbl@tempe{convert}{convert=}%
3251 \let\bbl@ld@calendar\@empty
3252 \let\bbl@ld@variant\@empty
3253 \let\bbl@ld@convert\relax
3254 \def\bbl@tempb##1=##2\@@{\@namedef{bbl@ld@##1}{##2}}%
3255 \bbl@foreach\bbl@tempe{\bbl@tempb##1\@@}%
3256 \bbl@replace\bbl@ld@calendar{gregorian}{}%
3257 \ifx\bbl@ld@calendar\@empty\else
3258 \ifx\bbl@ld@convert\relax\else
3259 \babelcalendar[\bbl@they-\bbl@them-\bbl@thed]%
3260 {\bbl@ld@calendar}\bbl@they\bbl@them\bbl@thed
3261 \fi
3262 \fi
3263 \@nameuse{bbl@precalendar}% Remove, eg, +, -civil (-ca-islamic)
3264 \edef\bbl@calendar{% Used in \month..., too
3265 \bbl@ld@calendar
3266 \ifx\bbl@ld@variant\@empty\else
3267 .\bbl@ld@variant
3268 \fi}%
3269 \bbl@cased
3270 {\@nameuse{bbl@date@\languagename @\bbl@calendar}%
3271 \bbl@they\bbl@them\bbl@thed}%
3272 \endgroup}
3273 % eg: 1=months, 2=wide, 3=1, 4=dummy, 5=value, 6=calendar
3274 \def\bbl@inidate#1.#2.#3.#4\relax#5#6{% TODO - ignore with 'captions'
3275 \bbl@trim@def\bbl@tempa{#1.#2}%
3276 \bbl@ifsamestring{\bbl@tempa}{months.wide}% to savedate
3277 {\bbl@trim@def\bbl@tempa{#3}%
3278 \bbl@trim\toks@{#5}%
3279 \@temptokena\expandafter{\bbl@savedate}%
3280 \bbl@exp{% Reverse order - in ini last wins
3281 \def\\\bbl@savedate{%
3282 \\\SetString\<month\romannumeral\bbl@tempa#6name>{\the\toks@}%
3283 \the\@temptokena}}}%
3284 {\bbl@ifsamestring{\bbl@tempa}{date.long}% defined now
3285 {\lowercase{\def\bbl@tempb{#6}}%
3286 \bbl@trim@def\bbl@toreplace{#5}%
3287 \bbl@TG@@date
3288 \global\bbl@csarg\let{date@\languagename @\bbl@tempb}\bbl@toreplace
3289 \ifx\bbl@savetoday\@empty
3290 \bbl@exp{% TODO. Move to a better place.
3291 \\\AfterBabelCommands{%
3292 \def\<\languagename date>{\\\protect\<\languagename date >}%
3293 \\\newcommand\<\languagename date >[4][]{%
3294 \\\bbl@usedategrouptrue
3295 \<bbl@ensure@\languagename>{%
3296 \\\localedate[####1]{####2}{####3}{####4}}}}%
3297 \def\\\bbl@savetoday{%
3298 \\\SetString\\\today{%
3299 \<\languagename date>[convert]%
3300 {\\\the\year}{\\\the\month}{\\\the\day}}}}%
3301 \fi}%
3302 {}}}

Dates will require some macros for the basic formatting. They may be redefined by language, so
“semi-public” names (camel case) are used. Oddly enough, the CLDR places particles like “de”
inconsistently in either in the date or in the month name. Note after \bbl@replace \toks@ contains
the resulting string, which is used by \bbl@replace@finish@iii (this implicit behavior doesn’t seem

129
a good idea, but it’s efficient).

3303 \let\bbl@calendar\@empty
3304 \newcommand\babelcalendar[2][\the\year-\the\month-\the\day]{%
3305 \@nameuse{bbl@ca@#2}#1\@@}
3306 \newcommand\BabelDateSpace{\nobreakspace}
3307 \newcommand\BabelDateDot{.\@} % TODO. \let instead of repeating
3308 \newcommand\BabelDated[1]{{\number#1}}
3309 \newcommand\BabelDatedd[1]{{\ifnum#1<10 0\fi\number#1}}
3310 \newcommand\BabelDateM[1]{{\number#1}}
3311 \newcommand\BabelDateMM[1]{{\ifnum#1<10 0\fi\number#1}}
3312 \newcommand\BabelDateMMMM[1]{{%
3313 \csname month\romannumeral#1\bbl@calendar name\endcsname}}%
3314 \newcommand\BabelDatey[1]{{\number#1}}%
3315 \newcommand\BabelDateyy[1]{{%
3316 \ifnum#1<10 0\number#1 %
3317 \else\ifnum#1<100 \number#1 %
3318 \else\ifnum#1<1000 \expandafter\@gobble\number#1 %
3319 \else\ifnum#1<10000 \expandafter\@gobbletwo\number#1 %
3320 \else
3321 \bbl@error
3322 {Currently two-digit years are restricted to the\\
3323 range 0-9999.}%
3324 {There is little you can do. Sorry.}%
3325 \fi\fi\fi\fi}}
3326 \newcommand\BabelDateyyyy[1]{{\number#1}} % TODO - add leading 0
3327 \def\bbl@replace@finish@iii#1{%
3328 \bbl@exp{\def\\#1####1####2####3{\the\toks@}}}
3329 \def\bbl@TG@@date{%
3330 \bbl@replace\bbl@toreplace{[ ]}{\BabelDateSpace{}}%
3331 \bbl@replace\bbl@toreplace{[.]}{\BabelDateDot{}}%
3332 \bbl@replace\bbl@toreplace{[d]}{\BabelDated{####3}}%
3333 \bbl@replace\bbl@toreplace{[dd]}{\BabelDatedd{####3}}%
3334 \bbl@replace\bbl@toreplace{[M]}{\BabelDateM{####2}}%
3335 \bbl@replace\bbl@toreplace{[MM]}{\BabelDateMM{####2}}%
3336 \bbl@replace\bbl@toreplace{[MMMM]}{\BabelDateMMMM{####2}}%
3337 \bbl@replace\bbl@toreplace{[y]}{\BabelDatey{####1}}%
3338 \bbl@replace\bbl@toreplace{[yy]}{\BabelDateyy{####1}}%
3339 \bbl@replace\bbl@toreplace{[yyyy]}{\BabelDateyyyy{####1}}%
3340 \bbl@replace\bbl@toreplace{[y|}{\bbl@datecntr[####1|}%
3341 \bbl@replace\bbl@toreplace{[m|}{\bbl@datecntr[####2|}%
3342 \bbl@replace\bbl@toreplace{[d|}{\bbl@datecntr[####3|}%
3343 \bbl@replace@finish@iii\bbl@toreplace}
3344 \def\bbl@datecntr{\expandafter\bbl@xdatecntr\expandafter}
3345 \def\bbl@xdatecntr[#1|#2]{\localenumeral{#2}{#1}}

Transforms.

3346 \let\bbl@release@transforms\@empty
3347 \bbl@csarg\let{[email protected]}\bbl@inikv
3348 \bbl@csarg\let{[email protected]}\bbl@inikv
3349 \def\bbl@transforms@aux#1#2#3#4,#5\relax{%
3350 #1[#2]{#3}{#4}{#5}}
3351 \begingroup % A hack. TODO. Don't require an specific order
3352 \catcode`\%=12
3353 \catcode`\&=14
3354 \gdef\bbl@transforms#1#2#3{&%
3355 \directlua{
3356 local str = [==[#2]==]
3357 str = str:gsub('%.%d+%.%d+$', '')
3358 tex.print([[\def\string\babeltempa{]] .. str .. [[}]])
3359 }&%
3360 \bbl@xin@{,\babeltempa,}{,\bbl@KVP@transforms,}&%
3361 \ifin@
3362 \in@{.0$}{#2$}&%

130
3363 \ifin@
3364 \directlua{&% (\attribute) syntax
3365 local str = string.match([[\bbl@KVP@transforms]],
3366 '%(([^%(]-)%)[^%)]-\babeltempa')
3367 if str == nil then
3368 tex.print([[\def\string\babeltempb{}]])
3369 else
3370 tex.print([[\def\string\babeltempb{,attribute=]] .. str .. [[}]])
3371 end
3372 }
3373 \toks@{#3}&%
3374 \bbl@exp{&%
3375 \\\g@addto@macro\\\bbl@release@transforms{&%
3376 \relax &% Closes previous \bbl@transforms@aux
3377 \\\bbl@transforms@aux
3378 \\#1{label=\babeltempa\babeltempb}{\languagename}{\the\toks@}}}&%
3379 \else
3380 \g@addto@macro\bbl@release@transforms{, {#3}}&%
3381 \fi
3382 \fi}
3383 \endgroup

Language and Script values to be used when defining a font or setting the direction are set with the
following macros.

3384 \def\bbl@provide@lsys#1{%
3385 \bbl@ifunset{bbl@lname@#1}%
3386 {\bbl@load@info{#1}}%
3387 {}%
3388 \bbl@csarg\let{lsys@#1}\@empty
3389 \bbl@ifunset{bbl@sname@#1}{\bbl@csarg\gdef{sname@#1}{Default}}{}%
3390 \bbl@ifunset{bbl@sotf@#1}{\bbl@csarg\gdef{sotf@#1}{DFLT}}{}%
3391 \bbl@csarg\bbl@add@list{lsys@#1}{Script=\bbl@cs{sname@#1}}%
3392 \bbl@ifunset{bbl@lname@#1}{}%
3393 {\bbl@csarg\bbl@add@list{lsys@#1}{Language=\bbl@cs{lname@#1}}}%
3394 \ifcase\bbl@engine\or\or
3395 \bbl@ifunset{bbl@prehc@#1}{}%
3396 {\bbl@exp{\\\bbl@ifblank{\bbl@cs{prehc@#1}}}%
3397 {}%
3398 {\ifx\bbl@xenohyph\@undefined
3399 \global\let\bbl@xenohyph\bbl@xenohyph@d
3400 \ifx\AtBeginDocument\@notprerr
3401 \expandafter\@secondoftwo % to execute right now
3402 \fi
3403 \AtBeginDocument{%
3404 \bbl@patchfont{\bbl@xenohyph}%
3405 \expandafter\selectlanguage\expandafter{\languagename}}%
3406 \fi}}%
3407 \fi
3408 \bbl@csarg\bbl@toglobal{lsys@#1}}
3409 \def\bbl@xenohyph@d{%
3410 \bbl@ifset{bbl@prehc@\languagename}%
3411 {\ifnum\hyphenchar\font=\defaulthyphenchar
3412 \iffontchar\font\bbl@cl{prehc}\relax
3413 \hyphenchar\font\bbl@cl{prehc}\relax
3414 \else\iffontchar\font"200B
3415 \hyphenchar\font"200B
3416 \else
3417 \bbl@warning
3418 {Neither 0 nor ZERO WIDTH SPACE are available\\%
3419 in the current font, and therefore the hyphen\\%
3420 will be printed. Try changing the fontspec's\\%
3421 'HyphenChar' to another value, but be aware\\%
3422 this setting is not safe (see the manual).\\%

131
3423 Reported}%
3424 \hyphenchar\font\defaulthyphenchar
3425 \fi\fi
3426 \fi}%
3427 {\hyphenchar\font\defaulthyphenchar}}
3428 % \fi}
The following ini reader ignores everything but the identification section. It is called when a font
is defined (ie, when the language is first selected) to know which script/language must be enabled.
This means we must make sure a few characters are not active. The ini is not read directly, but with
a proxy tex file named as the language (which means any code in it must be skipped, too).
3429 \def\bbl@load@info#1{%
3430 \def\BabelBeforeIni##1##2{%
3431 \begingroup
3432 \bbl@read@ini{##1}0%
3433 \endinput % babel- .tex may contain onlypreamble's
3434 \endgroup}% boxed, to avoid extra spaces:
3435 {\bbl@input@texini{#1}}}
A tool to define the macros for native digits from the list provided in the ini file. Somewhat
convoluted because there are 10 digits, but only 9 arguments in TEX. Non-digits characters are kept.
The first macro is the generic “localized” command.
3436 \def\bbl@setdigits#1#2#3#4#5{%
3437 \bbl@exp{%
3438 \def\<\languagename digits>####1{% ie, \langdigits
3439 \<bbl@digits@\languagename>####1\\\@nil}%
3440 \let\<bbl@cntr@digits@\languagename>\<\languagename digits>%
3441 \def\<\languagename counter>####1{% ie, \langcounter
3442 \\\expandafter\<bbl@counter@\languagename>%
3443 \\\csname c@####1\endcsname}%
3444 \def\<bbl@counter@\languagename>####1{% ie, \bbl@counter@lang
3445 \\\expandafter\<bbl@digits@\languagename>%
3446 \\\number####1\\\@nil}}%
3447 \def\bbl@tempa##1##2##3##4##5{%
3448 \bbl@exp{% Wow, quite a lot of hashes! :-(
3449 \def\<bbl@digits@\languagename>########1{%
3450 \\\ifx########1\\\@nil % ie, \bbl@digits@lang
3451 \\\else
3452 \\\ifx0########1#1%
3453 \\\else\\\ifx1########1#2%
3454 \\\else\\\ifx2########1#3%
3455 \\\else\\\ifx3########1#4%
3456 \\\else\\\ifx4########1#5%
3457 \\\else\\\ifx5########1##1%
3458 \\\else\\\ifx6########1##2%
3459 \\\else\\\ifx7########1##3%
3460 \\\else\\\ifx8########1##4%
3461 \\\else\\\ifx9########1##5%
3462 \\\else########1%
3463 \\\fi\\\fi\\\fi\\\fi\\\fi\\\fi\\\fi\\\fi\\\fi\\\fi
3464 \\\expandafter\<bbl@digits@\languagename>%
3465 \\\fi}}}%
3466 \bbl@tempa}
Alphabetic counters must be converted from a space separated list to an \ifcase structure.
3467 \def\bbl@buildifcase#1 {% Returns \bbl@tempa, requires \toks@={}
3468 \ifx\\#1% % \\ before, in case #1 is multiletter
3469 \bbl@exp{%
3470 \def\\\bbl@tempa####1{%
3471 \<ifcase>####1\space\the\toks@\<else>\\\@ctrerr\<fi>}}%
3472 \else
3473 \toks@\expandafter{\the\toks@\or #1}%
3474 \expandafter\bbl@buildifcase
3475 \fi}

132
The code for additive counters is somewhat tricky and it’s based on the fact the arguments just
before \@@ collects digits which have been left ‘unused’ in previous arguments, the first of them
being the number of digits in the number to be converted. This explains the reverse set 76543210.
Digits above 10000 are not handled yet. When the key contains the subkey .F., the number after is
treated as an special case, for a fixed form (see babel-he.ini, for example).

3476 \newcommand\localenumeral[2]{\bbl@cs{cntr@#1@\languagename}{#2}}
3477 \def\bbl@localecntr#1#2{\localenumeral{#2}{#1}}
3478 \newcommand\localecounter[2]{%
3479 \expandafter\bbl@localecntr
3480 \expandafter{\number\csname c@#2\endcsname}{#1}}
3481 \def\bbl@alphnumeral#1#2{%
3482 \expandafter\bbl@alphnumeral@i\number#2 76543210\@@{#1}}
3483 \def\bbl@alphnumeral@i#1#2#3#4#5#6#7#8\@@#9{%
3484 \ifcase\@car#8\@nil\or % Currenty <10000, but prepared for bigger
3485 \bbl@alphnumeral@ii{#9}000000#1\or
3486 \bbl@alphnumeral@ii{#9}00000#1#2\or
3487 \bbl@alphnumeral@ii{#9}0000#1#2#3\or
3488 \bbl@alphnumeral@ii{#9}000#1#2#3#4\else
3489 \bbl@alphnum@invalid{>9999}%
3490 \fi}
3491 \def\bbl@alphnumeral@ii#1#2#3#4#5#6#7#8{%
3492 \bbl@ifunset{bbl@cntr@#1.F.\number#5#6#7#8@\languagename}%
3493 {\bbl@cs{cntr@#1.4@\languagename}#5%
3494 \bbl@cs{cntr@#1.3@\languagename}#6%
3495 \bbl@cs{cntr@#1.2@\languagename}#7%
3496 \bbl@cs{cntr@#1.1@\languagename}#8%
3497 \ifnum#6#7#8>\z@ % TODO. An ad hoc rule for Greek. Ugly.
3498 \bbl@ifunset{bbl@cntr@#1.S.321@\languagename}{}%
3499 {\bbl@cs{cntr@#1.S.321@\languagename}}%
3500 \fi}%
3501 {\bbl@cs{cntr@#1.F.\number#5#6#7#8@\languagename}}}
3502 \def\bbl@alphnum@invalid#1{%
3503 \bbl@error{Alphabetic numeral too large (#1)}%
3504 {Currently this is the limit.}}

The information in the identification section can be useful, so the following macro just exposes it
with a user command.

3505 \def\bbl@localeinfo#1#2{%
3506 \bbl@ifunset{bbl@info@#2}{#1}%
3507 {\bbl@ifunset{bbl@\csname bbl@info@#2\endcsname @\languagename}{#1}%
3508 {\bbl@cs{\csname bbl@info@#2\endcsname @\languagename}}}}
3509 \newcommand\localeinfo[1]{%
3510 \ifx*#1\@empty % TODO. A bit hackish to make it expandable.
3511 \bbl@afterelse\bbl@localeinfo{}%
3512 \else
3513 \bbl@localeinfo
3514 {\bbl@error{I've found no info for the current locale.\\%
3515 The corresponding ini file has not been loaded\\%
3516 Perhaps it doesn't exist}%
3517 {See the manual for details.}}%
3518 {#1}%
3519 \fi}
3520 % \@namedef{bbl@[email protected]}{lcname}
3521 \@namedef{bbl@[email protected]}{lini}
3522 \@namedef{bbl@[email protected]}{elname}
3523 \@namedef{bbl@[email protected]}{lname}
3524 \@namedef{bbl@[email protected]}{tbcp}
3525 \@namedef{bbl@[email protected]}{lbcp}
3526 \@namedef{bbl@[email protected]}{lotf}
3527 \@namedef{bbl@[email protected]}{esname}
3528 \@namedef{bbl@[email protected]}{sname}
3529 \@namedef{bbl@[email protected]}{sbcp}
3530 \@namedef{bbl@[email protected]}{sotf}

133
3531 \@namedef{bbl@[email protected]}{rbcp}
3532 \@namedef{bbl@[email protected]}{vbcp}
3533 % Extensions are dealt with in a special way
3534 % Now, an internal \LaTeX{} macro:
3535 \providecommand\BCPdata[1]{\localeinfo*{#1.tag.bcp47}}

With version 3.75 \BabelEnsureInfo is executed always, but there is an option to disable it.

3536 hh∗More package optionsii ≡


3537 \DeclareOption{ensureinfo=off}{}
3538 hh/More package optionsii
3539 %
3540 \let\bbl@ensureinfo\@gobble
3541 \newcommand\BabelEnsureInfo{%
3542 \ifx\InputIfFileExists\@undefined\else
3543 \def\bbl@ensureinfo##1{%
3544 \bbl@ifunset{bbl@lname@##1}{\bbl@load@info{##1}}{}}%
3545 \fi
3546 \bbl@foreach\bbl@loaded{{%
3547 \def\languagename{##1}%
3548 \bbl@ensureinfo{##1}}}}
3549 \@ifpackagewith{babel}{ensureinfo=off}{}%
3550 {\AtEndOfPackage{% Test for plain.
3551 \ifx\@undefined\bbl@loaded\else\BabelEnsureInfo\fi}}

More general, but non-expandable, is \getlocaleproperty. To inspect every possible loaded ini, we
define \LocaleForEach, where \bbl@ini@loaded is a comma-separated list of locales, built by
\bbl@read@ini.

3552 \newcommand\getlocaleproperty{%
3553 \@ifstar\bbl@getproperty@s\bbl@getproperty@x}
3554 \def\bbl@getproperty@s#1#2#3{%
3555 \let#1\relax
3556 \def\bbl@elt##1##2##3{%
3557 \bbl@ifsamestring{##1/##2}{#3}%
3558 {\providecommand#1{##3}%
3559 \def\bbl@elt####1####2####3{}}%
3560 {}}%
3561 \bbl@cs{inidata@#2}}%
3562 \def\bbl@getproperty@x#1#2#3{%
3563 \bbl@getproperty@s{#1}{#2}{#3}%
3564 \ifx#1\relax
3565 \bbl@error
3566 {Unknown key for locale '#2':\\%
3567 #3\\%
3568 \string#1 will be set to \relax}%
3569 {Perhaps you misspelled it.}%
3570 \fi}
3571 \let\bbl@ini@loaded\@empty
3572 \newcommand\LocaleForEach{\bbl@foreach\bbl@ini@loaded}

8 Adjusting the Babel bahavior


A generic high level inteface is provided to adjust some global and general settings.

3573 \newcommand\babeladjust[1]{% TODO. Error handling.


3574 \bbl@forkv{#1}{%
3575 \bbl@ifunset{bbl@ADJ@##1@##2}%
3576 {\bbl@cs{ADJ@##1}{##2}}%
3577 {\bbl@cs{ADJ@##1@##2}}}}
3578 %
3579 \def\bbl@adjust@lua#1#2{%
3580 \ifvmode
3581 \ifnum\currentgrouplevel=\z@
3582 \directlua{ Babel.#2 }%

134
3583 \expandafter\expandafter\expandafter\@gobble
3584 \fi
3585 \fi
3586 {\bbl@error % The error is gobbled if everything went ok.
3587 {Currently, #1 related features can be adjusted only\\%
3588 in the main vertical list.}%
3589 {Maybe things change in the future, but this is what it is.}}}
3590 \@namedef{bbl@[email protected]@on}{%
3591 \bbl@adjust@lua{bidi}{mirroring_enabled=true}}
3592 \@namedef{bbl@[email protected]@off}{%
3593 \bbl@adjust@lua{bidi}{mirroring_enabled=false}}
3594 \@namedef{bbl@[email protected]@on}{%
3595 \bbl@adjust@lua{bidi}{bidi_enabled=true}}
3596 \@namedef{bbl@[email protected]@off}{%
3597 \bbl@adjust@lua{bidi}{bidi_enabled=false}}
3598 \@namedef{bbl@[email protected]@on}{%
3599 \bbl@adjust@lua{bidi}{digits_mapped=true}}
3600 \@namedef{bbl@[email protected]@off}{%
3601 \bbl@adjust@lua{bidi}{digits_mapped=false}}
3602 %
3603 \@namedef{bbl@[email protected]@on}{%
3604 \bbl@adjust@lua{linebreak}{sea_enabled=true}}
3605 \@namedef{bbl@[email protected]@off}{%
3606 \bbl@adjust@lua{linebreak}{sea_enabled=false}}
3607 \@namedef{bbl@[email protected]@on}{%
3608 \bbl@adjust@lua{linebreak}{cjk_enabled=true}}
3609 \@namedef{bbl@[email protected]@off}{%
3610 \bbl@adjust@lua{linebreak}{cjk_enabled=false}}
3611 \@namedef{bbl@[email protected]@on}{%
3612 \bbl@adjust@lua{linebreak}{arabic.justify_enabled=true}}
3613 \@namedef{bbl@[email protected]@off}{%
3614 \bbl@adjust@lua{linebreak}{arabic.justify_enabled=false}}
3615 %
3616 \def\bbl@adjust@layout#1{%
3617 \ifvmode
3618 #1%
3619 \expandafter\@gobble
3620 \fi
3621 {\bbl@error % The error is gobbled if everything went ok.
3622 {Currently, layout related features can be adjusted only\\%
3623 in vertical mode.}%
3624 {Maybe things change in the future, but this is what it is.}}}
3625 \@namedef{bbl@[email protected]@on}{%
3626 \bbl@adjust@layout{\let\@tabular\bbl@NL@@tabular}}
3627 \@namedef{bbl@[email protected]@off}{%
3628 \bbl@adjust@layout{\let\@tabular\bbl@OL@@tabular}}
3629 \@namedef{bbl@[email protected]@on}{%
3630 \bbl@adjust@layout{\let\list\bbl@NL@list}}
3631 \@namedef{bbl@[email protected]@off}{%
3632 \bbl@adjust@layout{\let\list\bbl@OL@list}}
3633 \@namedef{bbl@[email protected]@on}{%
3634 \bbl@activateposthyphen}
3635 %
3636 \@namedef{bbl@[email protected]@on}{%
3637 \bbl@bcpallowedtrue}
3638 \@namedef{bbl@[email protected]@off}{%
3639 \bbl@bcpallowedfalse}
3640 \@namedef{bbl@[email protected]}#1{%
3641 \def\bbl@bcp@prefix{#1}}
3642 \def\bbl@bcp@prefix{bcp47-}
3643 \@namedef{bbl@[email protected]}#1{%
3644 \def\bbl@autoload@options{#1}}
3645 \let\bbl@autoload@bcpoptions\@empty

135
3646 \@namedef{bbl@[email protected]}#1{%
3647 \def\bbl@autoload@bcpoptions{#1}}
3648 \newif\ifbbl@bcptoname
3649 \@namedef{bbl@[email protected]@on}{%
3650 \bbl@bcptonametrue
3651 \BabelEnsureInfo}
3652 \@namedef{bbl@[email protected]@off}{%
3653 \bbl@bcptonamefalse}
3654 \@namedef{bbl@[email protected]@nohyphenation}{%
3655 \directlua{ Babel.ignore_pre_char = function(node)
3656 return (node.lang == \the\csname l@nohyphenation\endcsname)
3657 end }}
3658 \@namedef{bbl@[email protected]@off}{%
3659 \directlua{ Babel.ignore_pre_char = function(node)
3660 return false
3661 end }}
3662 \@namedef{bbl@[email protected]@shift}{%
3663 \let\bbl@restorelastskip\relax
3664 \def\bbl@savelastskip{%
3665 \let\bbl@restorelastskip\relax
3666 \ifvmode
3667 \ifdim\lastskip=\z@
3668 \let\bbl@restorelastskip\nobreak
3669 \else
3670 \bbl@exp{%
3671 \def\\\bbl@restorelastskip{%
3672 \skip@=\the\lastskip
3673 \\\nobreak \vskip-\skip@ \vskip\skip@}}%
3674 \fi
3675 \fi}}
3676 \@namedef{bbl@[email protected]@keep}{%
3677 \let\bbl@restorelastskip\relax
3678 \let\bbl@savelastskip\relax}
3679 \@namedef{bbl@[email protected]@omit}{%
3680 \let\bbl@restorelastskip\relax
3681 \def\bbl@savelastskip##1\bbl@restorelastskip{}}

As the final task, load the code for lua. TODO: use babel name, override

3682 \ifx\directlua\@undefined\else
3683 \ifx\bbl@luapatterns\@undefined
3684 \input luababel.def
3685 \fi
3686 \fi

Continue with LATEX.

3687 h/package | corei


3688 h∗packagei

8.1 Cross referencing macros


The LATEX book states:
The key argument is any sequence of letters, digits, and punctuation symbols; upper- and
lowercase letters are regarded as different.
When the above quote should still be true when a document is typeset in a language that has active
characters, special care has to be taken of the category codes of these characters when they appear in
an argument of the cross referencing macros.
When a cross referencing command processes its argument, all tokens in this argument should be
character tokens with category ‘letter’ or ‘other’.
The following package options control which macros are to be redefined.

3689 hh∗More package optionsii ≡


3690 \DeclareOption{safe=none}{\let\bbl@opt@safe\@empty}
3691 \DeclareOption{safe=bib}{\def\bbl@opt@safe{B}}

136
3692 \DeclareOption{safe=ref}{\def\bbl@opt@safe{R}}
3693 \DeclareOption{safe=refbib}{\def\bbl@opt@safe{BR}}
3694 \DeclareOption{safe=bibref}{\def\bbl@opt@safe{BR}}
3695 hh/More package optionsii

\@newl@bel First we open a new group to keep the changed setting of \protect local and then we set the
@safe@actives switch to true to make sure that any shorthand that appears in any of the arguments
immediately expands to its non-active self.
3696 \bbl@trace{Cross referencing macros}
3697 \ifx\bbl@opt@safe\@empty\else % ie, if 'ref' and/or 'bib'
3698 \def\@newl@bel#1#2#3{%
3699 {\@safe@activestrue
3700 \bbl@ifunset{#1@#2}%
3701 \relax
3702 {\gdef\@multiplelabels{%
3703 \@latex@warning@no@line{There were multiply-defined labels}}%
3704 \@latex@warning@no@line{Label `#2' multiply defined}}%
3705 \global\@namedef{#1@#2}{#3}}}

\@testdef An internal LATEX macro used to test if the labels that have been written on the .aux file have
changed. It is called by the \enddocument macro.
3706 \CheckCommand*\@testdef[3]{%
3707 \def\reserved@a{#3}%
3708 \expandafter\ifx\csname#1@#2\endcsname\reserved@a
3709 \else
3710 \@tempswatrue
3711 \fi}
Now that we made sure that \@testdef still has the same definition we can rewrite it. First we make
the shorthands ‘safe’. Then we use \bbl@tempa as an ‘alias’ for the macro that contains the label
which is being checked. Then we define \bbl@tempb just as \@newl@bel does it. When the label is
defined we replace the definition of \bbl@tempa by its meaning. If the label didn’t change,
\bbl@tempa and \bbl@tempb should be identical macros.
3712 \def\@testdef#1#2#3{% TODO. With @samestring?
3713 \@safe@activestrue
3714 \expandafter\let\expandafter\bbl@tempa\csname #1@#2\endcsname
3715 \def\bbl@tempb{#3}%
3716 \@safe@activesfalse
3717 \ifx\bbl@tempa\relax
3718 \else
3719 \edef\bbl@tempa{\expandafter\strip@prefix\meaning\bbl@tempa}%
3720 \fi
3721 \edef\bbl@tempb{\expandafter\strip@prefix\meaning\bbl@tempb}%
3722 \ifx\bbl@tempa\bbl@tempb
3723 \else
3724 \@tempswatrue
3725 \fi}
3726 \fi

\ref The same holds for the macro \ref that references a label and \pageref to reference a page. We
\pageref make them robust as well (if they weren’t already) to prevent problems if they should become
expanded at the wrong moment.
3727 \bbl@xin@{R}\bbl@opt@safe
3728 \ifin@
3729 \edef\bbl@tempc{\expandafter\string\csname ref code\endcsname}%
3730 \bbl@xin@{\expandafter\strip@prefix\meaning\bbl@tempc}%
3731 {\expandafter\strip@prefix\meaning\ref}%
3732 \ifin@
3733 \bbl@redefine\@kernel@ref#1{%
3734 \@safe@activestrue\org@@kernel@ref{#1}\@safe@activesfalse}
3735 \bbl@redefine\@kernel@pageref#1{%
3736 \@safe@activestrue\org@@kernel@pageref{#1}\@safe@activesfalse}
3737 \bbl@redefine\@kernel@sref#1{%

137
3738 \@safe@activestrue\org@@kernel@sref{#1}\@safe@activesfalse}
3739 \bbl@redefine\@kernel@spageref#1{%
3740 \@safe@activestrue\org@@kernel@spageref{#1}\@safe@activesfalse}
3741 \else
3742 \bbl@redefinerobust\ref#1{%
3743 \@safe@activestrue\org@ref{#1}\@safe@activesfalse}
3744 \bbl@redefinerobust\pageref#1{%
3745 \@safe@activestrue\org@pageref{#1}\@safe@activesfalse}
3746 \fi
3747 \else
3748 \let\org@ref\ref
3749 \let\org@pageref\pageref
3750 \fi

\@citex The macro used to cite from a bibliography, \cite, uses an internal macro, \@citex. It is this
internal macro that picks up the argument(s), so we redefine this internal macro and leave \cite
alone. The first argument is used for typesetting, so the shorthands need only be deactivated in the
second argument.

3751 \bbl@xin@{B}\bbl@opt@safe
3752 \ifin@
3753 \bbl@redefine\@citex[#1]#2{%
3754 \@safe@activestrue\edef\@tempa{#2}\@safe@activesfalse
3755 \org@@citex[#1]{\@tempa}}

Unfortunately, the packages natbib and cite need a different definition of \@citex... To begin with,
natbib has a definition for \@citex with three arguments... We only know that a package is loaded
when \begin{document} is executed, so we need to postpone the different redefinition.

3756 \AtBeginDocument{%
3757 \@ifpackageloaded{natbib}{%

Notice that we use \def here instead of \bbl@redefine because \org@@citex is already defined and
we don’t want to overwrite that definition (it would result in parameter stack overflow because of a
circular definition).
(Recent versions of natbib change dynamically \@citex, so PR4087 doesn’t seem fixable in a simple
way. Just load natbib before.)

3758 \def\@citex[#1][#2]#3{%
3759 \@safe@activestrue\edef\@tempa{#3}\@safe@activesfalse
3760 \org@@citex[#1][#2]{\@tempa}}%
3761 }{}}

The package cite has a definition of \@citex where the shorthands need to be turned off in both
arguments.

3762 \AtBeginDocument{%
3763 \@ifpackageloaded{cite}{%
3764 \def\@citex[#1]#2{%
3765 \@safe@activestrue\org@@citex[#1]{#2}\@safe@activesfalse}%
3766 }{}}

\nocite The macro \nocite which is used to instruct BiBTEX to extract uncited references from the database.

3767 \bbl@redefine\nocite#1{%
3768 \@safe@activestrue\org@nocite{#1}\@safe@activesfalse}

\bibcite The macro that is used in the .aux file to define citation labels. When packages such as natbib or
cite are not loaded its second argument is used to typeset the citation label. In that case, this second
argument can contain active characters but is used in an environment where \@safe@activestrue
is in effect. This switch needs to be reset inside the \hbox which contains the citation label. In order
to determine during .aux file processing which definition of \bibcite is needed we define \bibcite
in such a way that it redefines itself with the proper definition. We call \bbl@cite@choice to select
the proper definition for \bibcite. This new definition is then activated.

3769 \bbl@redefine\bibcite{%
3770 \bbl@cite@choice
3771 \bibcite}

138
\bbl@bibcite The macro \bbl@bibcite holds the definition of \bibcite needed when neither natbib nor cite is
loaded.

3772 \def\bbl@bibcite#1#2{%
3773 \org@bibcite{#1}{\@safe@activesfalse#2}}

\bbl@cite@choice The macro \bbl@cite@choice determines which definition of \bibcite is needed. First we give
\bibcite its default definition.

3774 \def\bbl@cite@choice{%
3775 \global\let\bibcite\bbl@bibcite
3776 \@ifpackageloaded{natbib}{\global\let\bibcite\org@bibcite}{}%
3777 \@ifpackageloaded{cite}{\global\let\bibcite\org@bibcite}{}%
3778 \global\let\bbl@cite@choice\relax}

When a document is run for the first time, no .aux file is available, and \bibcite will not yet be
properly defined. In this case, this has to happen before the document starts.

3779 \AtBeginDocument{\bbl@cite@choice}

\@bibitem One of the two internal LATEX macros called by \bibitem that write the citation label on the .aux file.

3780 \bbl@redefine\@bibitem#1{%
3781 \@safe@activestrue\org@@bibitem{#1}\@safe@activesfalse}
3782 \else
3783 \let\org@nocite\nocite
3784 \let\org@@citex\@citex
3785 \let\org@bibcite\bibcite
3786 \let\org@@bibitem\@bibitem
3787 \fi

8.2 Marks
\markright Because the output routine is asynchronous, we must pass the current language attribute to the head
lines. To achieve this we need to adapt the definition of \markright and \markboth somewhat.
However, headlines and footlines can contain text outside marks; for that we must take some actions
in the output routine if the ’headfoot’ options is used.
We need to make some redefinitions to the output routine to avoid an endless loop and to correctly
handle the page number in bidi documents.

3788 \bbl@trace{Marks}
3789 \IfBabelLayout{sectioning}
3790 {\ifx\bbl@opt@headfoot\@nnil
3791 \g@addto@macro\@resetactivechars{%
3792 \set@typeset@protect
3793 \expandafter\select@language@x\expandafter{\bbl@main@language}%
3794 \let\protect\noexpand
3795 \ifcase\bbl@bidimode\else % Only with bidi. See also above
3796 \edef\thepage{%
3797 \noexpand\babelsublr{\unexpanded\expandafter{\thepage}}}%
3798 \fi}%
3799 \fi}
3800 {\ifbbl@single\else
3801 \bbl@ifunset{markright }\bbl@redefine\bbl@redefinerobust
3802 \markright#1{%
3803 \bbl@ifblank{#1}%
3804 {\org@markright{}}%
3805 {\toks@{#1}%
3806 \bbl@exp{%
3807 \\\org@markright{\\\protect\\\foreignlanguage{\languagename}%
3808 {\\\protect\\\bbl@restore@actives\the\toks@}}}}}%

\markboth The definition of \markboth is equivalent to that of \markright, except that we need two token
\@mkboth registers. The documentclasses report and book define and set the headings for the page. While
doing so they also store a copy of \markboth in \@mkboth. Therefore we need to check whether
\@mkboth has already been set. If so we neeed to do that again with the new definition of \markboth.

139
(As of Oct 2019, LATEX stores the definition in an intermediate macro, so it’s not necessary anymore,
but it’s preserved for older versions.)

3809 \ifx\@mkboth\markboth
3810 \def\bbl@tempc{\let\@mkboth\markboth}
3811 \else
3812 \def\bbl@tempc{}
3813 \fi
3814 \bbl@ifunset{markboth }\bbl@redefine\bbl@redefinerobust
3815 \markboth#1#2{%
3816 \protected@edef\bbl@tempb##1{%
3817 \protect\foreignlanguage
3818 {\languagename}{\protect\bbl@restore@actives##1}}%
3819 \bbl@ifblank{#1}%
3820 {\toks@{}}%
3821 {\toks@\expandafter{\bbl@tempb{#1}}}%
3822 \bbl@ifblank{#2}%
3823 {\@temptokena{}}%
3824 {\@temptokena\expandafter{\bbl@tempb{#2}}}%
3825 \bbl@exp{\\\org@markboth{\the\toks@}{\the\@temptokena}}}
3826 \bbl@tempc
3827 \fi} % end ifbbl@single, end \IfBabelLayout

8.3 Preventing clashes with other packages


8.3.1 ifthen
\ifthenelse Sometimes a document writer wants to create a special effect depending on the page a certain
fragment of text appears on. This can be achieved by the following piece of code:

\ifthenelse{\isodd{\pageref{some:label}}}
{code for odd pages}
{code for even pages}

In order for this to work the argument of \isodd needs to be fully expandable. With the above
redefinition of \pageref it is not in the case of this example. To overcome that, we add some code to
the definition of \ifthenelse to make things work.
We want to revert the definition of \pageref and \ref to their original definition for the first
argument of \ifthenelse, so we first need to store their current meanings.
Then we can set the \@safe@actives switch and call the original \ifthenelse. In order to be able to
use shorthands in the second and third arguments of \ifthenelse the resetting of the switch and the
definition of \pageref happens inside those arguments.

3828 \bbl@trace{Preventing clashes with other packages}


3829 \ifx\org@ref\@undefined\else
3830 \bbl@xin@{R}\bbl@opt@safe
3831 \ifin@
3832 \AtBeginDocument{%
3833 \@ifpackageloaded{ifthen}{%
3834 \bbl@redefine@long\ifthenelse#1#2#3{%
3835 \let\bbl@temp@pref\pageref
3836 \let\pageref\org@pageref
3837 \let\bbl@temp@ref\ref
3838 \let\ref\org@ref
3839 \@safe@activestrue
3840 \org@ifthenelse{#1}%
3841 {\let\pageref\bbl@temp@pref
3842 \let\ref\bbl@temp@ref
3843 \@safe@activesfalse
3844 #2}%
3845 {\let\pageref\bbl@temp@pref
3846 \let\ref\bbl@temp@ref
3847 \@safe@activesfalse
3848 #3}%

140
3849 }%
3850 }{}%
3851 }
3852 \fi

8.3.2 varioref
\@@vpageref When the package varioref is in use we need to modify its internal command \@@vpageref in order
\vrefpagenum to prevent problems when an active character ends up in the argument of \vref. The same needs to
\Ref happen for \vrefpagenum.

3853 \AtBeginDocument{%
3854 \@ifpackageloaded{varioref}{%
3855 \bbl@redefine\@@vpageref#1[#2]#3{%
3856 \@safe@activestrue
3857 \org@@@vpageref{#1}[#2]{#3}%
3858 \@safe@activesfalse}%
3859 \bbl@redefine\vrefpagenum#1#2{%
3860 \@safe@activestrue
3861 \org@vrefpagenum{#1}{#2}%
3862 \@safe@activesfalse}%

The package varioref defines \Ref to be a robust command wich uppercases the first character of
the reference text. In order to be able to do that it needs to access the expandable form of \ref. So
we employ a little trick here. We redefine the (internal) command \Ref to call \org@ref instead of
\ref. The disadvantage of this solution is that whenever the definition of \Ref changes, this
definition needs to be updated as well.

3863 \expandafter\def\csname Ref \endcsname#1{%


3864 \protected@edef\@tempa{\org@ref{#1}}\expandafter\MakeUppercase\@tempa}
3865 }{}%
3866 }
3867 \fi

8.3.3 hhline
\hhline Delaying the activation of the shorthand characters has introduced a problem with the hhline
package. The reason is that it uses the ‘:’ character which is made active by the french support in
babel. Therefore we need to reload the package when the ‘:’ is an active character. Note that this
happens after the category code of the @-sign has been changed to other, so we need to temporarily
change it to letter again.

3868 \AtEndOfPackage{%
3869 \AtBeginDocument{%
3870 \@ifpackageloaded{hhline}%
3871 {\expandafter\ifx\csname normal@char\string:\endcsname\relax
3872 \else
3873 \makeatletter
3874 \def\@currname{hhline}\input{hhline.sty}\makeatother
3875 \fi}%
3876 {}}}

\substitutefontfamily Deprecated. Use the tools provides by LATEX. The command \substitutefontfamily creates an .fd
file on the fly. The first argument is an encoding mnemonic, the second and third arguments are font
family names.

3877 \def\substitutefontfamily#1#2#3{%
3878 \lowercase{\immediate\openout15=#1#2.fd\relax}%
3879 \immediate\write15{%
3880 \string\ProvidesFile{#1#2.fd}%
3881 [\the\year/\two@digits{\the\month}/\two@digits{\the\day}
3882 \space generated font description file]^^J
3883 \string\DeclareFontFamily{#1}{#2}{}^^J
3884 \string\DeclareFontShape{#1}{#2}{m}{n}{<->ssub * #3/m/n}{}^^J
3885 \string\DeclareFontShape{#1}{#2}{m}{it}{<->ssub * #3/m/it}{}^^J
3886 \string\DeclareFontShape{#1}{#2}{m}{sl}{<->ssub * #3/m/sl}{}^^J

141
3887 \string\DeclareFontShape{#1}{#2}{m}{sc}{<->ssub * #3/m/sc}{}^^J
3888 \string\DeclareFontShape{#1}{#2}{b}{n}{<->ssub * #3/bx/n}{}^^J
3889 \string\DeclareFontShape{#1}{#2}{b}{it}{<->ssub * #3/bx/it}{}^^J
3890 \string\DeclareFontShape{#1}{#2}{b}{sl}{<->ssub * #3/bx/sl}{}^^J
3891 \string\DeclareFontShape{#1}{#2}{b}{sc}{<->ssub * #3/bx/sc}{}^^J
3892 }%
3893 \closeout15
3894 }
3895 \@onlypreamble\substitutefontfamily

8.4 Encoding and fonts


Because documents may use non-ASCII font encodings, we make sure that the logos of TEX and LATEX
always come out in the right encoding. There is a list of non-ASCII encodings. Requested encodings
are currently stored in \@fontenc@load@list. If a non-ASCII has been loaded, we define versions of
\TeX and \LaTeX for them using \ensureascii. The default ASCII encoding is set, too (in reverse
order): the “main” encoding (when the document begins), the last loaded, or OT1.

\ensureascii
3896 \bbl@trace{Encoding and fonts}
3897 \newcommand\BabelNonASCII{LGR,X2,OT2,OT3,OT6,LHE,LWN,LMA,LMC,LMS,LMU}
3898 \newcommand\BabelNonText{TS1,T3,TS3}
3899 \let\org@TeX\TeX
3900 \let\org@LaTeX\LaTeX
3901 \let\ensureascii\@firstofone
3902 \AtBeginDocument{%
3903 \def\@elt#1{,#1,}%
3904 \edef\bbl@tempa{\expandafter\@gobbletwo\@fontenc@load@list}%
3905 \let\@elt\relax
3906 \let\bbl@tempb\@empty
3907 \def\bbl@tempc{OT1}%
3908 \bbl@foreach\BabelNonASCII{% LGR loaded in a non-standard way
3909 \bbl@ifunset{T@#1}{}{\def\bbl@tempb{#1}}}%
3910 \bbl@foreach\bbl@tempa{%
3911 \bbl@xin@{#1}{\BabelNonASCII}%
3912 \ifin@
3913 \def\bbl@tempb{#1}% Store last non-ascii
3914 \else\bbl@xin@{#1}{\BabelNonText}% Pass
3915 \ifin@\else
3916 \def\bbl@tempc{#1}% Store last ascii
3917 \fi
3918 \fi}%
3919 \ifx\bbl@tempb\@empty\else
3920 \bbl@xin@{,\cf@encoding,}{,\BabelNonASCII,\BabelNonText,}%
3921 \ifin@\else
3922 \edef\bbl@tempc{\cf@encoding}% The default if ascii wins
3923 \fi
3924 \edef\ensureascii#1{%
3925 {\noexpand\fontencoding{\bbl@tempc}\noexpand\selectfont#1}}%
3926 \DeclareTextCommandDefault{\TeX}{\ensureascii{\org@TeX}}%
3927 \DeclareTextCommandDefault{\LaTeX}{\ensureascii{\org@LaTeX}}%
3928 \fi}

Now comes the old deprecated stuff (with a little change in 3.9l, for fontspec). The first thing we need
to do is to determine, at \begin{document}, which latin fontencoding to use.

\latinencoding When text is being typeset in an encoding other than ‘latin’ (OT1 or T1), it would be nice to still have
Roman numerals come out in the Latin encoding. So we first assume that the current encoding at the
end of processing the package is the Latin encoding.

3929 \AtEndOfPackage{\edef\latinencoding{\cf@encoding}}

But this might be overruled with a later loading of the package fontenc. Therefore we check at the
execution of \begin{document} whether it was loaded with the T1 option. The normal way to do this

142
(using \@ifpackageloaded) is disabled for this package. Now we have to revert to parsing the
internal macro \@filelist which contains all the filenames loaded.

3930 \AtBeginDocument{%
3931 \@ifpackageloaded{fontspec}%
3932 {\xdef\latinencoding{%
3933 \ifx\UTFencname\@undefined
3934 EU\ifcase\bbl@engine\or2\or1\fi
3935 \else
3936 \UTFencname
3937 \fi}}%
3938 {\gdef\latinencoding{OT1}%
3939 \ifx\cf@encoding\bbl@t@one
3940 \xdef\latinencoding{\bbl@t@one}%
3941 \else
3942 \def\@elt#1{,#1,}%
3943 \edef\bbl@tempa{\expandafter\@gobbletwo\@fontenc@load@list}%
3944 \let\@elt\relax
3945 \bbl@xin@{,T1,}\bbl@tempa
3946 \ifin@
3947 \xdef\latinencoding{\bbl@t@one}%
3948 \fi
3949 \fi}}

\latintext Then we can define the command \latintext which is a declarative switch to a latin font-encoding.
Usage of this macro is deprecated.

3950 \DeclareRobustCommand{\latintext}{%
3951 \fontencoding{\latinencoding}\selectfont
3952 \def\encodingdefault{\latinencoding}}

\textlatin This command takes an argument which is then typeset using the requested font encoding. In order
to avoid many encoding switches it operates in a local scope.

3953 \ifx\@undefined\DeclareTextFontCommand
3954 \DeclareRobustCommand{\textlatin}[1]{\leavevmode{\latintext #1}}
3955 \else
3956 \DeclareTextFontCommand{\textlatin}{\latintext}
3957 \fi

For several functions, we need to execute some code with \selectfont. With LATEX 2021-06-01, there
is a hook for this purpose, but in older versions the LATEX command is patched (the latter solution will
be eventually removed).

3958 \def\bbl@patchfont#1{\AddToHook{selectfont}{#1}}

8.5 Basic bidi support


Work in progress. This code is currently placed here for practical reasons. It will be moved to the
correct place soon, I hope.
It is loosely based on rlbabel.def, but most of it has been developed from scratch. This babel
module (by Johannes Braams and Boris Lavva) has served the purpose of typesetting R documents
for two decades, and despite its flaws I think it is still a good starting point (some parts have been
copied here almost verbatim), partly thanks to its simplicity. I’ve also looked at arabi (by Youssef
Jabri), which is compatible with babel.
There are two ways of modifying macros to make them “bidi”, namely, by patching the internal
low-level macros (which is what I have done with lists, columns, counters, tocs, much like rlbabel
did), and by introducing a “middle layer” just below the user interface (sectioning, footnotes).

• pdftex provides a minimal support for bidi text, and it must be done by hand. Vertical typesetting
is not possible.
• xetex is somewhat better, thanks to its font engine (even if not always reliable) and a few
additional tools. However, very little is done at the paragraph level. Another challenging problem
is text direction does not honour TEX grouping.

143
• luatex can provide the most complete solution, as we can manipulate almost freely the node list,
the generated lines, and so on, but bidi text does not work out of the box and some development
is necessary. It also provides tools to properly set left-to-right and right-to-left page layouts. As
LuaTEX-ja shows, vertical typesetting is possible, too.

3959 \bbl@trace{Loading basic (internal) bidi support}


3960 \ifodd\bbl@engine
3961 \else % TODO. Move to txtbabel
3962 \ifnum\bbl@bidimode>100 \ifnum\bbl@bidimode<200
3963 \bbl@error
3964 {The bidi method 'basic' is available only in\\%
3965 luatex. I'll continue with 'bidi=default', so\\%
3966 expect wrong results}%
3967 {See the manual for further details.}%
3968 \let\bbl@beforeforeign\leavevmode
3969 \AtEndOfPackage{%
3970 \EnableBabelHook{babel-bidi}%
3971 \bbl@xebidipar}
3972 \fi\fi
3973 \def\bbl@loadxebidi#1{%
3974 \ifx\RTLfootnotetext\@undefined
3975 \AtEndOfPackage{%
3976 \EnableBabelHook{babel-bidi}%
3977 \bbl@loadfontspec % bidi needs fontspec
3978 \usepackage#1{bidi}}%
3979 \fi}
3980 \ifnum\bbl@bidimode>200
3981 \ifcase\expandafter\@gobbletwo\the\bbl@bidimode\or
3982 \bbl@tentative{bidi=bidi}
3983 \bbl@loadxebidi{}
3984 \or
3985 \bbl@loadxebidi{[rldocument]}
3986 \or
3987 \bbl@loadxebidi{}
3988 \fi
3989 \fi
3990 \fi
3991 % TODO? Separate:
3992 \ifnum\bbl@bidimode=\@ne
3993 \let\bbl@beforeforeign\leavevmode
3994 \ifodd\bbl@engine
3995 \newattribute\bbl@attr@dir
3996 \directlua{ Babel.attr_dir = luatexbase.registernumber'bbl@attr@dir' }
3997 \bbl@exp{\output{\bodydir\pagedir\the\output}}
3998 \fi
3999 \AtEndOfPackage{%
4000 \EnableBabelHook{babel-bidi}%
4001 \ifodd\bbl@engine\else
4002 \bbl@xebidipar
4003 \fi}
4004 \fi

Now come the macros used to set the direction when a language is switched. First the (mostly)
common macros.

4005 \bbl@trace{Macros to switch the text direction}


4006 \def\bbl@alscripts{,Arabic,Syriac,Thaana,}
4007 \def\bbl@rscripts{% TODO. Base on codes ??
4008 ,Imperial Aramaic,Avestan,Cypriot,Hatran,Hebrew,%
4009 Old Hungarian,Old Hungarian,Lydian,Mandaean,Manichaean,%
4010 Manichaean,Meroitic Cursive,Meroitic,Old North Arabian,%
4011 Nabataean,N'Ko,Orkhon,Palmyrene,Inscriptional Pahlavi,%
4012 Psalter Pahlavi,Phoenician,Inscriptional Parthian,Samaritan,%
4013 Old South Arabian,}%
4014 \def\bbl@provide@dirs#1{%

144
4015 \bbl@xin@{\csname bbl@sname@#1\endcsname}{\bbl@alscripts\bbl@rscripts}%
4016 \ifin@
4017 \global\bbl@csarg\chardef{wdir@#1}\@ne
4018 \bbl@xin@{\csname bbl@sname@#1\endcsname}{\bbl@alscripts}%
4019 \ifin@
4020 \global\bbl@csarg\chardef{wdir@#1}\tw@ % useless in xetex
4021 \fi
4022 \else
4023 \global\bbl@csarg\chardef{wdir@#1}\z@
4024 \fi
4025 \ifodd\bbl@engine
4026 \bbl@csarg\ifcase{wdir@#1}%
4027 \directlua{ Babel.locale_props[\the\localeid].textdir = 'l' }%
4028 \or
4029 \directlua{ Babel.locale_props[\the\localeid].textdir = 'r' }%
4030 \or
4031 \directlua{ Babel.locale_props[\the\localeid].textdir = 'al' }%
4032 \fi
4033 \fi}
4034 \def\bbl@switchdir{%
4035 \bbl@ifunset{bbl@lsys@\languagename}{\bbl@provide@lsys{\languagename}}{}%
4036 \bbl@ifunset{bbl@wdir@\languagename}{\bbl@provide@dirs{\languagename}}{}%
4037 \bbl@exp{\\\bbl@setdirs\bbl@cl{wdir}}}
4038 \def\bbl@setdirs#1{% TODO - math
4039 \ifcase\bbl@select@type % TODO - strictly, not the right test
4040 \bbl@bodydir{#1}%
4041 \bbl@pardir{#1}%
4042 \fi
4043 \bbl@textdir{#1}}
4044 % TODO. Only if \bbl@bidimode > 0?:
4045 \AddBabelHook{babel-bidi}{afterextras}{\bbl@switchdir}
4046 \DisableBabelHook{babel-bidi}

Now the engine-dependent macros. TODO. Must be moved to the engine files.

4047 \ifodd\bbl@engine % luatex=1


4048 \else % pdftex=0, xetex=2
4049 \newcount\bbl@dirlevel
4050 \chardef\bbl@thetextdir\z@
4051 \chardef\bbl@thepardir\z@
4052 \def\bbl@textdir#1{%
4053 \ifcase#1\relax
4054 \chardef\bbl@thetextdir\z@
4055 \bbl@textdir@i\beginL\endL
4056 \else
4057 \chardef\bbl@thetextdir\@ne
4058 \bbl@textdir@i\beginR\endR
4059 \fi}
4060 \def\bbl@textdir@i#1#2{%
4061 \ifhmode
4062 \ifnum\currentgrouplevel>\z@
4063 \ifnum\currentgrouplevel=\bbl@dirlevel
4064 \bbl@error{Multiple bidi settings inside a group}%
4065 {I'll insert a new group, but expect wrong results.}%
4066 \bgroup\aftergroup#2\aftergroup\egroup
4067 \else
4068 \ifcase\currentgrouptype\or % 0 bottom
4069 \aftergroup#2% 1 simple {}
4070 \or
4071 \bgroup\aftergroup#2\aftergroup\egroup % 2 hbox
4072 \or
4073 \bgroup\aftergroup#2\aftergroup\egroup % 3 adj hbox
4074 \or\or\or % vbox vtop align
4075 \or

145
4076 \bgroup\aftergroup#2\aftergroup\egroup % 7 noalign
4077 \or\or\or\or\or\or % output math disc insert vcent mathchoice
4078 \or
4079 \aftergroup#2% 14 \begingroup
4080 \else
4081 \bgroup\aftergroup#2\aftergroup\egroup % 15 adj
4082 \fi
4083 \fi
4084 \bbl@dirlevel\currentgrouplevel
4085 \fi
4086 #1%
4087 \fi}
4088 \def\bbl@pardir#1{\chardef\bbl@thepardir#1\relax}
4089 \let\bbl@bodydir\@gobble
4090 \let\bbl@pagedir\@gobble
4091 \def\bbl@dirparastext{\chardef\bbl@thepardir\bbl@thetextdir}

The following command is executed only if there is a right-to-left script (once). It activates the
\everypar hack for xetex, to properly handle the par direction. Note text and par dirs are decoupled
to some extent (although not completely).

4092 \def\bbl@xebidipar{%
4093 \let\bbl@xebidipar\relax
4094 \TeXXeTstate\@ne
4095 \def\bbl@xeeverypar{%
4096 \ifcase\bbl@thepardir
4097 \ifcase\bbl@thetextdir\else\beginR\fi
4098 \else
4099 {\setbox\z@\lastbox\beginR\box\z@}%
4100 \fi}%
4101 \let\bbl@severypar\everypar
4102 \newtoks\everypar
4103 \everypar=\bbl@severypar
4104 \bbl@severypar{\bbl@xeeverypar\the\everypar}}
4105 \ifnum\bbl@bidimode>200
4106 \let\bbl@textdir@i\@gobbletwo
4107 \let\bbl@xebidipar\@empty
4108 \AddBabelHook{bidi}{foreign}{%
4109 \def\bbl@tempa{\def\BabelText####1}%
4110 \ifcase\bbl@thetextdir
4111 \expandafter\bbl@tempa\expandafter{\BabelText{\LR{##1}}}%
4112 \else
4113 \expandafter\bbl@tempa\expandafter{\BabelText{\RL{##1}}}%
4114 \fi}
4115 \def\bbl@pardir#1{\ifcase#1\relax\setLR\else\setRL\fi}
4116 \fi
4117 \fi

A tool for weak L (mainly digits). We also disable warnings with hyperref.

4118 \DeclareRobustCommand\babelsublr[1]{\leavevmode{\bbl@textdir\z@#1}}
4119 \AtBeginDocument{%
4120 \ifx\pdfstringdefDisableCommands\@undefined\else
4121 \ifx\pdfstringdefDisableCommands\relax\else
4122 \pdfstringdefDisableCommands{\let\babelsublr\@firstofone}%
4123 \fi
4124 \fi}

8.6 Local Language Configuration


\loadlocalcfg At some sites it may be necessary to add site-specific actions to a language definition file. This can be
done by creating a file with the same name as the language definition file, but with the extension
.cfg. For instance the file norsk.cfg will be loaded when the language definition file norsk.ldf is
loaded.
For plain-based formats we don’t want to override the definition of \loadlocalcfg from plain.def.

146
4125 \bbl@trace{Local Language Configuration}
4126 \ifx\loadlocalcfg\@undefined
4127 \@ifpackagewith{babel}{noconfigs}%
4128 {\let\loadlocalcfg\@gobble}%
4129 {\def\loadlocalcfg#1{%
4130 \InputIfFileExists{#1.cfg}%
4131 {\typeout{*************************************^^J%
4132 * Local config file #1.cfg used^^J%
4133 *}}%
4134 \@empty}}
4135 \fi

8.7 Language options


Languages are loaded when processing the corresponding option except if a main language has been
set. In such a case, it is not loaded until all options has been processed. The following macro inputs
the ldf file and does some additional checks (\input works, too, but possible errors are not catched).

4136 \bbl@trace{Language options}


4137 \let\bbl@afterlang\relax
4138 \let\BabelModifiers\relax
4139 \let\bbl@loaded\@empty
4140 \def\bbl@load@language#1{%
4141 \InputIfFileExists{#1.ldf}%
4142 {\edef\bbl@loaded{\CurrentOption
4143 \ifx\bbl@loaded\@empty\else,\bbl@loaded\fi}%
4144 \expandafter\let\expandafter\bbl@afterlang
4145 \csname\CurrentOption.ldf-h@@k\endcsname
4146 \expandafter\let\expandafter\BabelModifiers
4147 \csname bbl@mod@\CurrentOption\endcsname}%
4148 {\bbl@error{%
4149 Unknown option '\CurrentOption'. Either you misspelled it\\%
4150 or the language definition file \CurrentOption.ldf was not found}{%
4151 Valid options are, among others: shorthands=, KeepShorthandsActive,\\%
4152 activeacute, activegrave, noconfigs, safe=, main=, math=\\%
4153 headfoot=, strings=, config=, hyphenmap=, or a language name.}}}

Now, we set a few language options whose names are different from ldf files. These declarations are
preserved for backwards compatibility, but they must be eventually removed. Use proxy files
instead.

4154 \def\bbl@try@load@lang#1#2#3{%
4155 \IfFileExists{\CurrentOption.ldf}%
4156 {\bbl@load@language{\CurrentOption}}%
4157 {#1\bbl@load@language{#2}#3}}
4158 %
4159 \DeclareOption{hebrew}{%
4160 \input{rlbabel.def}%
4161 \bbl@load@language{hebrew}}
4162 \DeclareOption{hungarian}{\bbl@try@load@lang{}{magyar}{}}
4163 \DeclareOption{lowersorbian}{\bbl@try@load@lang{}{lsorbian}{}}
4164 \DeclareOption{nynorsk}{\bbl@try@load@lang{}{norsk}{}}
4165 \DeclareOption{polutonikogreek}{%
4166 \bbl@try@load@lang{}{greek}{\languageattribute{greek}{polutoniko}}}
4167 \DeclareOption{russian}{\bbl@try@load@lang{}{russianb}{}}
4168 \DeclareOption{ukrainian}{\bbl@try@load@lang{}{ukraineb}{}}
4169 \DeclareOption{uppersorbian}{\bbl@try@load@lang{}{usorbian}{}}

Another way to extend the list of ‘known’ options for babel was to create the file bblopts.cfg in
which one can add option declarations. However, this mechanism is deprecated – if you want an
alternative name for a language, just create a new .ldf file loading the actual one. You can also set
the name of the file with the package option config=<name>, which will load <name>.cfg instead.

4170 \ifx\bbl@opt@config\@nnil
4171 \@ifpackagewith{babel}{noconfigs}{}%
4172 {\InputIfFileExists{bblopts.cfg}%

147
4173 {\typeout{*************************************^^J%
4174 * Local config file bblopts.cfg used^^J%
4175 *}}%
4176 {}}%
4177 \else
4178 \InputIfFileExists{\bbl@[email protected]}%
4179 {\typeout{*************************************^^J%
4180 * Local config file \bbl@[email protected] used^^J%
4181 *}}%
4182 {\bbl@error{%
4183 Local config file '\bbl@[email protected]' not found}{%
4184 Perhaps you misspelled it.}}%
4185 \fi

Recognizing global options in packages not having a closed set of them is not trivial, as for them to be
processed they must be defined explicitly. So, package options not yet taken into account and stored
in bbl@language@opts are assumed to be languages. If not declared above, the names of the option
and the file are the same. We first pre-process the class and package options to determine the main
language, which is processed in the third ‘main’ pass, except if all files are ldf and there is no main
key. In the latter case (\bbl@opt@main is still \@nnil), the traditional way to set the main language is
kept — the last loaded is the main language.
4186 \ifx\bbl@opt@main\@nnil
4187 \ifnum\bbl@iniflag>\z@ % if all ldf's: set implicitly, no main pass
4188 \let\bbl@tempb\@empty
4189 \edef\bbl@tempa{\@classoptionslist,\bbl@language@opts}%
4190 \bbl@foreach\bbl@tempa{\edef\bbl@tempb{#1,\bbl@tempb}}%
4191 \bbl@foreach\bbl@tempb{% \bbl@tempb is a reversed list
4192 \ifx\bbl@opt@main\@nnil % ie, if not yet assigned
4193 \ifodd\bbl@iniflag % = *=
4194 \IfFileExists{babel-#1.tex}{\def\bbl@opt@main{#1}}{}%
4195 \else % n +=
4196 \IfFileExists{#1.ldf}{\def\bbl@opt@main{#1}}{}%
4197 \fi
4198 \fi}%
4199 \fi
4200 \else
4201 \bbl@info{Main language set with 'main='. Except if you have\\%
4202 problems, prefer the default mechanism for setting\\%
4203 the main language. Reported}
4204 \fi

A few languages are still defined explicitly. They are stored in case they are needed in the ‘main’ pass
(the value can be \relax).
4205 \ifx\bbl@opt@main\@nnil\else
4206 \bbl@ncarg\let\bbl@loadmain{ds@\bbl@opt@main}%
4207 \expandafter\let\csname ds@\bbl@opt@main\endcsname\relax
4208 \fi

Now define the corresponding loaders. With package options, assume the language exists. With class
options, check if the option is a language by checking if the correspondin file exists.
4209 \bbl@foreach\bbl@language@opts{%
4210 \def\bbl@tempa{#1}%
4211 \ifx\bbl@tempa\bbl@opt@main\else
4212 \ifnum\bbl@iniflag<\tw@ % 0 ø (other = ldf)
4213 \bbl@ifunset{ds@#1}%
4214 {\DeclareOption{#1}{\bbl@load@language{#1}}}%
4215 {}%
4216 \else % + * (other = ini)
4217 \DeclareOption{#1}{%
4218 \bbl@ldfinit
4219 \babelprovide[import]{#1}%
4220 \bbl@afterldf{}}%
4221 \fi
4222 \fi}

148
4223 \bbl@foreach\@classoptionslist{%
4224 \def\bbl@tempa{#1}%
4225 \ifx\bbl@tempa\bbl@opt@main\else
4226 \ifnum\bbl@iniflag<\tw@ % 0 ø (other = ldf)
4227 \bbl@ifunset{ds@#1}%
4228 {\IfFileExists{#1.ldf}%
4229 {\DeclareOption{#1}{\bbl@load@language{#1}}}%
4230 {}}%
4231 {}%
4232 \else % + * (other = ini)
4233 \IfFileExists{babel-#1.tex}%
4234 {\DeclareOption{#1}{%
4235 \bbl@ldfinit
4236 \babelprovide[import]{#1}%
4237 \bbl@afterldf{}}}%
4238 {}%
4239 \fi
4240 \fi}
And we are done, because all options for this pass has been declared. Those already processed in the
first pass are just ignored.
The options have to be processed in the order in which the user specified them (but remember class
options are processes before):
4241 \def\AfterBabelLanguage#1{%
4242 \bbl@ifsamestring\CurrentOption{#1}{\global\bbl@add\bbl@afterlang}{}}
4243 \DeclareOption*{}
4244 \ProcessOptions*

This finished the second pass. Now the third one begins, which loads the main language set with the
key main. A warning is raised if the main language is not the same as the last named one, or if the
value of the key main is not a language. With some options in provide, the package luatexbase is
loaded (and immediately used), and therefore \babelprovide can’t go inside a \DeclareOption; this
explains why it’s executed directly, with a dummy declaration. Then all languages have been loaded,
so we deactivate \AfterBabelLanguage.
4245 \bbl@trace{Option 'main'}
4246 \ifx\bbl@opt@main\@nnil
4247 \edef\bbl@tempa{\@classoptionslist,\bbl@language@opts}
4248 \let\bbl@tempc\@empty
4249 \edef\bbl@templ{,\bbl@loaded,}
4250 \edef\bbl@templ{\expandafter\strip@prefix\meaning\bbl@templ}
4251 \bbl@for\bbl@tempb\bbl@tempa{%
4252 \edef\bbl@tempd{,\bbl@tempb,}%
4253 \edef\bbl@tempd{\expandafter\strip@prefix\meaning\bbl@tempd}%
4254 \bbl@xin@{\bbl@tempd}{\bbl@templ}%
4255 \ifin@\edef\bbl@tempc{\bbl@tempb}\fi}
4256 \def\bbl@tempa#1,#2\@nnil{\def\bbl@tempb{#1}}
4257 \expandafter\bbl@tempa\bbl@loaded,\@nnil
4258 \ifx\bbl@tempb\bbl@tempc\else
4259 \bbl@warning{%
4260 Last declared language option is '\bbl@tempc',\\%
4261 but the last processed one was '\bbl@tempb'.\\%
4262 The main language can't be set as both a global\\%
4263 and a package option. Use 'main=\bbl@tempc' as\\%
4264 option. Reported}
4265 \fi
4266 \else
4267 \ifodd\bbl@iniflag % case 1,3 (main is ini)
4268 \bbl@ldfinit
4269 \let\CurrentOption\bbl@opt@main
4270 \bbl@exp{% \bbl@opt@provide = empty if *
4271 \\\babelprovide[\bbl@opt@provide,import,main]{\bbl@opt@main}}%
4272 \bbl@afterldf{}
4273 \DeclareOption{\bbl@opt@main}{}
4274 \else % case 0,2 (main is ldf)

149
4275 \ifx\bbl@loadmain\relax
4276 \DeclareOption{\bbl@opt@main}{\bbl@load@language{\bbl@opt@main}}
4277 \else
4278 \DeclareOption{\bbl@opt@main}{\bbl@loadmain}
4279 \fi
4280 \ExecuteOptions{\bbl@opt@main}
4281 \@namedef{ds@\bbl@opt@main}{}%
4282 \fi
4283 \DeclareOption*{}
4284 \ProcessOptions*
4285 \fi
4286 \def\AfterBabelLanguage{%
4287 \bbl@error
4288 {Too late for \string\AfterBabelLanguage}%
4289 {Languages have been loaded, so I can do nothing}}

In order to catch the case where the user didn’t specify a language we check whether
\bbl@main@language, has become defined. If not, the nil language is loaded.

4290 \ifx\bbl@main@language\@undefined
4291 \bbl@info{%
4292 You haven't specified a language. I'll use 'nil'\\%
4293 as the main language. Reported}
4294 \bbl@load@language{nil}
4295 \fi
4296 h/packagei

9 The kernel of Babel (babel.def, common)


The kernel of the babel system is currently stored in babel.def. The file babel.def contains most of
the code. The file hyphen.cfg is a file that can be loaded into the format, which is necessary when
you want to be able to switch hyphenation patterns.
Because plain TEX users might want to use some of the features of the babel system too, care has to be
taken that plain TEX can process the files. For this reason the current format will have to be checked
in a number of places. Some of the code below is common to plain TEX and LATEX, some of it is for the
LATEX case only.
Plain formats based on etex (etex, xetex, luatex) don’t load hyphen.cfg but etex.src, which follows
a different naming convention, so we need to define the babel names. It presumes language.def
exists and it is the same file used when formats were created.
A proxy file for switch.def

4297 h∗kerneli
4298 \let\bbl@onlyswitch\@empty
4299 \input babel.def
4300 \let\bbl@onlyswitch\@undefined
4301 h/kerneli
4302 h∗patternsi

10 Loading hyphenation patterns


The following code is meant to be read by iniTEX because it should instruct TEX to read hyphenation
patterns. To this end the docstrip option patterns is used to include this code in the file
hyphen.cfg. Code is written with lower level macros.

4303 hhMake sure ProvidesFile is definedii


4304 \ProvidesFile{hyphen.cfg}[ hhdateii hhversionii Babel hyphens]
4305 \xdef\bbl@format{\jobname}
4306 \def\bbl@version{ hhversionii}
4307 \def\bbl@date{ hhdateii}
4308 \ifx\AtBeginDocument\@undefined
4309 \def\@empty{}
4310 \fi
4311 hhDefine core switching macrosii

150
\process@line Each line in the file language.dat is processed by \process@line after it is read. The first thing this
macro does is to check whether the line starts with =. When the first token of a line is an =, the macro
\process@synonym is called; otherwise the macro \process@language will continue.

4312 \def\process@line#1#2 #3 #4 {%
4313 \ifx=#1%
4314 \process@synonym{#2}%
4315 \else
4316 \process@language{#1#2}{#3}{#4}%
4317 \fi
4318 \ignorespaces}

\process@synonym This macro takes care of the lines which start with an =. It needs an empty token register to begin
with. \bbl@languages is also set to empty.

4319 \toks@{}
4320 \def\bbl@languages{}

When no languages have been loaded yet, the name following the = will be a synonym for
hyphenation register 0. So, it is stored in a token register and executed when the first pattern file has
been processed. (The \relax just helps to the \if below catching synonyms without a language.)
Otherwise the name will be a synonym for the language loaded last.
We also need to copy the hyphenmin parameters for the synonym.

4321 \def\process@synonym#1{%
4322 \ifnum\last@language=\m@ne
4323 \toks@\expandafter{\the\toks@\relax\process@synonym{#1}}%
4324 \else
4325 \expandafter\chardef\csname l@#1\endcsname\last@language
4326 \wlog{\string\l@#1=\string\language\the\last@language}%
4327 \expandafter\let\csname #1hyphenmins\expandafter\endcsname
4328 \csname\languagename hyphenmins\endcsname
4329 \let\bbl@elt\relax
4330 \edef\bbl@languages{\bbl@languages\bbl@elt{#1}{\the\last@language}{}{}}%
4331 \fi}

\process@language The macro \process@language is used to process a non-empty line from the ‘configuration file’. It
has three arguments, each delimited by white space. The first argument is the ‘name’ of a language;
the second is the name of the file that contains the patterns. The optional third argument is the name
of a file containing hyphenation exceptions.
The first thing to do is call \addlanguage to allocate a pattern register and to make that register
‘active’. Then the pattern file is read.
For some hyphenation patterns it is needed to load them with a specific font encoding selected. This
can be specified in the file language.dat by adding for instance ‘:T1’ to the name of the language.
The macro \bbl@get@enc extracts the font encoding from the language name and stores it in
\bbl@hyph@enc. The latter can be used in hyphenation files if you need to set a behavior depending
on the given encoding (it is set to empty if no encoding is given).
Pattern files may contain assignments to \lefthyphenmin and \righthyphenmin. TEX does not keep
track of these assignments. Therefore we try to detect such assignments and store them in the
\hlang ihyphenmins macro. When no assignments were made we provide a default setting.
Some pattern files contain changes to the \lccode en \uccode arrays. Such changes should remain
local to the language; therefore we process the pattern file in a group; the \patterns command acts
globally so its effect will be remembered.
Then we globally store the settings of \lefthyphenmin and \righthyphenmin and close the group.
When the hyphenation patterns have been processed we need to see if a file with hyphenation
exceptions needs to be read. This is the case when the third argument is not empty and when it does
not contain a space token. (Note however there is no need to save hyphenation exceptions into the
format.)
\bbl@languages saves a snapshot of the loaded languages in the form
\bbl@elt{hlanguage-namei}{hnumberi} {hpatterns-filei}{hexceptions-filei}. Note the last 2
arguments are empty in ‘dialects’ defined in language.dat with =. Note also the language name can
have encoding info.
Finally, if the counter \language is equal to zero we execute the synonyms stored.

4332 \def\process@language#1#2#3{%
4333 \expandafter\addlanguage\csname l@#1\endcsname

151
4334 \expandafter\language\csname l@#1\endcsname
4335 \edef\languagename{#1}%
4336 \bbl@hook@everylanguage{#1}%
4337 % > luatex
4338 \bbl@get@enc#1::\@@@
4339 \begingroup
4340 \lefthyphenmin\m@ne
4341 \bbl@hook@loadpatterns{#2}%
4342 % > luatex
4343 \ifnum\lefthyphenmin=\m@ne
4344 \else
4345 \expandafter\xdef\csname #1hyphenmins\endcsname{%
4346 \the\lefthyphenmin\the\righthyphenmin}%
4347 \fi
4348 \endgroup
4349 \def\bbl@tempa{#3}%
4350 \ifx\bbl@tempa\@empty\else
4351 \bbl@hook@loadexceptions{#3}%
4352 % > luatex
4353 \fi
4354 \let\bbl@elt\relax
4355 \edef\bbl@languages{%
4356 \bbl@languages\bbl@elt{#1}{\the\language}{#2}{\bbl@tempa}}%
4357 \ifnum\the\language=\z@
4358 \expandafter\ifx\csname #1hyphenmins\endcsname\relax
4359 \set@hyphenmins\tw@\thr@@\relax
4360 \else
4361 \expandafter\expandafter\expandafter\set@hyphenmins
4362 \csname #1hyphenmins\endcsname
4363 \fi
4364 \the\toks@
4365 \toks@{}%
4366 \fi}

\bbl@get@enc The macro \bbl@get@enc extracts the font encoding from the language name and stores it in
\bbl@hyph@enc \bbl@hyph@enc. It uses delimited arguments to achieve this.

4367 \def\bbl@get@enc#1:#2:#3\@@@{\def\bbl@hyph@enc{#2}}

Now, hooks are defined. For efficiency reasons, they are dealt here in a special way. Besides luatex,
format-specific configuration files are taken into account. loadkernel currently loads nothing, but
define some basic macros instead.

4368 \def\bbl@hook@everylanguage#1{}
4369 \def\bbl@hook@loadpatterns#1{\input #1\relax}
4370 \let\bbl@hook@loadexceptions\bbl@hook@loadpatterns
4371 \def\bbl@hook@loadkernel#1{%
4372 \def\addlanguage{\csname newlanguage\endcsname}%
4373 \def\adddialect##1##2{%
4374 \global\chardef##1##2\relax
4375 \wlog{\string##1 = a dialect from \string\language##2}}%
4376 \def\iflanguage##1{%
4377 \expandafter\ifx\csname l@##1\endcsname\relax
4378 \@nolanerr{##1}%
4379 \else
4380 \ifnum\csname l@##1\endcsname=\language
4381 \expandafter\expandafter\expandafter\@firstoftwo
4382 \else
4383 \expandafter\expandafter\expandafter\@secondoftwo
4384 \fi
4385 \fi}%
4386 \def\providehyphenmins##1##2{%
4387 \expandafter\ifx\csname ##1hyphenmins\endcsname\relax
4388 \@namedef{##1hyphenmins}{##2}%
4389 \fi}%

152
4390 \def\set@hyphenmins##1##2{%
4391 \lefthyphenmin##1\relax
4392 \righthyphenmin##2\relax}%
4393 \def\selectlanguage{%
4394 \errhelp{Selecting a language requires a package supporting it}%
4395 \errmessage{Not loaded}}%
4396 \let\foreignlanguage\selectlanguage
4397 \let\otherlanguage\selectlanguage
4398 \expandafter\let\csname otherlanguage*\endcsname\selectlanguage
4399 \def\bbl@usehooks##1##2{}% TODO. Temporary!!
4400 \def\setlocale{%
4401 \errhelp{Find an armchair, sit down and wait}%
4402 \errmessage{Not yet available}}%
4403 \let\uselocale\setlocale
4404 \let\locale\setlocale
4405 \let\selectlocale\setlocale
4406 \let\localename\setlocale
4407 \let\textlocale\setlocale
4408 \let\textlanguage\setlocale
4409 \let\languagetext\setlocale}
4410 \begingroup
4411 \def\AddBabelHook#1#2{%
4412 \expandafter\ifx\csname bbl@hook@#2\endcsname\relax
4413 \def\next{\toks1}%
4414 \else
4415 \def\next{\expandafter\gdef\csname bbl@hook@#2\endcsname####1}%
4416 \fi
4417 \next}
4418 \ifx\directlua\@undefined
4419 \ifx\XeTeXinputencoding\@undefined\else
4420 \input xebabel.def
4421 \fi
4422 \else
4423 \input luababel.def
4424 \fi
4425 \openin1 = babel-\[email protected]
4426 \ifeof1
4427 \else
4428 \input babel-\[email protected]\relax
4429 \fi
4430 \closein1
4431 \endgroup
4432 \bbl@hook@loadkernel{switch.def}

\readconfigfile The configuration file can now be opened for reading.

4433 \openin1 = language.dat

See if the file exists, if not, use the default hyphenation file hyphen.tex. The user will be informed
about this.

4434 \def\languagename{english}%
4435 \ifeof1
4436 \message{I couldn't find the file language.dat,\space
4437 I will try the file hyphen.tex}
4438 \input hyphen.tex\relax
4439 \chardef\l@english\z@
4440 \else

Pattern registers are allocated using count register \last@language. Its initial value is 0. The
definition of the macro \newlanguage is such that it first increments the count register and then
defines the language. In order to have the first patterns loaded in pattern register number 0 we
initialize \last@language with the value −1.

4441 \last@language\m@ne

153
We now read lines from the file until the end is found. While reading from the input, it is useful to
switch off recognition of the end-of-line character. This saves us stripping off spaces from the
contents of the control sequence.

4442 \loop
4443 \endlinechar\m@ne
4444 \read1 to \bbl@line
4445 \endlinechar`\^^M

If the file has reached its end, exit from the loop here. If not, empty lines are skipped. Add 3 space
characters to the end of \bbl@line. This is needed to be able to recognize the arguments of
\process@line later on. The default language should be the very first one.

4446 \if T\ifeof1F\fi T\relax


4447 \ifx\bbl@line\@empty\else
4448 \edef\bbl@line{\bbl@line\space\space\space}%
4449 \expandafter\process@line\bbl@line\relax
4450 \fi
4451 \repeat

Check for the end of the file. We must reverse the test for \ifeof without \else. Then reactivate the
default patterns, and close the configuration file.

4452 \begingroup
4453 \def\bbl@elt#1#2#3#4{%
4454 \global\language=#2\relax
4455 \gdef\languagename{#1}%
4456 \def\bbl@elt##1##2##3##4{}}%
4457 \bbl@languages
4458 \endgroup
4459 \fi
4460 \closein1

We add a message about the fact that babel is loaded in the format and with which language patterns
to the \everyjob register.

4461 \if/\the\toks@/\else
4462 \errhelp{language.dat loads no language, only synonyms}
4463 \errmessage{Orphan language synonym}
4464 \fi

Also remove some macros from memory and raise an error if \toks@ is not empty. Finally load
switch.def, but the latter is not required and the line inputting it may be commented out.

4465 \let\bbl@line\@undefined
4466 \let\process@line\@undefined
4467 \let\process@synonym\@undefined
4468 \let\process@language\@undefined
4469 \let\bbl@get@enc\@undefined
4470 \let\bbl@hyph@enc\@undefined
4471 \let\bbl@tempa\@undefined
4472 \let\bbl@hook@loadkernel\@undefined
4473 \let\bbl@hook@everylanguage\@undefined
4474 \let\bbl@hook@loadpatterns\@undefined
4475 \let\bbl@hook@loadexceptions\@undefined
4476 h/patternsi

Here the code for iniTEX ends.

11 Font handling with fontspec


Add the bidi handler just before luaoftload, which is loaded by default by LaTeX. Just in case, consider
the possibility it has not been loaded. First, a couple of definitions related to bidi [misplaced].

4477 hh∗More package optionsii ≡


4478 \chardef\bbl@bidimode\z@
4479 \DeclareOption{bidi=default}{\chardef\bbl@bidimode=\@ne}
4480 \DeclareOption{bidi=basic}{\chardef\bbl@bidimode=101 }

154
4481 \DeclareOption{bidi=basic-r}{\chardef\bbl@bidimode=102 }
4482 \DeclareOption{bidi=bidi}{\chardef\bbl@bidimode=201 }
4483 \DeclareOption{bidi=bidi-r}{\chardef\bbl@bidimode=202 }
4484 \DeclareOption{bidi=bidi-l}{\chardef\bbl@bidimode=203 }
4485 hh/More package optionsii

With explicit languages, we could define the font at once, but we don’t. Just wait and see if the
language is actually activated. bbl@font replaces hardcoded font names inside \..family by the
corresponding macro \..default.
At the time of this writing, fontspec shows a warning about there are languages not available, which
some people think refers to babel, even if there is nothing wrong. Here is hack to patch fontspec to
avoid the misleading message, which is replaced ba a more explanatory one.
4486 hh∗Font selectionii ≡
4487 \bbl@trace{Font handling with fontspec}
4488 \ifx\ExplSyntaxOn\@undefined\else
4489 \def\bbl@fs@warn@nx#1#2{% \bbl@tempfs is the original macro
4490 \in@{,#1,}{,no-script,language-not-exist,}%
4491 \ifin@\else\bbl@tempfs@nx{#1}{#2}\fi}
4492 \def\bbl@fs@warn@nxx#1#2#3{%
4493 \in@{,#1,}{,no-script,language-not-exist,}%
4494 \ifin@\else\bbl@tempfs@nxx{#1}{#2}{#3}\fi}
4495 \def\bbl@loadfontspec{%
4496 \let\bbl@loadfontspec\relax
4497 \ifx\fontspec\@undefined
4498 \usepackage{fontspec}%
4499 \fi}%
4500 \fi
4501 \@onlypreamble\babelfont
4502 \newcommand\babelfont[2][]{% 1=langs/scripts 2=fam
4503 \bbl@foreach{#1}{%
4504 \expandafter\ifx\csname date##1\endcsname\relax
4505 \IfFileExists{babel-##1.tex}%
4506 {\babelprovide{##1}}%
4507 {}%
4508 \fi}%
4509 \edef\bbl@tempa{#1}%
4510 \def\bbl@tempb{#2}% Used by \bbl@bblfont
4511 \bbl@loadfontspec
4512 \EnableBabelHook{babel-fontspec}% Just calls \bbl@switchfont
4513 \bbl@bblfont}
4514 \newcommand\bbl@bblfont[2][]{% 1=features 2=fontname, @font=rm|sf|tt
4515 \bbl@ifunset{\bbl@tempb family}%
4516 {\bbl@providefam{\bbl@tempb}}%
4517 {}%
4518 % For the default font, just in case:
4519 \bbl@ifunset{bbl@lsys@\languagename}{\bbl@provide@lsys{\languagename}}{}%
4520 \expandafter\bbl@ifblank\expandafter{\bbl@tempa}%
4521 {\bbl@csarg\edef{\bbl@tempb dflt@}{<>{#1}{#2}}% save bbl@rmdflt@
4522 \bbl@exp{%
4523 \let\<bbl@\bbl@tempb dflt@\languagename>\<bbl@\bbl@tempb dflt@>%
4524 \\\bbl@font@set\<bbl@\bbl@tempb dflt@\languagename>%
4525 \<\bbl@tempb default>\<\bbl@tempb family>}}%
4526 {\bbl@foreach\bbl@tempa{% ie bbl@rmdflt@lang / *scrt
4527 \bbl@csarg\def{\bbl@tempb dflt@##1}{<>{#1}{#2}}}}}%
If the family in the previous command does not exist, it must be defined. Here is how:
4528 \def\bbl@providefam#1{%
4529 \bbl@exp{%
4530 \\\newcommand\<#1default>{}% Just define it
4531 \\\bbl@add@list\\\bbl@font@fams{#1}%
4532 \\\DeclareRobustCommand\<#1family>{%
4533 \\\not@math@alphabet\<#1family>\relax
4534 % \\\prepare@family@series@update{#1}\<#1default>% TODO. Fails
4535 \\\fontfamily\<#1default>%

155
4536 \<ifx>\\\UseHooks\\\@undefined\<else>\\\UseHook{#1family}\<fi>%
4537 \\\selectfont}%
4538 \\\DeclareTextFontCommand{\<text#1>}{\<#1family>}}}
The following macro is activated when the hook babel-fontspec is enabled. But before, we define a
macro for a warning, which sets a flag to avoid duplicate them.
4539 \def\bbl@nostdfont#1{%
4540 \bbl@ifunset{bbl@WFF@\f@family}%
4541 {\bbl@csarg\gdef{WFF@\f@family}{}% Flag, to avoid dupl warns
4542 \bbl@infowarn{The current font is not a babel standard family:\\%
4543 #1%
4544 \fontname\font\\%
4545 There is nothing intrinsically wrong with this warning, and\\%
4546 you can ignore it altogether if you do not need these\\%
4547 families. But if they are used in the document, you should be\\%
4548 aware 'babel' will not set Script and Language for them, so\\%
4549 you may consider defining a new family with \string\babelfont.\\%
4550 See the manual for further details about \string\babelfont.\\%
4551 Reported}}
4552 {}}%
4553 \gdef\bbl@switchfont{%
4554 \bbl@ifunset{bbl@lsys@\languagename}{\bbl@provide@lsys{\languagename}}{}%
4555 \bbl@exp{% eg Arabic -> arabic
4556 \lowercase{\edef\\\bbl@tempa{\bbl@cl{sname}}}}%
4557 \bbl@foreach\bbl@font@fams{%
4558 \bbl@ifunset{bbl@##1dflt@\languagename}% (1) language?
4559 {\bbl@ifunset{bbl@##1dflt@*\bbl@tempa}% (2) from script?
4560 {\bbl@ifunset{bbl@##1dflt@}% 2=F - (3) from generic?
4561 {}% 123=F - nothing!
4562 {\bbl@exp{% 3=T - from generic
4563 \global\let\<bbl@##1dflt@\languagename>%
4564 \<bbl@##1dflt@>}}}%
4565 {\bbl@exp{% 2=T - from script
4566 \global\let\<bbl@##1dflt@\languagename>%
4567 \<bbl@##1dflt@*\bbl@tempa>}}}%
4568 {}}% 1=T - language, already defined
4569 \def\bbl@tempa{\bbl@nostdfont{}}%
4570 \bbl@foreach\bbl@font@fams{% don't gather with prev for
4571 \bbl@ifunset{bbl@##1dflt@\languagename}%
4572 {\bbl@cs{famrst@##1}%
4573 \global\bbl@csarg\let{famrst@##1}\relax}%
4574 {\bbl@exp{% order is relevant. TODO: but sometimes wrong!
4575 \\\bbl@add\\\originalTeX{%
4576 \\\bbl@font@rst{\bbl@cl{##1dflt}}%
4577 \<##1default>\<##1family>{##1}}%
4578 \\\bbl@font@set\<bbl@##1dflt@\languagename>% the main part!
4579 \<##1default>\<##1family>}}}%
4580 \bbl@ifrestoring{}{\bbl@tempa}}%
The following is executed at the beginning of the aux file or the document to warn about fonts not
defined with \babelfont.
4581 \ifx\f@family\@undefined\else % if latex
4582 \ifcase\bbl@engine % if pdftex
4583 \let\bbl@ckeckstdfonts\relax
4584 \else
4585 \def\bbl@ckeckstdfonts{%
4586 \begingroup
4587 \global\let\bbl@ckeckstdfonts\relax
4588 \let\bbl@tempa\@empty
4589 \bbl@foreach\bbl@font@fams{%
4590 \bbl@ifunset{bbl@##1dflt@}%
4591 {\@nameuse{##1family}%
4592 \bbl@csarg\gdef{WFF@\f@family}{}% Flag
4593 \bbl@exp{\\\bbl@add\\\bbl@tempa{* \<##1family>= \f@family\\\\%

156
4594 \space\space\fontname\font\\\\}}%
4595 \bbl@csarg\xdef{##1dflt@}{\f@family}%
4596 \expandafter\xdef\csname ##1default\endcsname{\f@family}}%
4597 {}}%
4598 \ifx\bbl@tempa\@empty\else
4599 \bbl@infowarn{The following font families will use the default\\%
4600 settings for all or some languages:\\%
4601 \bbl@tempa
4602 There is nothing intrinsically wrong with it, but\\%
4603 'babel' will no set Script and Language, which could\\%
4604 be relevant in some languages. If your document uses\\%
4605 these families, consider redefining them with \string\babelfont.\\%
4606 Reported}%
4607 \fi
4608 \endgroup}
4609 \fi
4610 \fi

Now the macros defining the font with fontspec.


When there are repeated keys in fontspec, the last value wins. So, we just place the ini settings at the
beginning, and user settings will take precedence. We must deactivate temporarily \bbl@mapselect
because \selectfont is called internally when a font is defined.

4611 \def\bbl@font@set#1#2#3{% eg \bbl@rmdflt@lang \rmdefault \rmfamily


4612 \bbl@xin@{<>}{#1}%
4613 \ifin@
4614 \bbl@exp{\\\bbl@fontspec@set\\#1\expandafter\@gobbletwo#1\\#3}%
4615 \fi
4616 \bbl@exp{% 'Unprotected' macros return prev values
4617 \def\\#2{#1}% eg, \rmdefault{\bbl@rmdflt@lang}
4618 \\\bbl@ifsamestring{#2}{\f@family}%
4619 {\\#3%
4620 \\\bbl@ifsamestring{\f@series}{\bfdefault}{\\\bfseries}{}%
4621 \let\\\bbl@tempa\relax}%
4622 {}}}
4623 % TODO - next should be global?, but even local does its job. I'm
4624 % still not sure -- must investigate:
4625 \def\bbl@fontspec@set#1#2#3#4{% eg \bbl@rmdflt@lang fnt-opt fnt-nme \xxfamily
4626 \let\bbl@tempe\bbl@mapselect
4627 \let\bbl@mapselect\relax
4628 \let\bbl@temp@fam#4% eg, '\rmfamily', to be restored below
4629 \let#4\@empty % Make sure \renewfontfamily is valid
4630 \bbl@exp{%
4631 \let\\\bbl@temp@pfam\<\bbl@stripslash#4\space>% eg, '\rmfamily '
4632 \<keys_if_exist:nnF>{fontspec-opentype}{Script/\bbl@cl{sname}}%
4633 {\\\newfontscript{\bbl@cl{sname}}{\bbl@cl{sotf}}}%
4634 \<keys_if_exist:nnF>{fontspec-opentype}{Language/\bbl@cl{lname}}%
4635 {\\\newfontlanguage{\bbl@cl{lname}}{\bbl@cl{lotf}}}%
4636 \let\\\bbl@tempfs@nx\<__fontspec_warning:nx>%
4637 \let\<__fontspec_warning:nx>\\\bbl@fs@warn@nx
4638 \let\\\bbl@tempfs@nxx\<__fontspec_warning:nxx>%
4639 \let\<__fontspec_warning:nxx>\\\bbl@fs@warn@nxx
4640 \\\renewfontfamily\\#4%
4641 [\bbl@cl{lsys},#2]}{#3}% ie \bbl@exp{..}{#3}
4642 \bbl@exp{%
4643 \let\<__fontspec_warning:nx>\\\bbl@tempfs@nx
4644 \let\<__fontspec_warning:nxx>\\\bbl@tempfs@nxx}%
4645 \begingroup
4646 #4%
4647 \xdef#1{\f@family}% eg, \bbl@rmdflt@lang{FreeSerif(0)}
4648 \endgroup
4649 \let#4\bbl@temp@fam
4650 \bbl@exp{\let\<\bbl@stripslash#4\space>}\bbl@temp@pfam
4651 \let\bbl@mapselect\bbl@tempe}%

157
font@rst and famrst are only used when there is no global settings, to save and restore de previous
families. Not really necessary, but done for optimization.

4652 \def\bbl@font@rst#1#2#3#4{%
4653 \bbl@csarg\def{famrst@#4}{\bbl@font@set{#1}#2#3}}

The default font families. They are eurocentric, but the list can be expanded easily with \babelfont.

4654 \def\bbl@font@fams{rm,sf,tt}
4655 hh/Font selectionii

12 Hooks for XeTeX and LuaTeX


12.1 XeTeX
Unfortunately, the current encoding cannot be retrieved and therefore it is reset always to utf8,
which seems a sensible default.

4656 hh∗Footnote changesii ≡


4657 \bbl@trace{Bidi footnotes}
4658 \ifnum\bbl@bidimode>\z@
4659 \def\bbl@footnote#1#2#3{%
4660 \@ifnextchar[%
4661 {\bbl@footnote@o{#1}{#2}{#3}}%
4662 {\bbl@footnote@x{#1}{#2}{#3}}}
4663 \long\def\bbl@footnote@x#1#2#3#4{%
4664 \bgroup
4665 \select@language@x{\bbl@main@language}%
4666 \bbl@fn@footnote{#2#1{\ignorespaces#4}#3}%
4667 \egroup}
4668 \long\def\bbl@footnote@o#1#2#3[#4]#5{%
4669 \bgroup
4670 \select@language@x{\bbl@main@language}%
4671 \bbl@fn@footnote[#4]{#2#1{\ignorespaces#5}#3}%
4672 \egroup}
4673 \def\bbl@footnotetext#1#2#3{%
4674 \@ifnextchar[%
4675 {\bbl@footnotetext@o{#1}{#2}{#3}}%
4676 {\bbl@footnotetext@x{#1}{#2}{#3}}}
4677 \long\def\bbl@footnotetext@x#1#2#3#4{%
4678 \bgroup
4679 \select@language@x{\bbl@main@language}%
4680 \bbl@fn@footnotetext{#2#1{\ignorespaces#4}#3}%
4681 \egroup}
4682 \long\def\bbl@footnotetext@o#1#2#3[#4]#5{%
4683 \bgroup
4684 \select@language@x{\bbl@main@language}%
4685 \bbl@fn@footnotetext[#4]{#2#1{\ignorespaces#5}#3}%
4686 \egroup}
4687 \def\BabelFootnote#1#2#3#4{%
4688 \ifx\bbl@fn@footnote\@undefined
4689 \let\bbl@fn@footnote\footnote
4690 \fi
4691 \ifx\bbl@fn@footnotetext\@undefined
4692 \let\bbl@fn@footnotetext\footnotetext
4693 \fi
4694 \bbl@ifblank{#2}%
4695 {\def#1{\bbl@footnote{\@firstofone}{#3}{#4}}
4696 \@namedef{\bbl@stripslash#1text}%
4697 {\bbl@footnotetext{\@firstofone}{#3}{#4}}}%
4698 {\def#1{\bbl@exp{\\\bbl@footnote{\\\foreignlanguage{#2}}}{#3}{#4}}%
4699 \@namedef{\bbl@stripslash#1text}%
4700 {\bbl@exp{\\\bbl@footnotetext{\\\foreignlanguage{#2}}}{#3}{#4}}}}
4701 \fi
4702 hh/Footnote changesii

158
Now, the code.

4703 h∗xetexi
4704 \def\BabelStringsDefault{unicode}
4705 \let\xebbl@stop\relax
4706 \AddBabelHook{xetex}{encodedcommands}{%
4707 \def\bbl@tempa{#1}%
4708 \ifx\bbl@tempa\@empty
4709 \XeTeXinputencoding"bytes"%
4710 \else
4711 \XeTeXinputencoding"#1"%
4712 \fi
4713 \def\xebbl@stop{\XeTeXinputencoding"utf8"}}
4714 \AddBabelHook{xetex}{stopcommands}{%
4715 \xebbl@stop
4716 \let\xebbl@stop\relax}
4717 \def\bbl@intraspace#1 #2 #3\@@{%
4718 \bbl@csarg\gdef{xeisp@\languagename}%
4719 {\XeTeXlinebreakskip #1em plus #2em minus #3em\relax}}
4720 \def\bbl@intrapenalty#1\@@{%
4721 \bbl@csarg\gdef{xeipn@\languagename}%
4722 {\XeTeXlinebreakpenalty #1\relax}}
4723 \def\bbl@provide@intraspace{%
4724 \bbl@xin@{/s}{/\bbl@cl{lnbrk}}%
4725 \ifin@\else\bbl@xin@{/c}{/\bbl@cl{lnbrk}}\fi
4726 \ifin@
4727 \bbl@ifunset{bbl@intsp@\languagename}{}%
4728 {\expandafter\ifx\csname bbl@intsp@\languagename\endcsname\@empty\else
4729 \ifx\bbl@KVP@intraspace\@nnil
4730 \bbl@exp{%
4731 \\\bbl@intraspace\bbl@cl{intsp}\\\@@}%
4732 \fi
4733 \ifx\bbl@KVP@intrapenalty\@nnil
4734 \bbl@intrapenalty0\@@
4735 \fi
4736 \fi
4737 \ifx\bbl@KVP@intraspace\@nnil\else % We may override the ini
4738 \expandafter\bbl@intraspace\bbl@KVP@intraspace\@@
4739 \fi
4740 \ifx\bbl@KVP@intrapenalty\@nnil\else
4741 \expandafter\bbl@intrapenalty\bbl@KVP@intrapenalty\@@
4742 \fi
4743 \bbl@exp{%
4744 % TODO. Execute only once (but redundant):
4745 \\\bbl@add\<extras\languagename>{%
4746 \XeTeXlinebreaklocale "\bbl@cl{tbcp}"%
4747 \<bbl@xeisp@\languagename>%
4748 \<bbl@xeipn@\languagename>}%
4749 \\\bbl@toglobal\<extras\languagename>%
4750 \\\bbl@add\<noextras\languagename>{%
4751 \XeTeXlinebreaklocale ""}%
4752 \\\bbl@toglobal\<noextras\languagename>}%
4753 \ifx\bbl@ispacesize\@undefined
4754 \gdef\bbl@ispacesize{\bbl@cl{xeisp}}%
4755 \ifx\AtBeginDocument\@notprerr
4756 \expandafter\@secondoftwo % to execute right now
4757 \fi
4758 \AtBeginDocument{\bbl@patchfont{\bbl@ispacesize}}%
4759 \fi}%
4760 \fi}
4761 \ifx\DisableBabelHook\@undefined\endinput\fi
4762 \AddBabelHook{babel-fontspec}{afterextras}{\bbl@switchfont}
4763 \AddBabelHook{babel-fontspec}{beforestart}{\bbl@ckeckstdfonts}
4764 \DisableBabelHook{babel-fontspec}

159
4765 hhFontselectionii
4766 \input txtbabel.def
4767 h/xetexi

12.2 Layout
In progress.
Note elements like headlines and margins can be modified easily with packages like fancyhdr,
typearea or titleps, and geometry.
\bbl@startskip and \bbl@endskip are available to package authors. Thanks to the TEX expansion
mechanism the following constructs are valid: \adim\bbl@startskip,
\advance\bbl@startskip\adim, \bbl@startskip\adim.
Consider txtbabel as a shorthand for tex–xet babel, which is the bidi model in both pdftex and xetex.
4768 h∗texxeti
4769 \providecommand\bbl@provide@intraspace{}
4770 \bbl@trace{Redefinitions for bidi layout}
4771 \def\bbl@sspre@caption{%
4772 \bbl@exp{\everyhbox{\\\bbl@textdir\bbl@cs{wdir@\bbl@main@language}}}}
4773 \ifx\bbl@opt@layout\@nnil\endinput\fi % No layout
4774 \def\bbl@startskip{\ifcase\bbl@thepardir\leftskip\else\rightskip\fi}
4775 \def\bbl@endskip{\ifcase\bbl@thepardir\rightskip\else\leftskip\fi}
4776 \ifx\bbl@beforeforeign\leavevmode % A poor test for bidi=
4777 \def\@hangfrom#1{%
4778 \setbox\@tempboxa\hbox{{#1}}%
4779 \hangindent\ifcase\bbl@thepardir\wd\@tempboxa\else-\wd\@tempboxa\fi
4780 \noindent\box\@tempboxa}
4781 \def\raggedright{%
4782 \let\\\@centercr
4783 \bbl@startskip\z@skip
4784 \@rightskip\@flushglue
4785 \bbl@endskip\@rightskip
4786 \parindent\z@
4787 \parfillskip\bbl@startskip}
4788 \def\raggedleft{%
4789 \let\\\@centercr
4790 \bbl@startskip\@flushglue
4791 \bbl@endskip\z@skip
4792 \parindent\z@
4793 \parfillskip\bbl@endskip}
4794 \fi
4795 \IfBabelLayout{lists}
4796 {\bbl@sreplace\list
4797 {\@totalleftmargin\leftmargin}{\@totalleftmargin\bbl@listleftmargin}%
4798 \def\bbl@listleftmargin{%
4799 \ifcase\bbl@thepardir\leftmargin\else\rightmargin\fi}%
4800 \ifcase\bbl@engine
4801 \def\labelenumii{)\theenumii(}% pdftex doesn't reverse ()
4802 \def\p@enumiii{\p@enumii)\theenumii(}%
4803 \fi
4804 \bbl@sreplace\@verbatim
4805 {\leftskip\@totalleftmargin}%
4806 {\bbl@startskip\textwidth
4807 \advance\bbl@startskip-\linewidth}%
4808 \bbl@sreplace\@verbatim
4809 {\rightskip\z@skip}%
4810 {\bbl@endskip\z@skip}}%
4811 {}
4812 \IfBabelLayout{contents}
4813 {\bbl@sreplace\@dottedtocline{\leftskip}{\bbl@startskip}%
4814 \bbl@sreplace\@dottedtocline{\rightskip}{\bbl@endskip}}
4815 {}
4816 \IfBabelLayout{columns}
4817 {\bbl@sreplace\@outputdblcol{\hb@xt@\textwidth}{\bbl@outputhbox}%

160
4818 \def\bbl@outputhbox#1{%
4819 \hb@xt@\textwidth{%
4820 \hskip\columnwidth
4821 \hfil
4822 {\normalcolor\vrule \@width\columnseprule}%
4823 \hfil
4824 \hb@xt@\columnwidth{\box\@leftcolumn \hss}%
4825 \hskip-\textwidth
4826 \hb@xt@\columnwidth{\box\@outputbox \hss}%
4827 \hskip\columnsep
4828 \hskip\columnwidth}}}%
4829 {}
4830 hhFootnote changesii
4831 \IfBabelLayout{footnotes}%
4832 {\BabelFootnote\footnote\languagename{}{}%
4833 \BabelFootnote\localfootnote\languagename{}{}%
4834 \BabelFootnote\mainfootnote{}{}{}}
4835 {}

Implicitly reverses sectioning labels in bidi=basic, because the full stop is not in contact with L
numbers any more. I think there must be a better way.

4836 \IfBabelLayout{counters}%
4837 {\let\bbl@latinarabic=\@arabic
4838 \def\@arabic#1{\babelsublr{\bbl@latinarabic#1}}%
4839 \let\bbl@asciiroman=\@roman
4840 \def\@roman#1{\babelsublr{\ensureascii{\bbl@asciiroman#1}}}%
4841 \let\bbl@asciiRoman=\@Roman
4842 \def\@Roman#1{\babelsublr{\ensureascii{\bbl@asciiRoman#1}}}}{}
4843 h/texxeti

12.3 LuaTeX
The loader for luatex is based solely on language.dat, which is read on the fly. The code shouldn’t be
executed when the format is build, so we check if \AddBabelHook is defined. Then comes a modified
version of the loader in hyphen.cfg (without the hyphenmins stuff, which is under the direct control
of babel).
The names \l@<language> are defined and take some value from the beginning because all ldf files
assume this for the corresponding language to be considered valid, but patterns are not loaded
(except the first one). This is done later, when the language is first selected (which usually means
when the ldf finishes). If a language has been loaded, \bbl@hyphendata@<num> exists (with the
names of the files read).
The default setup preloads the first language into the format. This is intended mainly for ‘english’, so
that it’s available without further intervention from the user. To avoid duplicating it, the following
rule applies: if the “0th” language and the first language in language.dat have the same name then
just ignore the latter. If there are new synonymous, the are added, but note if the language patterns
have not been preloaded they won’t at run time.
Other preloaded languages could be read twice, if they have been preloaded into the format. This is
not optimal, but it shouldn’t happen very often – with luatex patterns are best loaded when the
document is typeset, and the “0th” language is preloaded just for backwards compatibility.
As of 1.1b, lua(e)tex is taken into account. Formerly, loading of patterns on the fly didn’t work in this
format, but with the new loader it does. Unfortunately, the format is not based on babel, and data
could be duplicated, because languages are reassigned above those in the format (nothing serious,
anyway). Note even with this format language.dat is used (under the principle of a single source),
instead of language.def.
Of course, there is room for improvements, like tools to read and reassign languages, which would
require modifying the language list, and better error handling.
We need catcode tables, but no format (targeted by babel) provide a command to allocate them
(although there are packages like ctablestack). FIX - This isn’t true anymore. For the moment, a
dangerous approach is used - just allocate a high random number and cross the fingers. To
complicate things, etex.sty changes the way languages are allocated.
This files is read at three places: (1) when plain.def, babel.sty starts, to read the list of available
languages from language.dat (for the base option); (2) at hyphen.cfg, to modify some macros; (3) in

161
the middle of plain.def and babel.sty, by babel.def, with the commands and other definitions for
luatex (eg, \babelpatterns).

4844 h∗luatexi
4845 \ifx\AddBabelHook\@undefined % When plain.def, babel.sty starts
4846 \bbl@trace{Read language.dat}
4847 \ifx\bbl@readstream\@undefined
4848 \csname newread\endcsname\bbl@readstream
4849 \fi
4850 \begingroup
4851 \toks@{}
4852 \count@\z@ % 0=start, 1=0th, 2=normal
4853 \def\bbl@process@line#1#2 #3 #4 {%
4854 \ifx=#1%
4855 \bbl@process@synonym{#2}%
4856 \else
4857 \bbl@process@language{#1#2}{#3}{#4}%
4858 \fi
4859 \ignorespaces}
4860 \def\bbl@manylang{%
4861 \ifnum\bbl@last>\@ne
4862 \bbl@info{Non-standard hyphenation setup}%
4863 \fi
4864 \let\bbl@manylang\relax}
4865 \def\bbl@process@language#1#2#3{%
4866 \ifcase\count@
4867 \@ifundefined{zth@#1}{\count@\tw@}{\count@\@ne}%
4868 \or
4869 \count@\tw@
4870 \fi
4871 \ifnum\count@=\tw@
4872 \expandafter\addlanguage\csname l@#1\endcsname
4873 \language\allocationnumber
4874 \chardef\bbl@last\allocationnumber
4875 \bbl@manylang
4876 \let\bbl@elt\relax
4877 \xdef\bbl@languages{%
4878 \bbl@languages\bbl@elt{#1}{\the\language}{#2}{#3}}%
4879 \fi
4880 \the\toks@
4881 \toks@{}}
4882 \def\bbl@process@synonym@aux#1#2{%
4883 \global\expandafter\chardef\csname l@#1\endcsname#2\relax
4884 \let\bbl@elt\relax
4885 \xdef\bbl@languages{%
4886 \bbl@languages\bbl@elt{#1}{#2}{}{}}}%
4887 \def\bbl@process@synonym#1{%
4888 \ifcase\count@
4889 \toks@\expandafter{\the\toks@\relax\bbl@process@synonym{#1}}%
4890 \or
4891 \@ifundefined{zth@#1}{\bbl@process@synonym@aux{#1}{0}}{}%
4892 \else
4893 \bbl@process@synonym@aux{#1}{\the\bbl@last}%
4894 \fi}
4895 \ifx\bbl@languages\@undefined % Just a (sensible?) guess
4896 \chardef\l@english\z@
4897 \chardef\l@USenglish\z@
4898 \chardef\bbl@last\z@
4899 \global\@namedef{bbl@hyphendata@0}{{hyphen.tex}{}}
4900 \gdef\bbl@languages{%
4901 \bbl@elt{english}{0}{hyphen.tex}{}%
4902 \bbl@elt{USenglish}{0}{}{}}
4903 \else
4904 \global\let\bbl@languages@format\bbl@languages

162
4905 \def\bbl@elt#1#2#3#4{% Remove all except language 0
4906 \ifnum#2>\z@\else
4907 \noexpand\bbl@elt{#1}{#2}{#3}{#4}%
4908 \fi}%
4909 \xdef\bbl@languages{\bbl@languages}%
4910 \fi
4911 \def\bbl@elt#1#2#3#4{\@namedef{zth@#1}{}} % Define flags
4912 \bbl@languages
4913 \openin\bbl@readstream=language.dat
4914 \ifeof\bbl@readstream
4915 \bbl@warning{I couldn't find language.dat. No additional\\%
4916 patterns loaded. Reported}%
4917 \else
4918 \loop
4919 \endlinechar\m@ne
4920 \read\bbl@readstream to \bbl@line
4921 \endlinechar`\^^M
4922 \if T\ifeof\bbl@readstream F\fi T\relax
4923 \ifx\bbl@line\@empty\else
4924 \edef\bbl@line{\bbl@line\space\space\space}%
4925 \expandafter\bbl@process@line\bbl@line\relax
4926 \fi
4927 \repeat
4928 \fi
4929 \endgroup
4930 \bbl@trace{Macros for reading patterns files}
4931 \def\bbl@get@enc#1:#2:#3\@@@{\def\bbl@hyph@enc{#2}}
4932 \ifx\babelcatcodetablenum\@undefined
4933 \ifx\newcatcodetable\@undefined
4934 \def\babelcatcodetablenum{5211}
4935 \def\bbl@pattcodes{\numexpr\babelcatcodetablenum+1\relax}
4936 \else
4937 \newcatcodetable\babelcatcodetablenum
4938 \newcatcodetable\bbl@pattcodes
4939 \fi
4940 \else
4941 \def\bbl@pattcodes{\numexpr\babelcatcodetablenum+1\relax}
4942 \fi
4943 \def\bbl@luapatterns#1#2{%
4944 \bbl@get@enc#1::\@@@
4945 \setbox\z@\hbox\bgroup
4946 \begingroup
4947 \savecatcodetable\babelcatcodetablenum\relax
4948 \initcatcodetable\bbl@pattcodes\relax
4949 \catcodetable\bbl@pattcodes\relax
4950 \catcode`\#=6 \catcode`\$=3 \catcode`\&=4 \catcode`\^=7
4951 \catcode`\_=8 \catcode`\{=1 \catcode`\}=2 \catcode`\~=13
4952 \catcode`\@=11 \catcode`\^^I=10 \catcode`\^^J=12
4953 \catcode`\<=12 \catcode`\>=12 \catcode`\*=12 \catcode`\.=12
4954 \catcode`\-=12 \catcode`\/=12 \catcode`\[=12 \catcode`\]=12
4955 \catcode`\`=12 \catcode`\'=12 \catcode`\"=12
4956 \input #1\relax
4957 \catcodetable\babelcatcodetablenum\relax
4958 \endgroup
4959 \def\bbl@tempa{#2}%
4960 \ifx\bbl@tempa\@empty\else
4961 \input #2\relax
4962 \fi
4963 \egroup}%
4964 \def\bbl@patterns@lua#1{%
4965 \language=\expandafter\ifx\csname l@#1:\f@encoding\endcsname\relax
4966 \csname l@#1\endcsname
4967 \edef\bbl@tempa{#1}%

163
4968 \else
4969 \csname l@#1:\f@encoding\endcsname
4970 \edef\bbl@tempa{#1:\f@encoding}%
4971 \fi\relax
4972 \@namedef{lu@texhyphen@loaded@\the\language}{}% Temp
4973 \@ifundefined{bbl@hyphendata@\the\language}%
4974 {\def\bbl@elt##1##2##3##4{%
4975 \ifnum##2=\csname l@\bbl@tempa\endcsname % #2=spanish, dutch:OT1...
4976 \def\bbl@tempb{##3}%
4977 \ifx\bbl@tempb\@empty\else % if not a synonymous
4978 \def\bbl@tempc{{##3}{##4}}%
4979 \fi
4980 \bbl@csarg\xdef{hyphendata@##2}{\bbl@tempc}%
4981 \fi}%
4982 \bbl@languages
4983 \@ifundefined{bbl@hyphendata@\the\language}%
4984 {\bbl@info{No hyphenation patterns were set for\\%
4985 language '\bbl@tempa'. Reported}}%
4986 {\expandafter\expandafter\expandafter\bbl@luapatterns
4987 \csname bbl@hyphendata@\the\language\endcsname}}{}}
4988 \endinput\fi
4989 % Here ends \ifx\AddBabelHook\@undefined
4990 % A few lines are only read by hyphen.cfg
4991 \ifx\DisableBabelHook\@undefined
4992 \AddBabelHook{luatex}{everylanguage}{%
4993 \def\process@language##1##2##3{%
4994 \def\process@line####1####2 ####3 ####4 {}}}
4995 \AddBabelHook{luatex}{loadpatterns}{%
4996 \input #1\relax
4997 \expandafter\gdef\csname bbl@hyphendata@\the\language\endcsname
4998 {{#1}{}}}
4999 \AddBabelHook{luatex}{loadexceptions}{%
5000 \input #1\relax
5001 \def\bbl@tempb##1##2{{##1}{#1}}%
5002 \expandafter\xdef\csname bbl@hyphendata@\the\language\endcsname
5003 {\expandafter\expandafter\expandafter\bbl@tempb
5004 \csname bbl@hyphendata@\the\language\endcsname}}
5005 \endinput\fi
5006 % Here stops reading code for hyphen.cfg
5007 % The following is read the 2nd time it's loaded
5008 \begingroup % TODO - to a lua file
5009 \catcode`\%=12
5010 \catcode`\'=12
5011 \catcode`\"=12
5012 \catcode`\:=12
5013 \directlua{
5014 Babel = Babel or {}
5015 function Babel.bytes(line)
5016 return line:gsub("(.)",
5017 function (chr) return unicode.utf8.char(string.byte(chr)) end)
5018 end
5019 function Babel.begin_process_input()
5020 if luatexbase and luatexbase.add_to_callback then
5021 luatexbase.add_to_callback('process_input_buffer',
5022 Babel.bytes,'Babel.bytes')
5023 else
5024 Babel.callback = callback.find('process_input_buffer')
5025 callback.register('process_input_buffer',Babel.bytes)
5026 end
5027 end
5028 function Babel.end_process_input ()
5029 if luatexbase and luatexbase.remove_from_callback then
5030 luatexbase.remove_from_callback('process_input_buffer','Babel.bytes')

164
5031 else
5032 callback.register('process_input_buffer',Babel.callback)
5033 end
5034 end
5035 function Babel.addpatterns(pp, lg)
5036 local lg = lang.new(lg)
5037 local pats = lang.patterns(lg) or ''
5038 lang.clear_patterns(lg)
5039 for p in pp:gmatch('[^%s]+') do
5040 ss = ''
5041 for i in string.utfcharacters(p:gsub('%d', '')) do
5042 ss = ss .. '%d?' .. i
5043 end
5044 ss = ss:gsub('^%%d%?%.', '%%.') .. '%d?'
5045 ss = ss:gsub('%.%%d%?$', '%%.')
5046 pats, n = pats:gsub('%s' .. ss .. '%s', ' ' .. p .. ' ')
5047 if n == 0 then
5048 tex.sprint(
5049 [[\string\csname\space bbl@info\endcsname{New pattern: ]]
5050 .. p .. [[}]])
5051 pats = pats .. ' ' .. p
5052 else
5053 tex.sprint(
5054 [[\string\csname\space bbl@info\endcsname{Renew pattern: ]]
5055 .. p .. [[}]])
5056 end
5057 end
5058 lang.patterns(lg, pats)
5059 end
5060 Babel.characters = Babel.characters or {}
5061 Babel.ranges = Babel.ranges or {}
5062 function Babel.hlist_has_bidi(head)
5063 local has_bidi = false
5064 local ranges = Babel.ranges
5065 for item in node.traverse(head) do
5066 if item.id == node.id'glyph' then
5067 local itemchar = item.char
5068 local chardata = Babel.characters[itemchar]
5069 local dir = chardata and chardata.d or nil
5070 if not dir then
5071 for nn, et in ipairs(ranges) do
5072 if itemchar < et[1] then
5073 break
5074 elseif itemchar <= et[2] then
5075 dir = et[3]
5076 break
5077 end
5078 end
5079 end
5080 if dir and (dir == 'al' or dir == 'r') then
5081 has_bidi = true
5082 end
5083 end
5084 end
5085 return has_bidi
5086 end
5087 function Babel.set_chranges_b (script, chrng)
5088 if chrng == '' then return end
5089 texio.write('Replacing ' .. script .. ' script ranges')
5090 Babel.script_blocks[script] = {}
5091 for s, e in string.gmatch(chrng..' ', '(.-)%.%.(.-)%s') do
5092 table.insert(
5093 Babel.script_blocks[script], {tonumber(s,16), tonumber(e,16)})

165
5094 end
5095 end
5096 }
5097 \endgroup
5098 \ifx\newattribute\@undefined\else
5099 \newattribute\bbl@attr@locale
5100 \directlua{ Babel.attr_locale = luatexbase.registernumber'bbl@attr@locale' }
5101 \AddBabelHook{luatex}{beforeextras}{%
5102 \setattribute\bbl@attr@locale\localeid}
5103 \fi
5104 \def\BabelStringsDefault{unicode}
5105 \let\luabbl@stop\relax
5106 \AddBabelHook{luatex}{encodedcommands}{%
5107 \def\bbl@tempa{utf8}\def\bbl@tempb{#1}%
5108 \ifx\bbl@tempa\bbl@tempb\else
5109 \directlua{Babel.begin_process_input()}%
5110 \def\luabbl@stop{%
5111 \directlua{Babel.end_process_input()}}%
5112 \fi}%
5113 \AddBabelHook{luatex}{stopcommands}{%
5114 \luabbl@stop
5115 \let\luabbl@stop\relax}
5116 \AddBabelHook{luatex}{patterns}{%
5117 \@ifundefined{bbl@hyphendata@\the\language}%
5118 {\def\bbl@elt##1##2##3##4{%
5119 \ifnum##2=\csname l@#2\endcsname % #2=spanish, dutch:OT1...
5120 \def\bbl@tempb{##3}%
5121 \ifx\bbl@tempb\@empty\else % if not a synonymous
5122 \def\bbl@tempc{{##3}{##4}}%
5123 \fi
5124 \bbl@csarg\xdef{hyphendata@##2}{\bbl@tempc}%
5125 \fi}%
5126 \bbl@languages
5127 \@ifundefined{bbl@hyphendata@\the\language}%
5128 {\bbl@info{No hyphenation patterns were set for\\%
5129 language '#2'. Reported}}%
5130 {\expandafter\expandafter\expandafter\bbl@luapatterns
5131 \csname bbl@hyphendata@\the\language\endcsname}}{}%
5132 \@ifundefined{bbl@patterns@}{}{%
5133 \begingroup
5134 \bbl@xin@{,\number\language,}{,\bbl@pttnlist}%
5135 \ifin@\else
5136 \ifx\bbl@patterns@\@empty\else
5137 \directlua{ Babel.addpatterns(
5138 [[\bbl@patterns@]], \number\language) }%
5139 \fi
5140 \@ifundefined{bbl@patterns@#1}%
5141 \@empty
5142 {\directlua{ Babel.addpatterns(
5143 [[\space\csname bbl@patterns@#1\endcsname]],
5144 \number\language) }}%
5145 \xdef\bbl@pttnlist{\bbl@pttnlist\number\language,}%
5146 \fi
5147 \endgroup}%
5148 \bbl@exp{%
5149 \bbl@ifunset{bbl@prehc@\languagename}{}%
5150 {\\\bbl@ifblank{\bbl@cs{prehc@\languagename}}{}%
5151 {\prehyphenchar=\bbl@cl{prehc}\relax}}}}

\babelpatterns This macro adds patterns. Two macros are used to store them: \bbl@patterns@ for the global ones
and \bbl@patterns@<lang> for language ones. We make sure there is a space between words when
multiple commands are used.

5152 \@onlypreamble\babelpatterns

166
5153 \AtEndOfPackage{%
5154 \newcommand\babelpatterns[2][\@empty]{%
5155 \ifx\bbl@patterns@\relax
5156 \let\bbl@patterns@\@empty
5157 \fi
5158 \ifx\bbl@pttnlist\@empty\else
5159 \bbl@warning{%
5160 You must not intermingle \string\selectlanguage\space and\\%
5161 \string\babelpatterns\space or some patterns will not\\%
5162 be taken into account. Reported}%
5163 \fi
5164 \ifx\@empty#1%
5165 \protected@edef\bbl@patterns@{\bbl@patterns@\space#2}%
5166 \else
5167 \edef\bbl@tempb{\zap@space#1 \@empty}%
5168 \bbl@for\bbl@tempa\bbl@tempb{%
5169 \bbl@fixname\bbl@tempa
5170 \bbl@iflanguage\bbl@tempa{%
5171 \bbl@csarg\protected@edef{patterns@\bbl@tempa}{%
5172 \@ifundefined{bbl@patterns@\bbl@tempa}%
5173 \@empty
5174 {\csname bbl@patterns@\bbl@tempa\endcsname\space}%
5175 #2}}}%
5176 \fi}}

12.4 Southeast Asian scripts


First, some general code for line breaking, used by \babelposthyphenation.
Replace regular (ie, implicit) discretionaries by spaceskips, based on the previous glyph (which I
think makes sense, because the hyphen and the previous char go always together). Other
discretionaries are not touched. See Unicode UAX 14.
5177 % TODO - to a lua file
5178 \directlua{
5179 Babel = Babel or {}
5180 Babel.linebreaking = Babel.linebreaking or {}
5181 Babel.linebreaking.before = {}
5182 Babel.linebreaking.after = {}
5183 Babel.locale = {} % Free to use, indexed by \localeid
5184 function Babel.linebreaking.add_before(func)
5185 tex.print([[\noexpand\csname bbl@luahyphenate\endcsname]])
5186 table.insert(Babel.linebreaking.before, func)
5187 end
5188 function Babel.linebreaking.add_after(func)
5189 tex.print([[\noexpand\csname bbl@luahyphenate\endcsname]])
5190 table.insert(Babel.linebreaking.after, func)
5191 end
5192 }
5193 \def\bbl@intraspace#1 #2 #3\@@{%
5194 \directlua{
5195 Babel = Babel or {}
5196 Babel.intraspaces = Babel.intraspaces or {}
5197 Babel.intraspaces['\csname bbl@sbcp@\languagename\endcsname'] = %
5198 {b = #1, p = #2, m = #3}
5199 Babel.locale_props[\the\localeid].intraspace = %
5200 {b = #1, p = #2, m = #3}
5201 }}
5202 \def\bbl@intrapenalty#1\@@{%
5203 \directlua{
5204 Babel = Babel or {}
5205 Babel.intrapenalties = Babel.intrapenalties or {}
5206 Babel.intrapenalties['\csname bbl@sbcp@\languagename\endcsname'] = #1
5207 Babel.locale_props[\the\localeid].intrapenalty = #1
5208 }}

167
5209 \begingroup
5210 \catcode`\%=12
5211 \catcode`\^=14
5212 \catcode`\'=12
5213 \catcode`\~=12
5214 \gdef\bbl@seaintraspace{^
5215 \let\bbl@seaintraspace\relax
5216 \directlua{
5217 Babel = Babel or {}
5218 Babel.sea_enabled = true
5219 Babel.sea_ranges = Babel.sea_ranges or {}
5220 function Babel.set_chranges (script, chrng)
5221 local c = 0
5222 for s, e in string.gmatch(chrng..' ', '(.-)%.%.(.-)%s') do
5223 Babel.sea_ranges[script..c]={tonumber(s,16), tonumber(e,16)}
5224 c = c + 1
5225 end
5226 end
5227 function Babel.sea_disc_to_space (head)
5228 local sea_ranges = Babel.sea_ranges
5229 local last_char = nil
5230 local quad = 655360 ^% 10 pt = 655360 = 10 * 65536
5231 for item in node.traverse(head) do
5232 local i = item.id
5233 if i == node.id'glyph' then
5234 last_char = item
5235 elseif i == 7 and item.subtype == 3 and last_char
5236 and last_char.char > 0x0C99 then
5237 quad = font.getfont(last_char.font).size
5238 for lg, rg in pairs(sea_ranges) do
5239 if last_char.char > rg[1] and last_char.char < rg[2] then
5240 lg = lg:sub(1, 4) ^% Remove trailing number of, eg, Cyrl1
5241 local intraspace = Babel.intraspaces[lg]
5242 local intrapenalty = Babel.intrapenalties[lg]
5243 local n
5244 if intrapenalty ~= 0 then
5245 n = node.new(14, 0) ^% penalty
5246 n.penalty = intrapenalty
5247 node.insert_before(head, item, n)
5248 end
5249 n = node.new(12, 13) ^% (glue, spaceskip)
5250 node.setglue(n, intraspace.b * quad,
5251 intraspace.p * quad,
5252 intraspace.m * quad)
5253 node.insert_before(head, item, n)
5254 node.remove(head, item)
5255 end
5256 end
5257 end
5258 end
5259 end
5260 }^^
5261 \bbl@luahyphenate}

12.5 CJK line breaking


Minimal line breaking for CJK scripts, mainly intended for simple documents and short texts as a
secundary language. Only line breaking, with a little stretching for justification, without any attempt
to adjust the spacing. It is based on (but does not strictly follow) the Unicode algorithm.
We first need a little table with the corresponding line breaking properties. A few characters have an
additional key for the width (fullwidth vs. halfwidth), not yet used. There is a separate file, defined
below.
5262 \catcode`\%=14

168
5263 \gdef\bbl@cjkintraspace{%
5264 \let\bbl@cjkintraspace\relax
5265 \directlua{
5266 Babel = Babel or {}
5267 require('babel-data-cjk.lua')
5268 Babel.cjk_enabled = true
5269 function Babel.cjk_linebreak(head)
5270 local GLYPH = node.id'glyph'
5271 local last_char = nil
5272 local quad = 655360 % 10 pt = 655360 = 10 * 65536
5273 local last_class = nil
5274 local last_lang = nil
5275
5276 for item in node.traverse(head) do
5277 if item.id == GLYPH then
5278
5279 local lang = item.lang
5280
5281 local LOCALE = node.get_attribute(item,
5282 Babel.attr_locale)
5283 local props = Babel.locale_props[LOCALE]
5284
5285 local class = Babel.cjk_class[item.char].c
5286
5287 if props.cjk_quotes and props.cjk_quotes[item.char] then
5288 class = props.cjk_quotes[item.char]
5289 end
5290
5291 if class == 'cp' then class = 'cl' end % )] as CL
5292 if class == 'id' then class = 'I' end
5293
5294 local br = 0
5295 if class and last_class and Babel.cjk_breaks[last_class][class] then
5296 br = Babel.cjk_breaks[last_class][class]
5297 end
5298
5299 if br == 1 and props.linebreak == 'c' and
5300 lang ~= \the\l@nohyphenation\space and
5301 last_lang ~= \the\l@nohyphenation then
5302 local intrapenalty = props.intrapenalty
5303 if intrapenalty ~= 0 then
5304 local n = node.new(14, 0) % penalty
5305 n.penalty = intrapenalty
5306 node.insert_before(head, item, n)
5307 end
5308 local intraspace = props.intraspace
5309 local n = node.new(12, 13) % (glue, spaceskip)
5310 node.setglue(n, intraspace.b * quad,
5311 intraspace.p * quad,
5312 intraspace.m * quad)
5313 node.insert_before(head, item, n)
5314 end
5315
5316 if font.getfont(item.font) then
5317 quad = font.getfont(item.font).size
5318 end
5319 last_class = class
5320 last_lang = lang
5321 else % if penalty, glue or anything else
5322 last_class = nil
5323 end
5324 end
5325 lang.hyphenate(head)

169
5326 end
5327 }%
5328 \bbl@luahyphenate}
5329 \gdef\bbl@luahyphenate{%
5330 \let\bbl@luahyphenate\relax
5331 \directlua{
5332 luatexbase.add_to_callback('hyphenate',
5333 function (head, tail)
5334 if Babel.linebreaking.before then
5335 for k, func in ipairs(Babel.linebreaking.before) do
5336 func(head)
5337 end
5338 end
5339 if Babel.cjk_enabled then
5340 Babel.cjk_linebreak(head)
5341 end
5342 lang.hyphenate(head)
5343 if Babel.linebreaking.after then
5344 for k, func in ipairs(Babel.linebreaking.after) do
5345 func(head)
5346 end
5347 end
5348 if Babel.sea_enabled then
5349 Babel.sea_disc_to_space(head)
5350 end
5351 end,
5352 'Babel.hyphenate')
5353 }
5354 }
5355 \endgroup
5356 \def\bbl@provide@intraspace{%
5357 \bbl@ifunset{bbl@intsp@\languagename}{}%
5358 {\expandafter\ifx\csname bbl@intsp@\languagename\endcsname\@empty\else
5359 \bbl@xin@{/c}{/\bbl@cl{lnbrk}}%
5360 \ifin@ % cjk
5361 \bbl@cjkintraspace
5362 \directlua{
5363 Babel = Babel or {}
5364 Babel.locale_props = Babel.locale_props or {}
5365 Babel.locale_props[\the\localeid].linebreak = 'c'
5366 }%
5367 \bbl@exp{\\\bbl@intraspace\bbl@cl{intsp}\\\@@}%
5368 \ifx\bbl@KVP@intrapenalty\@nnil
5369 \bbl@intrapenalty0\@@
5370 \fi
5371 \else % sea
5372 \bbl@seaintraspace
5373 \bbl@exp{\\\bbl@intraspace\bbl@cl{intsp}\\\@@}%
5374 \directlua{
5375 Babel = Babel or {}
5376 Babel.sea_ranges = Babel.sea_ranges or {}
5377 Babel.set_chranges('\bbl@cl{sbcp}',
5378 '\bbl@cl{chrng}')
5379 }%
5380 \ifx\bbl@KVP@intrapenalty\@nnil
5381 \bbl@intrapenalty0\@@
5382 \fi
5383 \fi
5384 \fi
5385 \ifx\bbl@KVP@intrapenalty\@nnil\else
5386 \expandafter\bbl@intrapenalty\bbl@KVP@intrapenalty\@@
5387 \fi}}

170
12.6 Arabic justification
5388 \ifnum\bbl@bidimode>100 \ifnum\bbl@bidimode<200
5389 \def\bblar@chars{%
5390 0628,0629,062A,062B,062C,062D,062E,062F,0630,0631,0632,0633,%
5391 0634,0635,0636,0637,0638,0639,063A,063B,063C,063D,063E,063F,%
5392 0640,0641,0642,0643,0644,0645,0646,0647,0649}
5393 \def\bblar@elongated{%
5394 0626,0628,062A,062B,0633,0634,0635,0636,063B,%
5395 063C,063D,063E,063F,0641,0642,0643,0644,0646,%
5396 0649,064A}
5397 \begingroup
5398 \catcode`_=11 \catcode`:=11
5399 \gdef\bblar@nofswarn{\gdef\msg_warning:nnx##1##2##3{}}
5400 \endgroup
5401 \gdef\bbl@arabicjust{%
5402 \let\bbl@arabicjust\relax
5403 \newattribute\bblar@kashida
5404 \directlua{ Babel.attr_kashida = luatexbase.registernumber'bblar@kashida' }%
5405 \bblar@kashida=\z@
5406 \bbl@patchfont{{\bbl@parsejalt}}%
5407 \directlua{
5408 Babel.arabic.elong_map = Babel.arabic.elong_map or {}
5409 Babel.arabic.elong_map[\the\localeid] = {}
5410 luatexbase.add_to_callback('post_linebreak_filter',
5411 Babel.arabic.justify, 'Babel.arabic.justify')
5412 luatexbase.add_to_callback('hpack_filter',
5413 Babel.arabic.justify_hbox, 'Babel.arabic.justify_hbox')
5414 }}%
5415 % Save both node lists to make replacement. TODO. Save also widths to
5416 % make computations
5417 \def\bblar@fetchjalt#1#2#3#4{%
5418 \bbl@exp{\\\bbl@foreach{#1}}{%
5419 \bbl@ifunset{bblar@JE@##1}%
5420 {\setbox\z@\hbox{^^^^200d\char"##1#2}}%
5421 {\setbox\z@\hbox{^^^^200d\char"\@nameuse{bblar@JE@##1}#2}}%
5422 \directlua{%
5423 local last = nil
5424 for item in node.traverse(tex.box[0].head) do
5425 if item.id == node.id'glyph' and item.char > 0x600 and
5426 not (item.char == 0x200D) then
5427 last = item
5428 end
5429 end
5430 Babel.arabic.#3['##1#4'] = last.char
5431 }}}
5432 % Brute force. No rules at all, yet. The ideal: look at jalt table. And
5433 % perhaps other tables (falt?, cswh?). What about kaf? And diacritic
5434 % positioning?
5435 \gdef\bbl@parsejalt{%
5436 \ifx\addfontfeature\@undefined\else
5437 \bbl@xin@{/e}{/\bbl@cl{lnbrk}}%
5438 \ifin@
5439 \directlua{%
5440 if Babel.arabic.elong_map[\the\localeid][\fontid\font] == nil then
5441 Babel.arabic.elong_map[\the\localeid][\fontid\font] = {}
5442 tex.print([[\string\csname\space bbl@parsejalti\endcsname]])
5443 end
5444 }%
5445 \fi
5446 \fi}
5447 \gdef\bbl@parsejalti{%
5448 \begingroup

171
5449 \let\bbl@parsejalt\relax % To avoid infinite loop
5450 \edef\bbl@tempb{\fontid\font}%
5451 \bblar@nofswarn
5452 \bblar@fetchjalt\bblar@elongated{}{from}{}%
5453 \bblar@fetchjalt\bblar@chars{^^^^064a}{from}{a}% Alef maksura
5454 \bblar@fetchjalt\bblar@chars{^^^^0649}{from}{y}% Yeh
5455 \addfontfeature{RawFeature=+jalt}%
5456 % \@namedef{bblar@JE@0643}{06AA}% todo: catch medial kaf
5457 \bblar@fetchjalt\bblar@elongated{}{dest}{}%
5458 \bblar@fetchjalt\bblar@chars{^^^^064a}{dest}{a}%
5459 \bblar@fetchjalt\bblar@chars{^^^^0649}{dest}{y}%
5460 \directlua{%
5461 for k, v in pairs(Babel.arabic.from) do
5462 if Babel.arabic.dest[k] and
5463 not (Babel.arabic.from[k] == Babel.arabic.dest[k]) then
5464 Babel.arabic.elong_map[\the\localeid][\bbl@tempb]
5465 [Babel.arabic.from[k]] = Babel.arabic.dest[k]
5466 end
5467 end
5468 }%
5469 \endgroup}
5470 %
5471 \begingroup
5472 \catcode`#=11
5473 \catcode`~=11
5474 \directlua{
5475
5476 Babel.arabic = Babel.arabic or {}
5477 Babel.arabic.from = {}
5478 Babel.arabic.dest = {}
5479 Babel.arabic.justify_factor = 0.95
5480 Babel.arabic.justify_enabled = true
5481
5482 function Babel.arabic.justify(head)
5483 if not Babel.arabic.justify_enabled then return head end
5484 for line in node.traverse_id(node.id'hlist', head) do
5485 Babel.arabic.justify_hlist(head, line)
5486 end
5487 return head
5488 end
5489
5490 function Babel.arabic.justify_hbox(head, gc, size, pack)
5491 local has_inf = false
5492 if Babel.arabic.justify_enabled and pack == 'exactly' then
5493 for n in node.traverse_id(12, head) do
5494 if n.stretch_order > 0 then has_inf = true end
5495 end
5496 if not has_inf then
5497 Babel.arabic.justify_hlist(head, nil, gc, size, pack)
5498 end
5499 end
5500 return head
5501 end
5502
5503 function Babel.arabic.justify_hlist(head, line, gc, size, pack)
5504 local d, new
5505 local k_list, k_item, pos_inline
5506 local width, width_new, full, k_curr, wt_pos, goal, shift
5507 local subst_done = false
5508 local elong_map = Babel.arabic.elong_map
5509 local last_line
5510 local GLYPH = node.id'glyph'
5511 local KASHIDA = Babel.attr_kashida

172
5512 local LOCALE = Babel.attr_locale
5513
5514 if line == nil then
5515 line = {}
5516 line.glue_sign = 1
5517 line.glue_order = 0
5518 line.head = head
5519 line.shift = 0
5520 line.width = size
5521 end
5522
5523 % Exclude last line. todo. But-- it discards one-word lines, too!
5524 % ? Look for glue = 12:15
5525 if (line.glue_sign == 1 and line.glue_order == 0) then
5526 elongs = {} % Stores elongated candidates of each line
5527 k_list = {} % And all letters with kashida
5528 pos_inline = 0 % Not yet used
5529
5530 for n in node.traverse_id(GLYPH, line.head) do
5531 pos_inline = pos_inline + 1 % To find where it is. Not used.
5532
5533 % Elongated glyphs
5534 if elong_map then
5535 local locale = node.get_attribute(n, LOCALE)
5536 if elong_map[locale] and elong_map[locale][n.font] and
5537 elong_map[locale][n.font][n.char] then
5538 table.insert(elongs, {node = n, locale = locale} )
5539 node.set_attribute(n.prev, KASHIDA, 0)
5540 end
5541 end
5542
5543 % Tatwil
5544 if Babel.kashida_wts then
5545 local k_wt = node.get_attribute(n, KASHIDA)
5546 if k_wt > 0 then % todo. parameter for multi inserts
5547 table.insert(k_list, {node = n, weight = k_wt, pos = pos_inline})
5548 end
5549 end
5550
5551 end % of node.traverse_id
5552
5553 if #elongs == 0 and #k_list == 0 then goto next_line end
5554 full = line.width
5555 shift = line.shift
5556 goal = full * Babel.arabic.justify_factor % A bit crude
5557 width = node.dimensions(line.head) % The 'natural' width
5558
5559 % == Elongated ==
5560 % Original idea taken from 'chikenize'
5561 while (#elongs > 0 and width < goal) do
5562 subst_done = true
5563 local x = #elongs
5564 local curr = elongs[x].node
5565 local oldchar = curr.char
5566 curr.char = elong_map[elongs[x].locale][curr.font][curr.char]
5567 width = node.dimensions(line.head) % Check if the line is too wide
5568 % Substitute back if the line would be too wide and break:
5569 if width > goal then
5570 curr.char = oldchar
5571 break
5572 end
5573 % If continue, pop the just substituted node from the list:
5574 table.remove(elongs, x)

173
5575 end
5576
5577 % == Tatwil ==
5578 if #k_list == 0 then goto next_line end
5579
5580 width = node.dimensions(line.head) % The 'natural' width
5581 k_curr = #k_list
5582 wt_pos = 1
5583
5584 while width < goal do
5585 subst_done = true
5586 k_item = k_list[k_curr].node
5587 if k_list[k_curr].weight == Babel.kashida_wts[wt_pos] then
5588 d = node.copy(k_item)
5589 d.char = 0x0640
5590 line.head, new = node.insert_after(line.head, k_item, d)
5591 width_new = node.dimensions(line.head)
5592 if width > goal or width == width_new then
5593 node.remove(line.head, new) % Better compute before
5594 break
5595 end
5596 width = width_new
5597 end
5598 if k_curr == 1 then
5599 k_curr = #k_list
5600 wt_pos = (wt_pos >= table.getn(Babel.kashida_wts)) and 1 or wt_pos+1
5601 else
5602 k_curr = k_curr - 1
5603 end
5604 end
5605
5606 ::next_line::
5607
5608 % Must take into account marks and ins, see luatex manual.
5609 % Have to be executed only if there are changes. Investigate
5610 % what's going on exactly.
5611 if subst_done and not gc then
5612 d = node.hpack(line.head, full, 'exactly')
5613 d.shift = shift
5614 node.insert_before(head, line, d)
5615 node.remove(head, line)
5616 end
5617 end % if process line
5618 end
5619 }
5620 \endgroup
5621 \fi\fi % Arabic just block

12.7 Common stuff


5622 \AddBabelHook{babel-fontspec}{afterextras}{\bbl@switchfont}
5623 \AddBabelHook{babel-fontspec}{beforestart}{\bbl@ckeckstdfonts}
5624 \DisableBabelHook{babel-fontspec}
5625 hhFont selectionii

12.8 Automatic fonts and ids switching


After defining the blocks for a number of scripts (must be extended and very likely fine tuned), we
define a short function which just traverse the node list to carry out the replacements. The table
loc_to_scr gets the locale form a script range (note the locale is the key, and that there is an
intermediate table built on the fly for optimization). This locale is then used to get the \language and
the \localeid as stored in locale_props, as well as the font (as requested). In the latter table a key
starting with / maps the font from the global one (the key) to the local one (the value). Maths are
skipped and discretionaries are handled in a special way.

174
5626 % TODO - to a lua file
5627 \directlua{
5628 Babel.script_blocks = {
5629 ['dflt'] = {},
5630 ['Arab'] = {{0x0600, 0x06FF}, {0x08A0, 0x08FF}, {0x0750, 0x077F},
5631 {0xFE70, 0xFEFF}, {0xFB50, 0xFDFF}, {0x1EE00, 0x1EEFF}},
5632 ['Armn'] = {{0x0530, 0x058F}},
5633 ['Beng'] = {{0x0980, 0x09FF}},
5634 ['Cher'] = {{0x13A0, 0x13FF}, {0xAB70, 0xABBF}},
5635 ['Copt'] = {{0x03E2, 0x03EF}, {0x2C80, 0x2CFF}, {0x102E0, 0x102FF}},
5636 ['Cyrl'] = {{0x0400, 0x04FF}, {0x0500, 0x052F}, {0x1C80, 0x1C8F},
5637 {0x2DE0, 0x2DFF}, {0xA640, 0xA69F}},
5638 ['Deva'] = {{0x0900, 0x097F}, {0xA8E0, 0xA8FF}},
5639 ['Ethi'] = {{0x1200, 0x137F}, {0x1380, 0x139F}, {0x2D80, 0x2DDF},
5640 {0xAB00, 0xAB2F}},
5641 ['Geor'] = {{0x10A0, 0x10FF}, {0x2D00, 0x2D2F}},
5642 % Don't follow strictly Unicode, which places some Coptic letters in
5643 % the 'Greek and Coptic' block
5644 ['Grek'] = {{0x0370, 0x03E1}, {0x03F0, 0x03FF}, {0x1F00, 0x1FFF}},
5645 ['Hans'] = {{0x2E80, 0x2EFF}, {0x3000, 0x303F}, {0x31C0, 0x31EF},
5646 {0x3300, 0x33FF}, {0x3400, 0x4DBF}, {0x4E00, 0x9FFF},
5647 {0xF900, 0xFAFF}, {0xFE30, 0xFE4F}, {0xFF00, 0xFFEF},
5648 {0x20000, 0x2A6DF}, {0x2A700, 0x2B73F},
5649 {0x2B740, 0x2B81F}, {0x2B820, 0x2CEAF},
5650 {0x2CEB0, 0x2EBEF}, {0x2F800, 0x2FA1F}},
5651 ['Hebr'] = {{0x0590, 0x05FF}},
5652 ['Jpan'] = {{0x3000, 0x303F}, {0x3040, 0x309F}, {0x30A0, 0x30FF},
5653 {0x4E00, 0x9FAF}, {0xFF00, 0xFFEF}},
5654 ['Khmr'] = {{0x1780, 0x17FF}, {0x19E0, 0x19FF}},
5655 ['Knda'] = {{0x0C80, 0x0CFF}},
5656 ['Kore'] = {{0x1100, 0x11FF}, {0x3000, 0x303F}, {0x3130, 0x318F},
5657 {0x4E00, 0x9FAF}, {0xA960, 0xA97F}, {0xAC00, 0xD7AF},
5658 {0xD7B0, 0xD7FF}, {0xFF00, 0xFFEF}},
5659 ['Laoo'] = {{0x0E80, 0x0EFF}},
5660 ['Latn'] = {{0x0000, 0x007F}, {0x0080, 0x00FF}, {0x0100, 0x017F},
5661 {0x0180, 0x024F}, {0x1E00, 0x1EFF}, {0x2C60, 0x2C7F},
5662 {0xA720, 0xA7FF}, {0xAB30, 0xAB6F}},
5663 ['Mahj'] = {{0x11150, 0x1117F}},
5664 ['Mlym'] = {{0x0D00, 0x0D7F}},
5665 ['Mymr'] = {{0x1000, 0x109F}, {0xAA60, 0xAA7F}, {0xA9E0, 0xA9FF}},
5666 ['Orya'] = {{0x0B00, 0x0B7F}},
5667 ['Sinh'] = {{0x0D80, 0x0DFF}, {0x111E0, 0x111FF}},
5668 ['Syrc'] = {{0x0700, 0x074F}, {0x0860, 0x086F}},
5669 ['Taml'] = {{0x0B80, 0x0BFF}},
5670 ['Telu'] = {{0x0C00, 0x0C7F}},
5671 ['Tfng'] = {{0x2D30, 0x2D7F}},
5672 ['Thai'] = {{0x0E00, 0x0E7F}},
5673 ['Tibt'] = {{0x0F00, 0x0FFF}},
5674 ['Vaii'] = {{0xA500, 0xA63F}},
5675 ['Yiii'] = {{0xA000, 0xA48F}, {0xA490, 0xA4CF}}
5676 }
5677
5678 Babel.script_blocks.Cyrs = Babel.script_blocks.Cyrl
5679 Babel.script_blocks.Hant = Babel.script_blocks.Hans
5680 Babel.script_blocks.Kana = Babel.script_blocks.Jpan
5681
5682 function Babel.locale_map(head)
5683 if not Babel.locale_mapped then return head end
5684
5685 local LOCALE = Babel.attr_locale
5686 local GLYPH = node.id('glyph')
5687 local inmath = false
5688 local toloc_save

175
5689 for item in node.traverse(head) do
5690 local toloc
5691 if not inmath and item.id == GLYPH then
5692 % Optimization: build a table with the chars found
5693 if Babel.chr_to_loc[item.char] then
5694 toloc = Babel.chr_to_loc[item.char]
5695 else
5696 for lc, maps in pairs(Babel.loc_to_scr) do
5697 for _, rg in pairs(maps) do
5698 if item.char >= rg[1] and item.char <= rg[2] then
5699 Babel.chr_to_loc[item.char] = lc
5700 toloc = lc
5701 break
5702 end
5703 end
5704 end
5705 end
5706 % Now, take action, but treat composite chars in a different
5707 % fashion, because they 'inherit' the previous locale. Not yet
5708 % optimized.
5709 if not toloc and
5710 (item.char >= 0x0300 and item.char <= 0x036F) or
5711 (item.char >= 0x1AB0 and item.char <= 0x1AFF) or
5712 (item.char >= 0x1DC0 and item.char <= 0x1DFF) then
5713 toloc = toloc_save
5714 end
5715 if toloc and Babel.locale_props[toloc] and
5716 Babel.locale_props[toloc].letters and
5717 tex.getcatcode(item.char) \string~= 11 then
5718 toloc = nil
5719 end
5720 if toloc and toloc > -1 then
5721 if Babel.locale_props[toloc].lg then
5722 item.lang = Babel.locale_props[toloc].lg
5723 node.set_attribute(item, LOCALE, toloc)
5724 end
5725 if Babel.locale_props[toloc]['/'..item.font] then
5726 item.font = Babel.locale_props[toloc]['/'..item.font]
5727 end
5728 toloc_save = toloc
5729 end
5730 elseif not inmath and item.id == 7 then % Apply recursively
5731 item.replace = item.replace and Babel.locale_map(item.replace)
5732 item.pre = item.pre and Babel.locale_map(item.pre)
5733 item.post = item.post and Babel.locale_map(item.post)
5734 elseif item.id == node.id'math' then
5735 inmath = (item.subtype == 0)
5736 end
5737 end
5738 return head
5739 end
5740 }

The code for \babelcharproperty is straightforward. Just note the modified lua table can be
different.

5741 \newcommand\babelcharproperty[1]{%
5742 \count@=#1\relax
5743 \ifvmode
5744 \expandafter\bbl@chprop
5745 \else
5746 \bbl@error{\string\babelcharproperty\space can be used only in\\%
5747 vertical mode (preamble or between paragraphs)}%
5748 {See the manual for futher info}%

176
5749 \fi}
5750 \newcommand\bbl@chprop[3][\the\count@]{%
5751 \@tempcnta=#1\relax
5752 \bbl@ifunset{bbl@chprop@#2}%
5753 {\bbl@error{No property named '#2'. Allowed values are\\%
5754 direction (bc), mirror (bmg), and linebreak (lb)}%
5755 {See the manual for futher info}}%
5756 {}%
5757 \loop
5758 \bbl@cs{chprop@#2}{#3}%
5759 \ifnum\count@<\@tempcnta
5760 \advance\count@\@ne
5761 \repeat}
5762 \def\bbl@chprop@direction#1{%
5763 \directlua{
5764 Babel.characters[\the\count@] = Babel.characters[\the\count@] or {}
5765 Babel.characters[\the\count@]['d'] = '#1'
5766 }}
5767 \let\bbl@chprop@bc\bbl@chprop@direction
5768 \def\bbl@chprop@mirror#1{%
5769 \directlua{
5770 Babel.characters[\the\count@] = Babel.characters[\the\count@] or {}
5771 Babel.characters[\the\count@]['m'] = '\number#1'
5772 }}
5773 \let\bbl@chprop@bmg\bbl@chprop@mirror
5774 \def\bbl@chprop@linebreak#1{%
5775 \directlua{
5776 Babel.cjk_characters[\the\count@] = Babel.cjk_characters[\the\count@] or {}
5777 Babel.cjk_characters[\the\count@]['c'] = '#1'
5778 }}
5779 \let\bbl@chprop@lb\bbl@chprop@linebreak
5780 \def\bbl@chprop@locale#1{%
5781 \directlua{
5782 Babel.chr_to_loc = Babel.chr_to_loc or {}
5783 Babel.chr_to_loc[\the\count@] =
5784 \bbl@ifblank{#1}{-1000}{\the\bbl@cs{id@@#1}}\space
5785 }}
Post-handling hyphenation patterns for non-standard rules, like ff to ff-f. There are still some
issues with speed (not very slow, but still slow). The Lua code is below.
5786 \directlua{
5787 Babel.nohyphenation = \the\l@nohyphenation
5788 }

Now the TEX high level interface, which requires the function defined above for converting strings to
functions returning a string. These functions handle the {n} syntax. For example, pre={1}{1}-
becomes function(m) return m[1]..m[1]..'-' end, where m are the matches returned after
applying the pattern. With a mapped capture the functions are similar to
function(m) return Babel.capt_map(m[1],1) end, where the last argument identifies the
mapping to be applied to m[1]. The way it is carried out is somewhat tricky, but the effect in not
dissimilar to lua load – save the code as string in a TeX macro, and expand this macro at the
appropriate place. As \directlua does not take into account the current catcode of @, we just avoid
this character in macro names (which explains the internal group, too).
5789 \begingroup
5790 \catcode`\~=12
5791 \catcode`\%=12
5792 \catcode`\&=14
5793 \catcode`\|=12
5794 \gdef\babelprehyphenation{&%
5795 \@ifnextchar[{\bbl@settransform{0}}{\bbl@settransform{0}[]}}
5796 \gdef\babelposthyphenation{&%
5797 \@ifnextchar[{\bbl@settransform{1}}{\bbl@settransform{1}[]}}
5798 \gdef\bbl@postlinebreak{\bbl@settransform{2}[]} &% WIP
5799 \gdef\bbl@settransform#1[#2]#3#4#5{&%

177
5800 \ifcase#1
5801 \bbl@activateprehyphen
5802 \or
5803 \bbl@activateposthyphen
5804 \fi
5805 \begingroup
5806 \def\babeltempa{\bbl@add@list\babeltempb}&%
5807 \let\babeltempb\@empty
5808 \def\bbl@tempa{#5}&%
5809 \bbl@replace\bbl@tempa{,}{ ,}&% TODO. Ugly trick to preserve {}
5810 \expandafter\bbl@foreach\expandafter{\bbl@tempa}{&%
5811 \bbl@ifsamestring{##1}{remove}&%
5812 {\bbl@add@list\babeltempb{nil}}&%
5813 {\directlua{
5814 local rep = [=[##1]=]
5815 rep = rep:gsub('^%s*(remove)%s*$', 'remove = true')
5816 rep = rep:gsub('^%s*(insert)%s*,', 'insert = true, ')
5817 rep = rep:gsub('(string)%s*=%s*([^%s,]*)', Babel.capture_func)
5818 if #1 == 0 or #1 == 2 then
5819 rep = rep:gsub('(space)%s*=%s*([%d%.]+)%s+([%d%.]+)%s+([%d%.]+)',
5820 'space = {' .. '%2, %3, %4' .. '}')
5821 rep = rep:gsub('(spacefactor)%s*=%s*([%d%.]+)%s+([%d%.]+)%s+([%d%.]+)',
5822 'spacefactor = {' .. '%2, %3, %4' .. '}')
5823 rep = rep:gsub('(kashida)%s*=%s*([^%s,]*)', Babel.capture_kashida)
5824 else
5825 rep = rep:gsub( '(no)%s*=%s*([^%s,]*)', Babel.capture_func)
5826 rep = rep:gsub( '(pre)%s*=%s*([^%s,]*)', Babel.capture_func)
5827 rep = rep:gsub( '(post)%s*=%s*([^%s,]*)', Babel.capture_func)
5828 end
5829 tex.print([[\string\babeltempa{{]] .. rep .. [[}}]])
5830 }}}&%
5831 \bbl@foreach\babeltempb{&%
5832 \bbl@forkv{{##1}}{&%
5833 \in@{,####1,}{,nil,step,data,remove,insert,string,no,pre,&%
5834 no,post,penalty,kashida,space,spacefactor,}&%
5835 \ifin@\else
5836 \bbl@error
5837 {Bad option '####1' in a transform.\\&%
5838 I’ll ignore it but expect more errors}&%
5839 {See the manual for further info.}&%
5840 \fi}}&%
5841 \let\bbl@kv@attribute\relax
5842 \let\bbl@kv@label\relax
5843 \bbl@forkv{#2}{\bbl@csarg\edef{kv@##1}{##2}}&%
5844 \ifx\bbl@kv@attribute\relax\else
5845 \edef\bbl@kv@attribute{\expandafter\bbl@stripslash\bbl@kv@attribute}&%
5846 \fi
5847 \directlua{
5848 local lbkr = Babel.linebreaking.replacements[#1]
5849 local u = unicode.utf8
5850 local id, attr, label
5851 if #1 == 0 or #1 == 2 then
5852 id = \the\csname bbl@id@@#3\endcsname\space
5853 else
5854 id = \the\csname l@#3\endcsname\space
5855 end
5856 \ifx\bbl@kv@attribute\relax
5857 attr = -1
5858 \else
5859 attr = luatexbase.registernumber'\bbl@kv@attribute'
5860 \fi
5861 \ifx\bbl@kv@label\relax\else &% Same refs:
5862 label = [==[\bbl@kv@label]==]

178
5863 \fi
5864 &% Convert pattern:
5865 local patt = string.gsub([==[#4]==], '%s', '')
5866 if #1 == 0 or #1 == 2 then
5867 patt = string.gsub(patt, '|', ' ')
5868 end
5869 if not u.find(patt, '()', nil, true) then
5870 patt = '()' .. patt .. '()'
5871 end
5872 if #1 == 1 then
5873 patt = string.gsub(patt, '%(%)%^', '^()')
5874 patt = string.gsub(patt, '%$%(%)', '()$')
5875 end
5876 patt = u.gsub(patt, '{(.)}',
5877 function (n)
5878 return '%' .. (tonumber(n) and (tonumber(n)+1) or n)
5879 end)
5880 patt = u.gsub(patt, '{(%x%x%x%x+)}',
5881 function (n)
5882 return u.gsub(u.char(tonumber(n, 16)), '(%p)', '%%%1')
5883 end)
5884 lbkr[id] = lbkr[id] or {}
5885 table.insert(lbkr[id],
5886 { label=label, attr=attr, pattern=patt, replace={\babeltempb} })
5887 }&%
5888 \endgroup}
5889 \endgroup
5890 \def\bbl@activateposthyphen{%
5891 \let\bbl@activateposthyphen\relax
5892 \directlua{
5893 require('babel-transforms.lua')
5894 Babel.linebreaking.add_after(Babel.post_hyphenate_replace)
5895 }}
5896 \def\bbl@activateprehyphen{%
5897 \let\bbl@activateprehyphen\relax
5898 \directlua{
5899 require('babel-transforms.lua')
5900 Babel.linebreaking.add_before(Babel.pre_hyphenate_replace)
5901 }}

12.9 Bidi
As a first step, add a handler for bidi and digits (and potentially other processes) just before
luaoftload is applied, which is loaded by default by LATEX. Just in case, consider the possibility it has
not been loaded.
5902 \def\bbl@activate@preotf{%
5903 \let\bbl@activate@preotf\relax % only once
5904 \directlua{
5905 Babel = Babel or {}
5906 %
5907 function Babel.pre_otfload_v(head)
5908 if Babel.numbers and Babel.digits_mapped then
5909 head = Babel.numbers(head)
5910 end
5911 if Babel.bidi_enabled then
5912 head = Babel.bidi(head, false, dir)
5913 end
5914 return head
5915 end
5916 %
5917 function Babel.pre_otfload_h(head, gc, sz, pt, dir)
5918 if Babel.numbers and Babel.digits_mapped then
5919 head = Babel.numbers(head)

179
5920 end
5921 if Babel.bidi_enabled then
5922 head = Babel.bidi(head, false, dir)
5923 end
5924 return head
5925 end
5926 %
5927 luatexbase.add_to_callback('pre_linebreak_filter',
5928 Babel.pre_otfload_v,
5929 'Babel.pre_otfload_v',
5930 luatexbase.priority_in_callback('pre_linebreak_filter',
5931 'luaotfload.node_processor') or nil)
5932 %
5933 luatexbase.add_to_callback('hpack_filter',
5934 Babel.pre_otfload_h,
5935 'Babel.pre_otfload_h',
5936 luatexbase.priority_in_callback('hpack_filter',
5937 'luaotfload.node_processor') or nil)
5938 }}

The basic setup. The output is modified at a very low level to set the \bodydir to the \pagedir. Sadly,
we have to deal with boxes in math with basic, so the \bbl@mathboxdir hack is activated every math
with the package option bidi=.

5939 \ifnum\bbl@bidimode>100 \ifnum\bbl@bidimode<200


5940 \let\bbl@beforeforeign\leavevmode
5941 \AtEndOfPackage{\EnableBabelHook{babel-bidi}}
5942 \RequirePackage{luatexbase}
5943 \bbl@activate@preotf
5944 \directlua{
5945 require('babel-data-bidi.lua')
5946 \ifcase\expandafter\@gobbletwo\the\bbl@bidimode\or
5947 require('babel-bidi-basic.lua')
5948 \or
5949 require('babel-bidi-basic-r.lua')
5950 \fi}
5951 % TODO - to locale_props, not as separate attribute
5952 \newattribute\bbl@attr@dir
5953 \directlua{ Babel.attr_dir = luatexbase.registernumber'bbl@attr@dir' }
5954 % TODO. I don't like it, hackish:
5955 \bbl@exp{\output{\bodydir\pagedir\the\output}}
5956 \AtEndOfPackage{\EnableBabelHook{babel-bidi}}
5957 \fi\fi
5958 \chardef\bbl@thetextdir\z@
5959 \chardef\bbl@thepardir\z@
5960 \def\bbl@getluadir#1{%
5961 \directlua{
5962 if tex.#1dir == 'TLT' then
5963 tex.sprint('0')
5964 elseif tex.#1dir == 'TRT' then
5965 tex.sprint('1')
5966 end}}
5967 \def\bbl@setluadir#1#2#3{% 1=text/par.. 2=\textdir.. 3=0 lr/1 rl
5968 \ifcase#3\relax
5969 \ifcase\bbl@getluadir{#1}\relax\else
5970 #2 TLT\relax
5971 \fi
5972 \else
5973 \ifcase\bbl@getluadir{#1}\relax
5974 #2 TRT\relax
5975 \fi
5976 \fi}
5977 \def\bbl@thedir{0}
5978 \def\bbl@textdir#1{%

180
5979 \bbl@setluadir{text}\textdir{#1}%
5980 \chardef\bbl@thetextdir#1\relax
5981 \edef\bbl@thedir{\the\numexpr\bbl@thepardir*3+#1}%
5982 \setattribute\bbl@attr@dir{\numexpr\bbl@thepardir*3+#1}}
5983 \def\bbl@pardir#1{%
5984 \bbl@setluadir{par}\pardir{#1}%
5985 \chardef\bbl@thepardir#1\relax}
5986 \def\bbl@bodydir{\bbl@setluadir{body}\bodydir}
5987 \def\bbl@pagedir{\bbl@setluadir{page}\pagedir}
5988 \def\bbl@dirparastext{\pardir\the\textdir\relax}% %%%%
5989 %
5990 \ifnum\bbl@bidimode>\z@
5991 \def\bbl@insidemath{0}%
5992 \def\bbl@everymath{\def\bbl@insidemath{1}}
5993 \def\bbl@everydisplay{\def\bbl@insidemath{2}}
5994 \frozen@everymath\expandafter{%
5995 \expandafter\bbl@everymath\the\frozen@everymath}
5996 \frozen@everydisplay\expandafter{%
5997 \expandafter\bbl@everydisplay\the\frozen@everydisplay}
5998 \AtBeginDocument{
5999 \directlua{
6000 function Babel.math_box_dir(head)
6001 if not (token.get_macro('bbl@insidemath') == '0') then
6002 if Babel.hlist_has_bidi(head) then
6003 local d = node.new(node.id'dir')
6004 d.dir = '+TRT'
6005 node.insert_before(head, node.has_glyph(head), d)
6006 for item in node.traverse(head) do
6007 node.set_attribute(item,
6008 Babel.attr_dir, token.get_macro('bbl@thedir'))
6009 end
6010 end
6011 end
6012 return head
6013 end
6014 luatexbase.add_to_callback("hpack_filter", Babel.math_box_dir,
6015 "Babel.math_box_dir", 0)
6016 }}%
6017 \fi

12.10 Layout
Unlike xetex, luatex requires only minimal changes for right-to-left layouts, particularly in
monolingual documents (the engine itself reverses boxes – including column order or headings –,
margins, etc.) with bidi=basic, without having to patch almost any macro where text direction is
relevant.
\@hangfrom is useful in many contexts and it is redefined always with the layout option.
There are, however, a number of issues when the text direction is not the same as the box direction
(as set by \bodydir), and when \parbox and \hangindent are involved. Fortunately, latest releases
of luatex simplify a lot the solution with \shapemode.
With the issue #15 I realized commands are best patched, instead of redefined. With a few lines, a
modification could be applied to several classes and packages. Now, tabular seems to work (at least
in simple cases) with array, tabularx, hhline, colortbl, longtable, booktabs, etc. However, dcolumn still
fails.
6018 \bbl@trace{Redefinitions for bidi layout}
6019 %
6020 hh∗More package optionsii ≡
6021 \chardef\bbl@eqnpos\z@
6022 \DeclareOption{leqno}{\chardef\bbl@eqnpos\@ne}
6023 \DeclareOption{fleqn}{\chardef\bbl@eqnpos\tw@}
6024 hh/More package optionsii
6025 %
6026 \def\BabelNoAMSMath{\let\bbl@noamsmath\relax}

181
6027 \ifnum\bbl@bidimode>\z@
6028 \ifx\matheqdirmode\@undefined\else
6029 \matheqdirmode\@ne
6030 \fi
6031 \let\bbl@eqnodir\relax
6032 \def\bbl@eqdel{()}
6033 \def\bbl@eqnum{%
6034 {\normalfont\normalcolor
6035 \expandafter\@firstoftwo\bbl@eqdel
6036 \theequation
6037 \expandafter\@secondoftwo\bbl@eqdel}}
6038 \def\bbl@puteqno#1{\eqno\hbox{#1}}
6039 \def\bbl@putleqno#1{\leqno\hbox{#1}}
6040 \def\bbl@eqno@flip#1{%
6041 \ifdim\predisplaysize=-\maxdimen
6042 \eqno
6043 \hb@[email protected]{\hb@xt@\displaywidth{\hss{#1}}\hss}%
6044 \else
6045 \leqno\hbox{#1}%
6046 \fi}
6047 \def\bbl@leqno@flip#1{%
6048 \ifdim\predisplaysize=-\maxdimen
6049 \leqno
6050 \hb@[email protected]{\hss\hb@xt@\displaywidth{{#1}\hss}}%
6051 \else
6052 \eqno\hbox{#1}%
6053 \fi}
6054 \AtBeginDocument{%
6055 \ifx\maketag@@@\@undefined % Normal equation, eqnarray
6056 \AddToHook{env/equation/begin}{%
6057 \ifnum\bbl@thetextdir>\z@
6058 \let\@eqnnum\bbl@eqnum
6059 \edef\bbl@eqnodir{\noexpand\bbl@textdir{\the\bbl@thetextdir}}%
6060 \chardef\bbl@thetextdir\z@
6061 \bbl@add\normalfont{\bbl@eqnodir}%
6062 \ifcase\bbl@eqnpos
6063 \let\bbl@puteqno\bbl@eqno@flip
6064 \or
6065 \let\bbl@puteqno\bbl@leqno@flip
6066 \fi
6067 \fi}%
6068 \ifnum\bbl@eqnpos=\tw@\else
6069 \def\endequation{\bbl@puteqno{\@eqnnum}$$\@ignoretrue}%
6070 \fi
6071 \AddToHook{env/eqnarray/begin}{%
6072 \ifnum\bbl@thetextdir>\z@
6073 \edef\bbl@eqnodir{\noexpand\bbl@textdir{\the\bbl@thetextdir}}%
6074 \chardef\bbl@thetextdir\z@
6075 \bbl@add\normalfont{\bbl@eqnodir}%
6076 \ifnum\bbl@eqnpos=\@ne
6077 \def\@eqnnum{%
6078 \setbox\z@\hbox{\bbl@eqnum}%
6079 \hbox to0.01pt{\hss\hbox to\displaywidth{\box\z@\hss}}}%
6080 \else
6081 \let\@eqnnum\bbl@eqnum
6082 \fi
6083 \fi}
6084 % Hack. YA luatex bug?:
6085 \expandafter\bbl@sreplace\csname] \endcsname{$$}{\eqno\kern.001pt$$}%
6086 \else % amstex
6087 \ifx\bbl@noamsmath\@undefined
6088 \bbl@exp{% Hack to hide maybe undefined conditionals:
6089 \chardef\bbl@eqnpos=0%

182
6090 \<iftagsleft@>1\<else>\<if@fleqn>2\<fi>\<fi>\relax}%
6091 \ifnum\bbl@eqnpos=\@ne
6092 \let\bbl@ams@lap\hbox
6093 \else
6094 \let\bbl@ams@lap\llap
6095 \fi
6096 \ExplSyntaxOn
6097 \bbl@sreplace\intertext@{\normalbaselines}%
6098 {\normalbaselines
6099 \ifx\bbl@eqnodir\relax\else\bbl@pardir\@ne\bbl@eqnodir\fi}%
6100 \ExplSyntaxOff
6101 \def\bbl@ams@tagbox#1#2{#1{\bbl@eqnodir#2}}% #1=hbox|@lap|flip
6102 \ifx\bbl@ams@lap\hbox % leqno
6103 \def\bbl@ams@flip#1{%
6104 \hbox to 0.01pt{\hss\hbox to\displaywidth{{#1}\hss}}}%
6105 \else % eqno
6106 \def\bbl@ams@flip#1{%
6107 \hbox to 0.01pt{\hbox to\displaywidth{\hss{#1}}\hss}}%
6108 \fi
6109 \def\bbl@ams@preset#1{%
6110 \ifnum\bbl@thetextdir>\z@
6111 \edef\bbl@eqnodir{\noexpand\bbl@textdir{\the\bbl@thetextdir}}%
6112 \bbl@sreplace\textdef@{\hbox}{\bbl@ams@tagbox\hbox}%
6113 \bbl@sreplace\maketag@@@{\hbox}{\bbl@ams@tagbox#1}%
6114 \fi}%
6115 \ifnum\bbl@eqnpos=\tw@\else
6116 \def\bbl@ams@equation{%
6117 \ifnum\bbl@thetextdir>\z@
6118 \edef\bbl@eqnodir{\noexpand\bbl@textdir{\the\bbl@thetextdir}}%
6119 \chardef\bbl@thetextdir\z@
6120 \bbl@add\normalfont{\bbl@eqnodir}%
6121 \ifcase\bbl@eqnpos
6122 \def\veqno##1##2{\bbl@eqno@flip{##1##2}}%
6123 \or
6124 \def\veqno##1##2{\bbl@leqno@flip{##1##2}}%
6125 \fi
6126 \fi}%
6127 \AddToHook{env/equation/begin}{\bbl@ams@equation}%
6128 \AddToHook{env/equation*/begin}{\bbl@ams@equation}%
6129 \fi
6130 \AddToHook{env/cases/begin}{\bbl@ams@preset\bbl@ams@lap}%
6131 \AddToHook{env/multline/begin}{\bbl@ams@preset\hbox}%
6132 \AddToHook{env/gather/begin}{\bbl@ams@preset\bbl@ams@lap}%
6133 \AddToHook{env/gather*/begin}{\bbl@ams@preset\bbl@ams@lap}%
6134 \AddToHook{env/align/begin}{\bbl@ams@preset\bbl@ams@lap}%
6135 \AddToHook{env/align*/begin}{\bbl@ams@preset\bbl@ams@lap}%
6136 \AddToHook{env/eqnalign/begin}{\bbl@ams@preset\hbox}%
6137 % Hackish, for proper alignment. Don’t ask me why it works!:
6138 \bbl@exp{% Avoid a 'visible' conditional
6139 \\\AddToHook{env/align*/end}{\<iftag@>\<else>\\\tag*{}\<fi>}}%
6140 \AddToHook{env/flalign/begin}{\bbl@ams@preset\hbox}%
6141 \AddToHook{env/split/before}{%
6142 \ifnum\bbl@thetextdir>\z@
6143 \bbl@ifsamestring\@currenvir{equation}%
6144 {\ifx\bbl@ams@lap\hbox % leqno
6145 \def\bbl@ams@flip#1{%
6146 \hbox to 0.01pt{\hbox to\displaywidth{{#1}\hss}\hss}}%
6147 \else
6148 \def\bbl@ams@flip#1{%
6149 \hbox to 0.01pt{\hss\hbox to\displaywidth{\hss{#1}}}}%
6150 \fi}%
6151 {}%
6152 \fi}%

183
6153 \fi
6154 \fi}
6155 \fi
6156 \ifx\bbl@opt@layout\@nnil\endinput\fi % if no layout
6157 \ifnum\bbl@bidimode>\z@
6158 \def\bbl@nextfake#1{% non-local changes, use always inside a group!
6159 \bbl@exp{%
6160 \def\\\bbl@insidemath{0}%
6161 \mathdir\the\bodydir
6162 #1% Once entered in math, set boxes to restore values
6163 \<ifmmode>%
6164 \everyvbox{%
6165 \the\everyvbox
6166 \bodydir\the\bodydir
6167 \mathdir\the\mathdir
6168 \everyhbox{\the\everyhbox}%
6169 \everyvbox{\the\everyvbox}}%
6170 \everyhbox{%
6171 \the\everyhbox
6172 \bodydir\the\bodydir
6173 \mathdir\the\mathdir
6174 \everyhbox{\the\everyhbox}%
6175 \everyvbox{\the\everyvbox}}%
6176 \<fi>}}%
6177 \def\@hangfrom#1{%
6178 \setbox\@tempboxa\hbox{{#1}}%
6179 \hangindent\wd\@tempboxa
6180 \ifnum\bbl@getluadir{page}=\bbl@getluadir{par}\else
6181 \shapemode\@ne
6182 \fi
6183 \noindent\box\@tempboxa}
6184 \fi
6185 \IfBabelLayout{tabular}
6186 {\let\bbl@OL@@tabular\@tabular
6187 \bbl@replace\@tabular{$}{\bbl@nextfake$}%
6188 \let\bbl@NL@@tabular\@tabular
6189 \AtBeginDocument{%
6190 \ifx\bbl@NL@@tabular\@tabular\else
6191 \bbl@replace\@tabular{$}{\bbl@nextfake$}%
6192 \let\bbl@NL@@tabular\@tabular
6193 \fi}}
6194 {}
6195 \IfBabelLayout{lists}
6196 {\let\bbl@OL@list\list
6197 \bbl@sreplace\list{\parshape}{\bbl@listparshape}%
6198 \let\bbl@NL@list\list
6199 \def\bbl@listparshape#1#2#3{%
6200 \parshape #1 #2 #3 %
6201 \ifnum\bbl@getluadir{page}=\bbl@getluadir{par}\else
6202 \shapemode\tw@
6203 \fi}}
6204 {}
6205 \IfBabelLayout{graphics}
6206 {\let\bbl@pictresetdir\relax
6207 \def\bbl@pictsetdir#1{%
6208 \ifcase\bbl@thetextdir
6209 \let\bbl@pictresetdir\relax
6210 \else
6211 \ifcase#1\bodydir TLT % Remember this sets the inner boxes
6212 \or\textdir TLT
6213 \else\bodydir TLT \textdir TLT
6214 \fi
6215 % \(text|par)dir required in pgf:

184
6216 \def\bbl@pictresetdir{\bodydir TRT\pardir TRT\textdir TRT\relax}%
6217 \fi}%
6218 \AddToHook{env/picture/begin}{\bbl@pictsetdir\tw@}%
6219 \directlua{
6220 Babel.get_picture_dir = true
6221 Babel.picture_has_bidi = 0
6222 %
6223 function Babel.picture_dir (head)
6224 if not Babel.get_picture_dir then return head end
6225 if Babel.hlist_has_bidi(head) then
6226 Babel.picture_has_bidi = 1
6227 end
6228 return head
6229 end
6230 luatexbase.add_to_callback("hpack_filter", Babel.picture_dir,
6231 "Babel.picture_dir")
6232 }%
6233 \AtBeginDocument{%
6234 \def\LS@rot{%
6235 \setbox\@outputbox\vbox{%
6236 \hbox dir TLT{\rotatebox{90}{\box\@outputbox}}}}%
6237 \long\def\put(#1,#2)#3{%
6238 \@killglue
6239 % Try:
6240 \ifx\bbl@pictresetdir\relax
6241 \def\bbl@tempc{0}%
6242 \else
6243 \directlua{
6244 Babel.get_picture_dir = true
6245 Babel.picture_has_bidi = 0
6246 }%
6247 \setbox\z@\hb@xt@\z@{%
6248 \@defaultunitsset\@tempdimc{#1}\unitlength
6249 \kern\@tempdimc
6250 #3\hss}% TODO: #3 executed twice (below). That’s bad.
6251 \edef\bbl@tempc{\directlua{tex.print(Babel.picture_has_bidi)}}%
6252 \fi
6253 % Do:
6254 \@defaultunitsset\@tempdimc{#2}\unitlength
6255 \raise\@tempdimc\hb@xt@\z@{%
6256 \@defaultunitsset\@tempdimc{#1}\unitlength
6257 \kern\@tempdimc
6258 {\ifnum\bbl@tempc>\z@\bbl@pictresetdir\fi#3}\hss}%
6259 \ignorespaces}%
6260 \MakeRobust\put}%
6261 \AtBeginDocument
6262 {\AddToHook{cmd/diagbox@pict/before}{\let\bbl@pictsetdir\@gobble}%
6263 \ifx\pgfpicture\@undefined\else % TODO. Allow deactivate?
6264 \AddToHook{env/pgfpicture/begin}{\bbl@pictsetdir\@ne}%
6265 \bbl@add\pgfinterruptpicture{\bbl@pictresetdir}%
6266 \bbl@add\pgfsys@beginpicture{\bbl@pictsetdir\z@}%
6267 \fi
6268 \ifx\tikzpicture\@undefined\else
6269 \AddToHook{env/tikzpicture/begin}{\bbl@pictsetdir\z@}%
6270 \bbl@add\tikz@atbegin@node{\bbl@pictresetdir}%
6271 \bbl@sreplace\tikz{\begingroup}{\begingroup\bbl@pictsetdir\tw@}%
6272 \fi
6273 \ifx\tcolorbox\@undefined\else
6274 \def\tcb@drawing@env@begin{%
6275 \csname tcb@before@\tcb@split@state\endcsname
6276 \bbl@pictsetdir\tw@
6277 \begin{\kvtcb@graphenv}%
6278 \tcb@bbdraw%

185
6279 \tcb@apply@graph@patches
6280 }%
6281 \def\tcb@drawing@env@end{%
6282 \end{\kvtcb@graphenv}%
6283 \bbl@pictresetdir
6284 \csname tcb@after@\tcb@split@state\endcsname
6285 }%
6286 \fi
6287 }}
6288 {}

Implicitly reverses sectioning labels in bidi=basic-r, because the full stop is not in contact with L
numbers any more. I think there must be a better way. Assumes bidi=basic, but there are some
additional readjustments for bidi=default.

6289 \IfBabelLayout{counters}%
6290 {\let\bbl@OL@@textsuperscript\@textsuperscript
6291 \bbl@sreplace\@textsuperscript{\m@th}{\m@th\mathdir\pagedir}%
6292 \let\bbl@latinarabic=\@arabic
6293 \let\bbl@OL@@arabic\@arabic
6294 \def\@arabic#1{\babelsublr{\bbl@latinarabic#1}}%
6295 \@ifpackagewith{babel}{bidi=default}%
6296 {\let\bbl@asciiroman=\@roman
6297 \let\bbl@OL@@roman\@roman
6298 \def\@roman#1{\babelsublr{\ensureascii{\bbl@asciiroman#1}}}%
6299 \let\bbl@asciiRoman=\@Roman
6300 \let\bbl@OL@@roman\@Roman
6301 \def\@Roman#1{\babelsublr{\ensureascii{\bbl@asciiRoman#1}}}%
6302 \let\bbl@OL@labelenumii\labelenumii
6303 \def\labelenumii{)\theenumii(}%
6304 \let\bbl@OL@p@enumiii\p@enumiii
6305 \def\p@enumiii{\p@enumii)\theenumii(}}{}}{}
6306 hhFootnote changesii
6307 \IfBabelLayout{footnotes}%
6308 {\let\bbl@OL@footnote\footnote
6309 \BabelFootnote\footnote\languagename{}{}%
6310 \BabelFootnote\localfootnote\languagename{}{}%
6311 \BabelFootnote\mainfootnote{}{}{}}
6312 {}

Some LATEX macros use internally the math mode for text formatting. They have very little in
common and are grouped here, as a single option.

6313 \IfBabelLayout{extras}%
6314 {\let\bbl@OL@underline\underline
6315 \bbl@sreplace\underline{$\@@underline}{\bbl@nextfake$\@@underline}%
6316 \let\bbl@OL@LaTeX2e\LaTeX2e
6317 \DeclareRobustCommand{\LaTeXe}{\mbox{\m@th
6318 \if b\expandafter\@car\f@series\@nil\boldmath\fi
6319 \babelsublr{%
6320 \LaTeX\kern.15em2\bbl@nextfake$_{\textstyle\varepsilon}$}}}}
6321 {}
6322 h/luatexi

12.11 Lua: transforms


After declaring the table containing the patterns with their replacements, we define some auxiliary
functions: str_to_nodes converts the string returned by a function to a node list, taking the node at
base as a model (font, language, etc.); fetch_word fetches a series of glyphs and discretionaries,
which pattern is matched against (if there is a match, it is called again before trying other patterns,
and this is very likely the main bottleneck).
post_hyphenate_replace is the callback applied after lang.hyphenate. This means the automatic
hyphenation points are known. As empty captures return a byte position (as explained in the luatex
manual), we must convert it to a utf8 position. With first, the last byte can be the leading byte in a
utf8 sequence, so we just remove it and add 1 to the resulting length. With last we must take into

186
account the capture position points to the next character. Here word_head points to the starting node
of the text to be matched.

6323 h∗transformsi
6324 Babel.linebreaking.replacements = {}
6325 Babel.linebreaking.replacements[0] = {} -- pre
6326 Babel.linebreaking.replacements[1] = {} -- post
6327 Babel.linebreaking.replacements[2] = {} -- post-line WIP
6328
6329 -- Discretionaries contain strings as nodes
6330 function Babel.str_to_nodes(fn, matches, base)
6331 local n, head, last
6332 if fn == nil then return nil end
6333 for s in string.utfvalues(fn(matches)) do
6334 if base.id == 7 then
6335 base = base.replace
6336 end
6337 n = node.copy(base)
6338 n.char = s
6339 if not head then
6340 head = n
6341 else
6342 last.next = n
6343 end
6344 last = n
6345 end
6346 return head
6347 end
6348
6349 Babel.fetch_subtext = {}
6350
6351 Babel.ignore_pre_char = function(node)
6352 return (node.lang == Babel.nohyphenation)
6353 end
6354
6355 -- Merging both functions doesn't seen feasible, because there are too
6356 -- many differences.
6357 Babel.fetch_subtext[0] = function(head)
6358 local word_string = ''
6359 local word_nodes = {}
6360 local lang
6361 local item = head
6362 local inmath = false
6363
6364 while item do
6365
6366 if item.id == 11 then
6367 inmath = (item.subtype == 0)
6368 end
6369
6370 if inmath then
6371 -- pass
6372
6373 elseif item.id == 29 then
6374 local locale = node.get_attribute(item, Babel.attr_locale)
6375
6376 if lang == locale or lang == nil then
6377 lang = lang or locale
6378 if Babel.ignore_pre_char(item) then
6379 word_string = word_string .. Babel.us_char
6380 else
6381 word_string = word_string .. unicode.utf8.char(item.char)
6382 end
6383 word_nodes[#word_nodes+1] = item

187
6384 else
6385 break
6386 end
6387
6388 elseif item.id == 12 and item.subtype == 13 then
6389 word_string = word_string .. ' '
6390 word_nodes[#word_nodes+1] = item
6391
6392 -- Ignore leading unrecognized nodes, too.
6393 elseif word_string ~= '' then
6394 word_string = word_string .. Babel.us_char
6395 word_nodes[#word_nodes+1] = item -- Will be ignored
6396 end
6397
6398 item = item.next
6399 end
6400
6401 -- Here and above we remove some trailing chars but not the
6402 -- corresponding nodes. But they aren't accessed.
6403 if word_string:sub(-1) == ' ' then
6404 word_string = word_string:sub(1,-2)
6405 end
6406 word_string = unicode.utf8.gsub(word_string, Babel.us_char .. '+$', '')
6407 return word_string, word_nodes, item, lang
6408 end
6409
6410 Babel.fetch_subtext[1] = function(head)
6411 local word_string = ''
6412 local word_nodes = {}
6413 local lang
6414 local item = head
6415 local inmath = false
6416
6417 while item do
6418
6419 if item.id == 11 then
6420 inmath = (item.subtype == 0)
6421 end
6422
6423 if inmath then
6424 -- pass
6425
6426 elseif item.id == 29 then
6427 if item.lang == lang or lang == nil then
6428 if (item.char ~= 124) and (item.char ~= 61) then -- not =, not |
6429 lang = lang or item.lang
6430 word_string = word_string .. unicode.utf8.char(item.char)
6431 word_nodes[#word_nodes+1] = item
6432 end
6433 else
6434 break
6435 end
6436
6437 elseif item.id == 7 and item.subtype == 2 then
6438 word_string = word_string .. '='
6439 word_nodes[#word_nodes+1] = item
6440
6441 elseif item.id == 7 and item.subtype == 3 then
6442 word_string = word_string .. '|'
6443 word_nodes[#word_nodes+1] = item
6444
6445 -- (1) Go to next word if nothing was found, and (2) implicitly
6446 -- remove leading USs.

188
6447 elseif word_string == '' then
6448 -- pass
6449
6450 -- This is the responsible for splitting by words.
6451 elseif (item.id == 12 and item.subtype == 13) then
6452 break
6453
6454 else
6455 word_string = word_string .. Babel.us_char
6456 word_nodes[#word_nodes+1] = item -- Will be ignored
6457 end
6458
6459 item = item.next
6460 end
6461
6462 word_string = unicode.utf8.gsub(word_string, Babel.us_char .. '+$', '')
6463 return word_string, word_nodes, item, lang
6464 end
6465
6466 function Babel.pre_hyphenate_replace(head)
6467 Babel.hyphenate_replace(head, 0)
6468 end
6469
6470 function Babel.post_hyphenate_replace(head)
6471 Babel.hyphenate_replace(head, 1)
6472 end
6473
6474 Babel.us_char = string.char(31)
6475
6476 function Babel.hyphenate_replace(head, mode)
6477 local u = unicode.utf8
6478 local lbkr = Babel.linebreaking.replacements[mode]
6479 if mode == 2 then mode = 0 end -- WIP
6480
6481 local word_head = head
6482
6483 while true do -- for each subtext block
6484
6485 local w, w_nodes, nw, lang = Babel.fetch_subtext[mode](word_head)
6486
6487 if Babel.debug then
6488 print()
6489 print((mode == 0) and '@@@@<' or '@@@@>', w)
6490 end
6491
6492 if nw == nil and w == '' then break end
6493
6494 if not lang then goto next end
6495 if not lbkr[lang] then goto next end
6496
6497 -- For each saved (pre|post)hyphenation. TODO. Reconsider how
6498 -- loops are nested.
6499 for k=1, #lbkr[lang] do
6500 local p = lbkr[lang][k].pattern
6501 local r = lbkr[lang][k].replace
6502 local attr = lbkr[lang][k].attr or -1
6503
6504 if Babel.debug then
6505 print('*****', p, mode)
6506 end
6507
6508 -- This variable is set in some cases below to the first *byte*
6509 -- after the match, either as found by u.match (faster) or the

189
6510 -- computed position based on sc if w has changed.
6511 local last_match = 0
6512 local step = 0
6513
6514 -- For every match.
6515 while true do
6516 if Babel.debug then
6517 print('=====')
6518 end
6519 local new -- used when inserting and removing nodes
6520
6521 local matches = { u.match(w, p, last_match) }
6522
6523 if #matches < 2 then break end
6524
6525 -- Get and remove empty captures (with ()'s, which return a
6526 -- number with the position), and keep actual captures
6527 -- (from (...)), if any, in matches.
6528 local first = table.remove(matches, 1)
6529 local last = table.remove(matches, #matches)
6530 -- Non re-fetched substrings may contain \31, which separates
6531 -- subsubstrings.
6532 if string.find(w:sub(first, last-1), Babel.us_char) then break end
6533
6534 local save_last = last -- with A()BC()D, points to D
6535
6536 -- Fix offsets, from bytes to unicode. Explained above.
6537 first = u.len(w:sub(1, first-1)) + 1
6538 last = u.len(w:sub(1, last-1)) -- now last points to C
6539
6540 -- This loop stores in a small table the nodes
6541 -- corresponding to the pattern. Used by 'data' to provide a
6542 -- predictable behavior with 'insert' (w_nodes is modified on
6543 -- the fly), and also access to 'remove'd nodes.
6544 local sc = first-1 -- Used below, too
6545 local data_nodes = {}
6546
6547 local enabled = true
6548 for q = 1, last-first+1 do
6549 data_nodes[q] = w_nodes[sc+q]
6550 if enabled
6551 and attr > -1
6552 and not node.has_attribute(data_nodes[q], attr)
6553 then
6554 enabled = false
6555 end
6556 end
6557
6558 -- This loop traverses the matched substring and takes the
6559 -- corresponding action stored in the replacement list.
6560 -- sc = the position in substr nodes / string
6561 -- rc = the replacement table index
6562 local rc = 0
6563
6564 while rc < last-first+1 do -- for each replacement
6565 if Babel.debug then
6566 print('.....', rc + 1)
6567 end
6568 sc = sc + 1
6569 rc = rc + 1
6570
6571 if Babel.debug then
6572 Babel.debug_hyph(w, w_nodes, sc, first, last, last_match)

190
6573 local ss = ''
6574 for itt in node.traverse(head) do
6575 if itt.id == 29 then
6576 ss = ss .. unicode.utf8.char(itt.char)
6577 else
6578 ss = ss .. '{' .. itt.id .. '}'
6579 end
6580 end
6581 print('*****************', ss)
6582
6583 end
6584
6585 local crep = r[rc]
6586 local item = w_nodes[sc]
6587 local item_base = item
6588 local placeholder = Babel.us_char
6589 local d
6590
6591 if crep and crep.data then
6592 item_base = data_nodes[crep.data]
6593 end
6594
6595 if crep then
6596 step = crep.step or 0
6597 end
6598
6599 if (not enabled) or (crep and next(crep) == nil) then -- = {}
6600 last_match = save_last -- Optimization
6601 goto next
6602
6603 elseif crep == nil or crep.remove then
6604 node.remove(head, item)
6605 table.remove(w_nodes, sc)
6606 w = u.sub(w, 1, sc-1) .. u.sub(w, sc+1)
6607 sc = sc - 1 -- Nothing has been inserted.
6608 last_match = utf8.offset(w, sc+1+step)
6609 goto next
6610
6611 elseif crep and crep.kashida then -- Experimental
6612 node.set_attribute(item,
6613 Babel.attr_kashida,
6614 crep.kashida)
6615 last_match = utf8.offset(w, sc+1+step)
6616 goto next
6617
6618 elseif crep and crep.string then
6619 local str = crep.string(matches)
6620 if str == '' then -- Gather with nil
6621 node.remove(head, item)
6622 table.remove(w_nodes, sc)
6623 w = u.sub(w, 1, sc-1) .. u.sub(w, sc+1)
6624 sc = sc - 1 -- Nothing has been inserted.
6625 else
6626 local loop_first = true
6627 for s in string.utfvalues(str) do
6628 d = node.copy(item_base)
6629 d.char = s
6630 if loop_first then
6631 loop_first = false
6632 head, new = node.insert_before(head, item, d)
6633 if sc == 1 then
6634 word_head = head
6635 end

191
6636 w_nodes[sc] = d
6637 w = u.sub(w, 1, sc-1) .. u.char(s) .. u.sub(w, sc+1)
6638 else
6639 sc = sc + 1
6640 head, new = node.insert_before(head, item, d)
6641 table.insert(w_nodes, sc, new)
6642 w = u.sub(w, 1, sc-1) .. u.char(s) .. u.sub(w, sc)
6643 end
6644 if Babel.debug then
6645 print('.....', 'str')
6646 Babel.debug_hyph(w, w_nodes, sc, first, last, last_match)
6647 end
6648 end -- for
6649 node.remove(head, item)
6650 end -- if ''
6651 last_match = utf8.offset(w, sc+1+step)
6652 goto next
6653
6654 elseif mode == 1 and crep and (crep.pre or crep.no or crep.post) then
6655 d = node.new(7, 0) -- (disc, discretionary)
6656 d.pre = Babel.str_to_nodes(crep.pre, matches, item_base)
6657 d.post = Babel.str_to_nodes(crep.post, matches, item_base)
6658 d.replace = Babel.str_to_nodes(crep.no, matches, item_base)
6659 d.attr = item_base.attr
6660 if crep.pre == nil then -- TeXbook p96
6661 d.penalty = crep.penalty or tex.hyphenpenalty
6662 else
6663 d.penalty = crep.penalty or tex.exhyphenpenalty
6664 end
6665 placeholder = '|'
6666 head, new = node.insert_before(head, item, d)
6667
6668 elseif mode == 0 and crep and (crep.pre or crep.no or crep.post) then
6669 -- ERROR
6670
6671 elseif crep and crep.penalty then
6672 d = node.new(14, 0) -- (penalty, userpenalty)
6673 d.attr = item_base.attr
6674 d.penalty = crep.penalty
6675 head, new = node.insert_before(head, item, d)
6676
6677 elseif crep and crep.space then
6678 -- 655360 = 10 pt = 10 * 65536 sp
6679 d = node.new(12, 13) -- (glue, spaceskip)
6680 local quad = font.getfont(item_base.font).size or 655360
6681 node.setglue(d, crep.space[1] * quad,
6682 crep.space[2] * quad,
6683 crep.space[3] * quad)
6684 if mode == 0 then
6685 placeholder = ' '
6686 end
6687 head, new = node.insert_before(head, item, d)
6688
6689 elseif crep and crep.spacefactor then
6690 d = node.new(12, 13) -- (glue, spaceskip)
6691 local base_font = font.getfont(item_base.font)
6692 node.setglue(d,
6693 crep.spacefactor[1] * base_font.parameters['space'],
6694 crep.spacefactor[2] * base_font.parameters['space_stretch'],
6695 crep.spacefactor[3] * base_font.parameters['space_shrink'])
6696 if mode == 0 then
6697 placeholder = ' '
6698 end

192
6699 head, new = node.insert_before(head, item, d)
6700
6701 elseif mode == 0 and crep and crep.space then
6702 -- ERROR
6703
6704 end -- ie replacement cases
6705
6706 -- Shared by disc, space and penalty.
6707 if sc == 1 then
6708 word_head = head
6709 end
6710 if crep.insert then
6711 w = u.sub(w, 1, sc-1) .. placeholder .. u.sub(w, sc)
6712 table.insert(w_nodes, sc, new)
6713 last = last + 1
6714 else
6715 w_nodes[sc] = d
6716 node.remove(head, item)
6717 w = u.sub(w, 1, sc-1) .. placeholder .. u.sub(w, sc+1)
6718 end
6719
6720 last_match = utf8.offset(w, sc+1+step)
6721
6722 ::next::
6723
6724 end -- for each replacement
6725
6726 if Babel.debug then
6727 print('.....', '/')
6728 Babel.debug_hyph(w, w_nodes, sc, first, last, last_match)
6729 end
6730
6731 end -- for match
6732
6733 end -- for patterns
6734
6735 ::next::
6736 word_head = nw
6737 end -- for substring
6738 return head
6739 end
6740
6741 -- This table stores capture maps, numbered consecutively
6742 Babel.capture_maps = {}
6743
6744 -- The following functions belong to the next macro
6745 function Babel.capture_func(key, cap)
6746 local ret = "[[" .. cap:gsub('{([0-9])}', "]]..m[%1]..[[") .. "]]"
6747 local cnt
6748 local u = unicode.utf8
6749 ret, cnt = ret:gsub('{([0-9])|([^|]+)|(.-)}', Babel.capture_func_map)
6750 if cnt == 0 then
6751 ret = u.gsub(ret, '{(%x%x%x%x+)}',
6752 function (n)
6753 return u.char(tonumber(n, 16))
6754 end)
6755 end
6756 ret = ret:gsub("%[%[%]%]%.%.", '')
6757 ret = ret:gsub("%.%.%[%[%]%]", '')
6758 return key .. [[=function(m) return ]] .. ret .. [[ end]]
6759 end
6760
6761 function Babel.capt_map(from, mapno)

193
6762 return Babel.capture_maps[mapno][from] or from
6763 end
6764
6765 -- Handle the {n|abc|ABC} syntax in captures
6766 function Babel.capture_func_map(capno, from, to)
6767 local u = unicode.utf8
6768 from = u.gsub(from, '{(%x%x%x%x+)}',
6769 function (n)
6770 return u.char(tonumber(n, 16))
6771 end)
6772 to = u.gsub(to, '{(%x%x%x%x+)}',
6773 function (n)
6774 return u.char(tonumber(n, 16))
6775 end)
6776 local froms = {}
6777 for s in string.utfcharacters(from) do
6778 table.insert(froms, s)
6779 end
6780 local cnt = 1
6781 table.insert(Babel.capture_maps, {})
6782 local mlen = table.getn(Babel.capture_maps)
6783 for s in string.utfcharacters(to) do
6784 Babel.capture_maps[mlen][froms[cnt]] = s
6785 cnt = cnt + 1
6786 end
6787 return "]]..Babel.capt_map(m[" .. capno .. "]," ..
6788 (mlen) .. ").." .. "[["
6789 end
6790
6791 -- Create/Extend reversed sorted list of kashida weights:
6792 function Babel.capture_kashida(key, wt)
6793 wt = tonumber(wt)
6794 if Babel.kashida_wts then
6795 for p, q in ipairs(Babel.kashida_wts) do
6796 if wt == q then
6797 break
6798 elseif wt > q then
6799 table.insert(Babel.kashida_wts, p, wt)
6800 break
6801 elseif table.getn(Babel.kashida_wts) == p then
6802 table.insert(Babel.kashida_wts, wt)
6803 end
6804 end
6805 else
6806 Babel.kashida_wts = { wt }
6807 end
6808 return 'kashida = ' .. wt
6809 end
6810 h/transformsi

12.12 Lua: Auto bidi with basic and basic-r


The file babel-data-bidi.lua currently only contains data. It is a large and boring file and it is not
shown here (see the generated file), but here is a sample:

[0x25]={d='et'},
[0x26]={d='on'},
[0x27]={d='on'},
[0x28]={d='on', m=0x29},
[0x29]={d='on', m=0x28},
[0x2A]={d='on'},
[0x2B]={d='es'},

194
[0x2C]={d='cs'},

For the meaning of these codes, see the Unicode standard.


Now the basic-r bidi mode. One of the aims is to implement a fast and simple bidi algorithm, with a
single loop. I managed to do it for R texts, with a second smaller loop for a special case. The code is
still somewhat chaotic, but its behavior is essentially correct. I cannot resist copying the following
text from Emacs bidi.c (which also attempts to implement the bidi algorithm with a single loop):

Arrrgh!! The UAX#9 algorithm is too deeply entrenched in the assumption of batch-style
processing [...]. May the fleas of a thousand camels infest the armpits of those who design
supposedly general-purpose algorithms by looking at their own implementations, and fail to
consider other possible implementations!

Well, it took me some time to guess what the batch rules in UAX#9 actually mean (in other word,
what they do and why, and not only how), but I think (or I hope) I’ve managed to understand them.
In some sense, there are two bidi modes, one for numbers, and the other for text. Furthermore,
setting just the direction in R text is not enough, because there are actually two R modes (set
explicitly in Unicode with RLM and ALM). In babel the dir is set by a higher protocol based on the
language/script, which in turn sets the correct dir (<l>, <r> or <al>).
From UAX#9: “Where available, markup should be used instead of the explicit formatting
characters”. So, this simple version just ignores formatting characters. Actually, most of that annex is
devoted to how to handle them.
BD14-BD16 are not implemented. Unicode (and the W3C) are making a great effort to deal with some
special problematic cases in “streamed” plain text. I don’t think this is the way to go – particular
issues should be fixed by a high level interface taking into account the needs of the document. And
here is where luatex excels, because everything related to bidi writing is under our control.
6811 h∗basic-ri
6812 Babel = Babel or {}
6813
6814 Babel.bidi_enabled = true
6815
6816 require('babel-data-bidi.lua')
6817
6818 local characters = Babel.characters
6819 local ranges = Babel.ranges
6820
6821 local DIR = node.id("dir")
6822
6823 local function dir_mark(head, from, to, outer)
6824 dir = (outer == 'r') and 'TLT' or 'TRT' -- ie, reverse
6825 local d = node.new(DIR)
6826 d.dir = '+' .. dir
6827 node.insert_before(head, from, d)
6828 d = node.new(DIR)
6829 d.dir = '-' .. dir
6830 node.insert_after(head, to, d)
6831 end
6832
6833 function Babel.bidi(head, ispar)
6834 local first_n, last_n -- first and last char with nums
6835 local last_es -- an auxiliary 'last' used with nums
6836 local first_d, last_d -- first and last char in L/R block
6837 local dir, dir_real
Next also depends on script/lang (<al>/<r>). To be set by babel. tex.pardir is dangerous, could be
(re)set but it should be changed only in vmode. There are two strong’s – strong = l/al/r and
strong_lr = l/r (there must be a better way):
6838 local strong = ('TRT' == tex.pardir) and 'r' or 'l'
6839 local strong_lr = (strong == 'l') and 'l' or 'r'
6840 local outer = strong
6841
6842 local new_dir = false
6843 local first_dir = false

195
6844 local inmath = false
6845
6846 local last_lr
6847
6848 local type_n = ''
6849
6850 for item in node.traverse(head) do
6851
6852 -- three cases: glyph, dir, otherwise
6853 if item.id == node.id'glyph'
6854 or (item.id == 7 and item.subtype == 2) then
6855
6856 local itemchar
6857 if item.id == 7 and item.subtype == 2 then
6858 itemchar = item.replace.char
6859 else
6860 itemchar = item.char
6861 end
6862 local chardata = characters[itemchar]
6863 dir = chardata and chardata.d or nil
6864 if not dir then
6865 for nn, et in ipairs(ranges) do
6866 if itemchar < et[1] then
6867 break
6868 elseif itemchar <= et[2] then
6869 dir = et[3]
6870 break
6871 end
6872 end
6873 end
6874 dir = dir or 'l'
6875 if inmath then dir = ('TRT' == tex.mathdir) and 'r' or 'l' end

Next is based on the assumption babel sets the language AND switches the script with its dir. We
treat a language block as a separate Unicode sequence. The following piece of code is executed at the
first glyph after a ‘dir’ node. We don’t know the current language until then. This is not exactly true,
as the math mode may insert explicit dirs in the node list, so, for the moment there is a hack by brute
force (just above).

6876 if new_dir then


6877 attr_dir = 0
6878 for at in node.traverse(item.attr) do
6879 if at.number == Babel.attr_dir then
6880 attr_dir = at.value % 3
6881 end
6882 end
6883 if attr_dir == 1 then
6884 strong = 'r'
6885 elseif attr_dir == 2 then
6886 strong = 'al'
6887 else
6888 strong = 'l'
6889 end
6890 strong_lr = (strong == 'l') and 'l' or 'r'
6891 outer = strong_lr
6892 new_dir = false
6893 end
6894
6895 if dir == 'nsm' then dir = strong end -- W1

Numbers. The dual <al>/<r> system for R is somewhat cumbersome.

6896 dir_real = dir -- We need dir_real to set strong below


6897 if dir == 'al' then dir = 'r' end -- W3

196
By W2, there are no <en> <et> <es> if strong == <al>, only <an>. Therefore, there are not <et en>
nor <en et>, W5 can be ignored, and W6 applied:

6898 if strong == 'al' then


6899 if dir == 'en' then dir = 'an' end -- W2
6900 if dir == 'et' or dir == 'es' then dir = 'on' end -- W6
6901 strong_lr = 'r' -- W3
6902 end

Once finished the basic setup for glyphs, consider the two other cases: dir node and the rest.

6903 elseif item.id == node.id'dir' and not inmath then


6904 new_dir = true
6905 dir = nil
6906 elseif item.id == node.id'math' then
6907 inmath = (item.subtype == 0)
6908 else
6909 dir = nil -- Not a char
6910 end

Numbers in R mode. A sequence of <en>, <et>, <an>, <es> and <cs> is typeset (with some rules) in L
mode. We store the starting and ending points, and only when anything different is found (including
nil, ie, a non-char), the textdir is set. This means you cannot insert, say, a whatsit, but this is what I
would expect (with luacolor you may colorize some digits). Anyway, this behavior could be changed
with a switch in the future. Note in the first branch only <an> is relevant if <al>.

6911 if dir == 'en' or dir == 'an' or dir == 'et' then


6912 if dir ~= 'et' then
6913 type_n = dir
6914 end
6915 first_n = first_n or item
6916 last_n = last_es or item
6917 last_es = nil
6918 elseif dir == 'es' and last_n then -- W3+W6
6919 last_es = item
6920 elseif dir == 'cs' then -- it's right - do nothing
6921 elseif first_n then -- & if dir = any but en, et, an, es, cs, inc nil
6922 if strong_lr == 'r' and type_n ~= '' then
6923 dir_mark(head, first_n, last_n, 'r')
6924 elseif strong_lr == 'l' and first_d and type_n == 'an' then
6925 dir_mark(head, first_n, last_n, 'r')
6926 dir_mark(head, first_d, last_d, outer)
6927 first_d, last_d = nil, nil
6928 elseif strong_lr == 'l' and type_n ~= '' then
6929 last_d = last_n
6930 end
6931 type_n = ''
6932 first_n, last_n = nil, nil
6933 end

R text in L, or L text in R. Order of dir_ mark’s are relevant: d goes outside n, and therefore it’s
emitted after. See dir_mark to understand why (but is the nesting actually necessary or is a flat dir
structure enough?). Only L, R (and AL) chars are taken into account – everything else, including
spaces, whatsits, etc., are ignored:

6934 if dir == 'l' or dir == 'r' then


6935 if dir ~= outer then
6936 first_d = first_d or item
6937 last_d = item
6938 elseif first_d and dir ~= strong_lr then
6939 dir_mark(head, first_d, last_d, outer)
6940 first_d, last_d = nil, nil
6941 end
6942 end

Mirroring. Each chunk of text in a certain language is considered a “closed” sequence. If <r on r>
and <l on l>, it’s clearly <r> and <l>, resptly, but with other combinations depends on outer. From all

197
these, we select only those resolving <on> → <r>. At the beginning (when last_lr is nil) of an R text,
they are mirrored directly.
TODO - numbers in R mode are processed. It doesn’t hurt, but should not be done.

6943 if dir and not last_lr and dir ~= 'l' and outer == 'r' then
6944 item.char = characters[item.char] and
6945 characters[item.char].m or item.char
6946 elseif (dir or new_dir) and last_lr ~= item then
6947 local mir = outer .. strong_lr .. (dir or outer)
6948 if mir == 'rrr' or mir == 'lrr' or mir == 'rrl' or mir == 'rlr' then
6949 for ch in node.traverse(node.next(last_lr)) do
6950 if ch == item then break end
6951 if ch.id == node.id'glyph' and characters[ch.char] then
6952 ch.char = characters[ch.char].m or ch.char
6953 end
6954 end
6955 end
6956 end

Save some values for the next iteration. If the current node is ‘dir’, open a new sequence. Since dir
could be changed, strong is set with its real value (dir_real).

6957 if dir == 'l' or dir == 'r' then


6958 last_lr = item
6959 strong = dir_real -- Don't search back - best save now
6960 strong_lr = (strong == 'l') and 'l' or 'r'
6961 elseif new_dir then
6962 last_lr = nil
6963 end
6964 end

Mirror the last chars if they are no directed. And make sure any open block is closed, too.

6965 if last_lr and outer == 'r' then


6966 for ch in node.traverse_id(node.id'glyph', node.next(last_lr)) do
6967 if characters[ch.char] then
6968 ch.char = characters[ch.char].m or ch.char
6969 end
6970 end
6971 end
6972 if first_n then
6973 dir_mark(head, first_n, last_n, outer)
6974 end
6975 if first_d then
6976 dir_mark(head, first_d, last_d, outer)
6977 end

In boxes, the dir node could be added before the original head, so the actual head is the previous
node.

6978 return node.prev(head) or head


6979 end
6980 h/basic-ri

And here the Lua code for bidi=basic:

6981 h∗basici
6982 Babel = Babel or {}
6983
6984 -- eg, Babel.fontmap[1][<prefontid>]=<dirfontid>
6985
6986 Babel.fontmap = Babel.fontmap or {}
6987 Babel.fontmap[0] = {} -- l
6988 Babel.fontmap[1] = {} -- r
6989 Babel.fontmap[2] = {} -- al/an
6990
6991 Babel.bidi_enabled = true
6992 Babel.mirroring_enabled = true

198
6993
6994 require('babel-data-bidi.lua')
6995
6996 local characters = Babel.characters
6997 local ranges = Babel.ranges
6998
6999 local DIR = node.id('dir')
7000 local GLYPH = node.id('glyph')
7001
7002 local function insert_implicit(head, state, outer)
7003 local new_state = state
7004 if state.sim and state.eim and state.sim ~= state.eim then
7005 dir = ((outer == 'r') and 'TLT' or 'TRT') -- ie, reverse
7006 local d = node.new(DIR)
7007 d.dir = '+' .. dir
7008 node.insert_before(head, state.sim, d)
7009 local d = node.new(DIR)
7010 d.dir = '-' .. dir
7011 node.insert_after(head, state.eim, d)
7012 end
7013 new_state.sim, new_state.eim = nil, nil
7014 return head, new_state
7015 end
7016
7017 local function insert_numeric(head, state)
7018 local new
7019 local new_state = state
7020 if state.san and state.ean and state.san ~= state.ean then
7021 local d = node.new(DIR)
7022 d.dir = '+TLT'
7023 _, new = node.insert_before(head, state.san, d)
7024 if state.san == state.sim then state.sim = new end
7025 local d = node.new(DIR)
7026 d.dir = '-TLT'
7027 _, new = node.insert_after(head, state.ean, d)
7028 if state.ean == state.eim then state.eim = new end
7029 end
7030 new_state.san, new_state.ean = nil, nil
7031 return head, new_state
7032 end
7033
7034 -- TODO - \hbox with an explicit dir can lead to wrong results
7035 -- <R \hbox dir TLT{<R>}> and <L \hbox dir TRT{<L>}>. A small attempt
7036 -- was s made to improve the situation, but the problem is the 3-dir
7037 -- model in babel/Unicode and the 2-dir model in LuaTeX don't fit
7038 -- well.
7039
7040 function Babel.bidi(head, ispar, hdir)
7041 local d -- d is used mainly for computations in a loop
7042 local prev_d = ''
7043 local new_d = false
7044
7045 local nodes = {}
7046 local outer_first = nil
7047 local inmath = false
7048
7049 local glue_d = nil
7050 local glue_i = nil
7051
7052 local has_en = false
7053 local first_et = nil
7054
7055 local ATDIR = Babel.attr_dir

199
7056
7057 local save_outer
7058 local temp = node.get_attribute(head, ATDIR)
7059 if temp then
7060 temp = temp % 3
7061 save_outer = (temp == 0 and 'l') or
7062 (temp == 1 and 'r') or
7063 (temp == 2 and 'al')
7064 elseif ispar then -- Or error? Shouldn't happen
7065 save_outer = ('TRT' == tex.pardir) and 'r' or 'l'
7066 else -- Or error? Shouldn't happen
7067 save_outer = ('TRT' == hdir) and 'r' or 'l'
7068 end
7069 -- when the callback is called, we are just _after_ the box,
7070 -- and the textdir is that of the surrounding text
7071 -- if not ispar and hdir ~= tex.textdir then
7072 -- save_outer = ('TRT' == hdir) and 'r' or 'l'
7073 -- end
7074 local outer = save_outer
7075 local last = outer
7076 -- 'al' is only taken into account in the first, current loop
7077 if save_outer == 'al' then save_outer = 'r' end
7078
7079 local fontmap = Babel.fontmap
7080
7081 for item in node.traverse(head) do
7082
7083 -- In what follows, #node is the last (previous) node, because the
7084 -- current one is not added until we start processing the neutrals.
7085
7086 -- three cases: glyph, dir, otherwise
7087 if item.id == GLYPH
7088 or (item.id == 7 and item.subtype == 2) then
7089
7090 local d_font = nil
7091 local item_r
7092 if item.id == 7 and item.subtype == 2 then
7093 item_r = item.replace -- automatic discs have just 1 glyph
7094 else
7095 item_r = item
7096 end
7097 local chardata = characters[item_r.char]
7098 d = chardata and chardata.d or nil
7099 if not d or d == 'nsm' then
7100 for nn, et in ipairs(ranges) do
7101 if item_r.char < et[1] then
7102 break
7103 elseif item_r.char <= et[2] then
7104 if not d then d = et[3]
7105 elseif d == 'nsm' then d_font = et[3]
7106 end
7107 break
7108 end
7109 end
7110 end
7111 d = d or 'l'
7112
7113 -- A short 'pause' in bidi for mapfont
7114 d_font = d_font or d
7115 d_font = (d_font == 'l' and 0) or
7116 (d_font == 'nsm' and 0) or
7117 (d_font == 'r' and 1) or
7118 (d_font == 'al' and 2) or

200
7119 (d_font == 'an' and 2) or nil
7120 if d_font and fontmap and fontmap[d_font][item_r.font] then
7121 item_r.font = fontmap[d_font][item_r.font]
7122 end
7123
7124 if new_d then
7125 table.insert(nodes, {nil, (outer == 'l') and 'l' or 'r', nil})
7126 if inmath then
7127 attr_d = 0
7128 else
7129 attr_d = node.get_attribute(item, ATDIR)
7130 attr_d = attr_d % 3
7131 end
7132 if attr_d == 1 then
7133 outer_first = 'r'
7134 last = 'r'
7135 elseif attr_d == 2 then
7136 outer_first = 'r'
7137 last = 'al'
7138 else
7139 outer_first = 'l'
7140 last = 'l'
7141 end
7142 outer = last
7143 has_en = false
7144 first_et = nil
7145 new_d = false
7146 end
7147
7148 if glue_d then
7149 if (d == 'l' and 'l' or 'r') ~= glue_d then
7150 table.insert(nodes, {glue_i, 'on', nil})
7151 end
7152 glue_d = nil
7153 glue_i = nil
7154 end
7155
7156 elseif item.id == DIR then
7157 d = nil
7158 if head ~= item then new_d = true end
7159
7160 elseif item.id == node.id'glue' and item.subtype == 13 then
7161 glue_d = d
7162 glue_i = item
7163 d = nil
7164
7165 elseif item.id == node.id'math' then
7166 inmath = (item.subtype == 0)
7167
7168 else
7169 d = nil
7170 end
7171
7172 -- AL <= EN/ET/ES -- W2 + W3 + W6
7173 if last == 'al' and d == 'en' then
7174 d = 'an' -- W3
7175 elseif last == 'al' and (d == 'et' or d == 'es') then
7176 d = 'on' -- W6
7177 end
7178
7179 -- EN + CS/ES + EN -- W4
7180 if d == 'en' and #nodes >= 2 then
7181 if (nodes[#nodes][2] == 'es' or nodes[#nodes][2] == 'cs')

201
7182 and nodes[#nodes-1][2] == 'en' then
7183 nodes[#nodes][2] = 'en'
7184 end
7185 end
7186
7187 -- AN + CS + AN -- W4 too, because uax9 mixes both cases
7188 if d == 'an' and #nodes >= 2 then
7189 if (nodes[#nodes][2] == 'cs')
7190 and nodes[#nodes-1][2] == 'an' then
7191 nodes[#nodes][2] = 'an'
7192 end
7193 end
7194
7195 -- ET/EN -- W5 + W7->l / W6->on
7196 if d == 'et' then
7197 first_et = first_et or (#nodes + 1)
7198 elseif d == 'en' then
7199 has_en = true
7200 first_et = first_et or (#nodes + 1)
7201 elseif first_et then -- d may be nil here !
7202 if has_en then
7203 if last == 'l' then
7204 temp = 'l' -- W7
7205 else
7206 temp = 'en' -- W5
7207 end
7208 else
7209 temp = 'on' -- W6
7210 end
7211 for e = first_et, #nodes do
7212 if nodes[e][1].id == GLYPH then nodes[e][2] = temp end
7213 end
7214 first_et = nil
7215 has_en = false
7216 end
7217
7218 -- Force mathdir in math if ON (currently works as expected only
7219 -- with 'l')
7220 if inmath and d == 'on' then
7221 d = ('TRT' == tex.mathdir) and 'r' or 'l'
7222 end
7223
7224 if d then
7225 if d == 'al' then
7226 d = 'r'
7227 last = 'al'
7228 elseif d == 'l' or d == 'r' then
7229 last = d
7230 end
7231 prev_d = d
7232 table.insert(nodes, {item, d, outer_first})
7233 end
7234
7235 outer_first = nil
7236
7237 end
7238
7239 -- TODO -- repeated here in case EN/ET is the last node. Find a
7240 -- better way of doing things:
7241 if first_et then -- dir may be nil here !
7242 if has_en then
7243 if last == 'l' then
7244 temp = 'l' -- W7

202
7245 else
7246 temp = 'en' -- W5
7247 end
7248 else
7249 temp = 'on' -- W6
7250 end
7251 for e = first_et, #nodes do
7252 if nodes[e][1].id == GLYPH then nodes[e][2] = temp end
7253 end
7254 end
7255
7256 -- dummy node, to close things
7257 table.insert(nodes, {nil, (outer == 'l') and 'l' or 'r', nil})
7258
7259 --------------- NEUTRAL -----------------
7260
7261 outer = save_outer
7262 last = outer
7263
7264 local first_on = nil
7265
7266 for q = 1, #nodes do
7267 local item
7268
7269 local outer_first = nodes[q][3]
7270 outer = outer_first or outer
7271 last = outer_first or last
7272
7273 local d = nodes[q][2]
7274 if d == 'an' or d == 'en' then d = 'r' end
7275 if d == 'cs' or d == 'et' or d == 'es' then d = 'on' end --- W6
7276
7277 if d == 'on' then
7278 first_on = first_on or q
7279 elseif first_on then
7280 if last == d then
7281 temp = d
7282 else
7283 temp = outer
7284 end
7285 for r = first_on, q - 1 do
7286 nodes[r][2] = temp
7287 item = nodes[r][1] -- MIRRORING
7288 if Babel.mirroring_enabled and item.id == GLYPH
7289 and temp == 'r' and characters[item.char] then
7290 local font_mode = ''
7291 if item.font > 0 and font.fonts[item.font].properties then
7292 font_mode = font.fonts[item.font].properties.mode
7293 end
7294 if font_mode ~= 'harf' and font_mode ~= 'plug' then
7295 item.char = characters[item.char].m or item.char
7296 end
7297 end
7298 end
7299 first_on = nil
7300 end
7301
7302 if d == 'r' or d == 'l' then last = d end
7303 end
7304
7305 -------------- IMPLICIT, REORDER ----------------
7306
7307 outer = save_outer

203
7308 last = outer
7309
7310 local state = {}
7311 state.has_r = false
7312
7313 for q = 1, #nodes do
7314
7315 local item = nodes[q][1]
7316
7317 outer = nodes[q][3] or outer
7318
7319 local d = nodes[q][2]
7320
7321 if d == 'nsm' then d = last end -- W1
7322 if d == 'en' then d = 'an' end
7323 local isdir = (d == 'r' or d == 'l')
7324
7325 if outer == 'l' and d == 'an' then
7326 state.san = state.san or item
7327 state.ean = item
7328 elseif state.san then
7329 head, state = insert_numeric(head, state)
7330 end
7331
7332 if outer == 'l' then
7333 if d == 'an' or d == 'r' then -- im -> implicit
7334 if d == 'r' then state.has_r = true end
7335 state.sim = state.sim or item
7336 state.eim = item
7337 elseif d == 'l' and state.sim and state.has_r then
7338 head, state = insert_implicit(head, state, outer)
7339 elseif d == 'l' then
7340 state.sim, state.eim, state.has_r = nil, nil, false
7341 end
7342 else
7343 if d == 'an' or d == 'l' then
7344 if nodes[q][3] then -- nil except after an explicit dir
7345 state.sim = item -- so we move sim 'inside' the group
7346 else
7347 state.sim = state.sim or item
7348 end
7349 state.eim = item
7350 elseif d == 'r' and state.sim then
7351 head, state = insert_implicit(head, state, outer)
7352 elseif d == 'r' then
7353 state.sim, state.eim = nil, nil
7354 end
7355 end
7356
7357 if isdir then
7358 last = d -- Don't search back - best save now
7359 elseif d == 'on' and state.san then
7360 state.san = state.san or item
7361 state.ean = item
7362 end
7363
7364 end
7365
7366 return node.prev(head) or head
7367 end
7368 h/basici

204
13 Data for CJK
It is a boring file and it is not shown here (see the generated file), but here is a sample:

[0x0021]={c='ex'},
[0x0024]={c='pr'},
[0x0025]={c='po'},
[0x0028]={c='op'},
[0x0029]={c='cp'},
[0x002B]={c='pr'},

For the meaning of these codes, see the Unicode standard.

14 The ‘nil’ language


This ‘language’ does nothing, except setting the hyphenation patterns to nohyphenation.
For this language currently no special definitions are needed or available.
The macro \LdfInit takes care of preventing that this file is loaded more than once, checking the
category code of the @ sign, etc.

7369 h∗nili
7370 \ProvidesLanguage{nil}[ hhdateii hhversionii Nil language]
7371 \LdfInit{nil}{datenil}

When this file is read as an option, i.e. by the \usepackage command, nil could be an ‘unknown’
language in which case we have to make it known.

7372 \ifx\l@nil\@undefined
7373 \newlanguage\l@nil
7374 \@namedef{bbl@hyphendata@\the\l@nil}{{}{}}% Remove warning
7375 \let\bbl@elt\relax
7376 \edef\bbl@languages{% Add it to the list of languages
7377 \bbl@languages\bbl@elt{nil}{\the\l@nil}{}{}}
7378 \fi

This macro is used to store the values of the hyphenation parameters \lefthyphenmin and
\righthyphenmin.

7379 \providehyphenmins{\CurrentOption}{\m@ne\m@ne}

The next step consists of defining commands to switch to (and from) the ‘nil’ language.

\captionnil
\datenil 7380 \let\captionsnil\@empty
7381 \let\datenil\@empty

There is no locale file for this pseudo-language, so the corresponding fields are defined here.

7382 \def\bbl@inidata@nil{%
7383 \bbl@elt{identification}{tag.ini}{und}%
7384 \bbl@elt{identification}{load.level}{0}%
7385 \bbl@elt{identification}{charset}{utf8}%
7386 \bbl@elt{identification}{version}{1.0}%
7387 \bbl@elt{identification}{date}{2022-05-16}%
7388 \bbl@elt{identification}{name.local}{nil}%
7389 \bbl@elt{identification}{name.english}{nil}%
7390 \bbl@elt{identification}{name.babel}{nil}%
7391 \bbl@elt{identification}{tag.bcp47}{und}%
7392 \bbl@elt{identification}{language.tag.bcp47}{und}%
7393 \bbl@elt{identification}{tag.opentype}{dflt}%
7394 \bbl@elt{identification}{script.name}{Latin}%
7395 \bbl@elt{identification}{script.tag.bcp47}{Latn}%
7396 \bbl@elt{identification}{script.tag.opentype}{DFLT}%
7397 \bbl@elt{identification}{level}{1}%
7398 \bbl@elt{identification}{encodings}{}%
7399 \bbl@elt{identification}{derivate}{no}}

205
7400 \@namedef{bbl@tbcp@nil}{und}
7401 \@namedef{bbl@lbcp@nil}{und}
7402 \@namedef{bbl@lotf@nil}{dflt}
7403 \@namedef{bbl@elname@nil}{nil}
7404 \@namedef{bbl@lname@nil}{nil}
7405 \@namedef{bbl@esname@nil}{Latin}
7406 \@namedef{bbl@sname@nil}{Latin}
7407 \@namedef{bbl@sbcp@nil}{Latn}
7408 \@namedef{bbl@sotf@nil}{Latn}

The macro \ldf@finish takes care of looking for a configuration file, setting the main language to be
switched on at \begin{document} and resetting the category code of @ to its original value.

7409 \ldf@finish{nil}
7410 h/nili

15 Calendars
The code for specific calendars are placed in the specific files, loaded when requested by an ini file
in the identification section with require.calendars.
Start with function to compute the Julian day. It’s based on the little library calendar.js, by John
Walker, in the public domain.
7411 hh∗Compute Julian dayii ≡
7412 \def\bbl@fpmod#1#2{(#1-#2*floor(#1/#2))}
7413 \def\bbl@cs@gregleap#1{%
7414 (\bbl@fpmod{#1}{4} == 0) &&
7415 (!((\bbl@fpmod{#1}{100} == 0) && (\bbl@fpmod{#1}{400} != 0)))}
7416 \def\bbl@cs@jd#1#2#3{% year, month, day
7417 \fp_eval:n{ 1721424.5 + (365 * (#1 - 1)) +
7418 floor((#1 - 1) / 4) + (-floor((#1 - 1) / 100)) +
7419 floor((#1 - 1) / 400) + floor((((367 * #2) - 362) / 12) +
7420 ((#2 <= 2) ? 0 : (\bbl@cs@gregleap{#1} ? -1 : -2)) + #3) }}
7421 hh/Compute Julian dayii

15.1 Islamic
The code for the Civil calendar is based on it, too.

7422 h∗ca-islamici
7423 \ExplSyntaxOn
7424 hhComputeJulian dayii
7425 % == islamic (default)
7426 % Not yet implemented
7427 \def\bbl@ca@islamic#1-#2-#3\@@#4#5#6{}

The Civil calendar.

7428 \def\bbl@cs@isltojd#1#2#3{ % year, month, day


7429 ((#3 + ceil(29.5 * (#2 - 1)) +
7430 (#1 - 1) * 354 + floor((3 + (11 * #1)) / 30) +
7431 1948439.5) - 1) }
7432 \@namedef{bbl@ca@islamic-civil++}{\bbl@ca@islamicvl@x{+2}}
7433 \@namedef{bbl@ca@islamic-civil+}{\bbl@ca@islamicvl@x{+1}}
7434 \@namedef{bbl@ca@islamic-civil}{\bbl@ca@islamicvl@x{}}
7435 \@namedef{bbl@ca@islamic-civil-}{\bbl@ca@islamicvl@x{-1}}
7436 \@namedef{bbl@ca@islamic-civil--}{\bbl@ca@islamicvl@x{-2}}
7437 \def\bbl@ca@islamicvl@x#1#2-#3-#4\@@#5#6#7{%
7438 \edef\bbl@tempa{%
7439 \fp_eval:n{ floor(\bbl@cs@jd{#2}{#3}{#4})+0.5 #1}}%
7440 \edef#5{%
7441 \fp_eval:n{ floor(((30*(\[email protected])) + 10646)/10631) }}%
7442 \edef#6{\fp_eval:n{
7443 min(12,ceil((\bbl@tempa-(29+\bbl@cs@isltojd{#5}{1}{1}))/29.5)+1) }}%
7444 \edef#7{\fp_eval:n{ \bbl@tempa - \bbl@cs@isltojd{#5}{#6}{1} + 1} }}

206
The Umm al-Qura calendar, used mainly in Saudi Arabia, is based on moment-hijri, by Abdullah
Alsigar (license MIT).
Since the main aim is to provide a suitable \today, and maybe some close dates, data just covers
Hijri ∼1435/∼1460 (Gregorian ∼2014/∼2038).

7445 \def\bbl@cs@umalqura@data{56660, 56690,56719,56749,56778,56808,%


7446 56837,56867,56897,56926,56956,56985,57015,57044,57074,57103,%
7447 57133,57162,57192,57221,57251,57280,57310,57340,57369,57399,%
7448 57429,57458,57487,57517,57546,57576,57605,57634,57664,57694,%
7449 57723,57753,57783,57813,57842,57871,57901,57930,57959,57989,%
7450 58018,58048,58077,58107,58137,58167,58196,58226,58255,58285,%
7451 58314,58343,58373,58402,58432,58461,58491,58521,58551,58580,%
7452 58610,58639,58669,58698,58727,58757,58786,58816,58845,58875,%
7453 58905,58934,58964,58994,59023,59053,59082,59111,59141,59170,%
7454 59200,59229,59259,59288,59318,59348,59377,59407,59436,59466,%
7455 59495,59525,59554,59584,59613,59643,59672,59702,59731,59761,%
7456 59791,59820,59850,59879,59909,59939,59968,59997,60027,60056,%
7457 60086,60115,60145,60174,60204,60234,60264,60293,60323,60352,%
7458 60381,60411,60440,60469,60499,60528,60558,60588,60618,60648,%
7459 60677,60707,60736,60765,60795,60824,60853,60883,60912,60942,%
7460 60972,61002,61031,61061,61090,61120,61149,61179,61208,61237,%
7461 61267,61296,61326,61356,61385,61415,61445,61474,61504,61533,%
7462 61563,61592,61621,61651,61680,61710,61739,61769,61799,61828,%
7463 61858,61888,61917,61947,61976,62006,62035,62064,62094,62123,%
7464 62153,62182,62212,62242,62271,62301,62331,62360,62390,62419,%
7465 62448,62478,62507,62537,62566,62596,62625,62655,62685,62715,%
7466 62744,62774,62803,62832,62862,62891,62921,62950,62980,63009,%
7467 63039,63069,63099,63128,63157,63187,63216,63246,63275,63305,%
7468 63334,63363,63393,63423,63453,63482,63512,63541,63571,63600,%
7469 63630,63659,63689,63718,63747,63777,63807,63836,63866,63895,%
7470 63925,63955,63984,64014,64043,64073,64102,64131,64161,64190,%
7471 64220,64249,64279,64309,64339,64368,64398,64427,64457,64486,%
7472 64515,64545,64574,64603,64633,64663,64692,64722,64752,64782,%
7473 64811,64841,64870,64899,64929,64958,64987,65017,65047,65076,%
7474 65106,65136,65166,65195,65225,65254,65283,65313,65342,65371,%
7475 65401,65431,65460,65490,65520}
7476 \@namedef{bbl@ca@islamic-umalqura+}{\bbl@ca@islamcuqr@x{+1}}
7477 \@namedef{bbl@ca@islamic-umalqura}{\bbl@ca@islamcuqr@x{}}
7478 \@namedef{bbl@ca@islamic-umalqura-}{\bbl@ca@islamcuqr@x{-1}}
7479 \def\bbl@ca@islamcuqr@x#1#2-#3-#4\@@#5#6#7{%
7480 \ifnum#2>2014 \ifnum#2<2038
7481 \bbl@afterfi\expandafter\@gobble
7482 \fi\fi
7483 {\bbl@error{Year~out~of~range}{The~allowed~range~is~2014-2038}}%
7484 \edef\bbl@tempd{\fp_eval:n{ % (Julian) day
7485 \bbl@cs@jd{#2}{#3}{#4} + 0.5 - 2400000 #1}}%
7486 \count@\@ne
7487 \bbl@foreach\bbl@cs@umalqura@data{%
7488 \advance\count@\@ne
7489 \ifnum##1>\bbl@tempd\else
7490 \edef\bbl@tempe{\the\count@}%
7491 \edef\bbl@tempb{##1}%
7492 \fi}%
7493 \edef\bbl@templ{\fp_eval:n{ \bbl@tempe + 16260 + 949 }}% month~lunar
7494 \edef\bbl@tempa{\fp_eval:n{ floor((\bbl@templ - 1 ) / 12) }}% annus
7495 \edef#5{\fp_eval:n{ \bbl@tempa + 1 }}%
7496 \edef#6{\fp_eval:n{ \bbl@templ - (12 * \bbl@tempa) }}%
7497 \edef#7{\fp_eval:n{ \bbl@tempd - \bbl@tempb + 1 }}}
7498 \ExplSyntaxOff
7499 \bbl@add\bbl@precalendar{%
7500 \bbl@replace\bbl@ld@calendar{-civil}{}%
7501 \bbl@replace\bbl@ld@calendar{-umalqura}{}%
7502 \bbl@replace\bbl@ld@calendar{+}{}%
7503 \bbl@replace\bbl@ld@calendar{-}{}}

207
7504 h/ca-islamici

16 Hebrew
This is basically the set of macros written by Michail Rozman in 1991, with corrections and adaptions
by Rama Porrat, Misha, Dan Haran and Boris Lavva. This must be eventually replaced by
computations with l3fp. An explanation of what’s going on can be found in hebcal.sty

7505 h∗ca-hebrewi
7506 \newcount\bbl@cntcommon
7507 \def\bbl@remainder#1#2#3{%
7508 #3=#1\relax
7509 \divide #3 by #2\relax
7510 \multiply #3 by -#2\relax
7511 \advance #3 by #1\relax}%
7512 \newif\ifbbl@divisible
7513 \def\bbl@checkifdivisible#1#2{%
7514 {\countdef\tmp=0
7515 \bbl@remainder{#1}{#2}{\tmp}%
7516 \ifnum \tmp=0
7517 \global\bbl@divisibletrue
7518 \else
7519 \global\bbl@divisiblefalse
7520 \fi}}
7521 \newif\ifbbl@gregleap
7522 \def\bbl@ifgregleap#1{%
7523 \bbl@checkifdivisible{#1}{4}%
7524 \ifbbl@divisible
7525 \bbl@checkifdivisible{#1}{100}%
7526 \ifbbl@divisible
7527 \bbl@checkifdivisible{#1}{400}%
7528 \ifbbl@divisible
7529 \bbl@gregleaptrue
7530 \else
7531 \bbl@gregleapfalse
7532 \fi
7533 \else
7534 \bbl@gregleaptrue
7535 \fi
7536 \else
7537 \bbl@gregleapfalse
7538 \fi
7539 \ifbbl@gregleap}
7540 \def\bbl@gregdayspriormonths#1#2#3{%
7541 {#3=\ifcase #1 0 \or 0 \or 31 \or 59 \or 90 \or 120 \or 151 \or
7542 181 \or 212 \or 243 \or 273 \or 304 \or 334 \fi
7543 \bbl@ifgregleap{#2}%
7544 \ifnum #1 > 2
7545 \advance #3 by 1
7546 \fi
7547 \fi
7548 \global\bbl@cntcommon=#3}%
7549 #3=\bbl@cntcommon}
7550 \def\bbl@gregdaysprioryears#1#2{%
7551 {\countdef\tmpc=4
7552 \countdef\tmpb=2
7553 \tmpb=#1\relax
7554 \advance \tmpb by -1
7555 \tmpc=\tmpb
7556 \multiply \tmpc by 365
7557 #2=\tmpc
7558 \tmpc=\tmpb
7559 \divide \tmpc by 4

208
7560 \advance #2 by \tmpc
7561 \tmpc=\tmpb
7562 \divide \tmpc by 100
7563 \advance #2 by -\tmpc
7564 \tmpc=\tmpb
7565 \divide \tmpc by 400
7566 \advance #2 by \tmpc
7567 \global\bbl@cntcommon=#2\relax}%
7568 #2=\bbl@cntcommon}
7569 \def\bbl@absfromgreg#1#2#3#4{%
7570 {\countdef\tmpd=0
7571 #4=#1\relax
7572 \bbl@gregdayspriormonths{#2}{#3}{\tmpd}%
7573 \advance #4 by \tmpd
7574 \bbl@gregdaysprioryears{#3}{\tmpd}%
7575 \advance #4 by \tmpd
7576 \global\bbl@cntcommon=#4\relax}%
7577 #4=\bbl@cntcommon}
7578 \newif\ifbbl@hebrleap
7579 \def\bbl@checkleaphebryear#1{%
7580 {\countdef\tmpa=0
7581 \countdef\tmpb=1
7582 \tmpa=#1\relax
7583 \multiply \tmpa by 7
7584 \advance \tmpa by 1
7585 \bbl@remainder{\tmpa}{19}{\tmpb}%
7586 \ifnum \tmpb < 7
7587 \global\bbl@hebrleaptrue
7588 \else
7589 \global\bbl@hebrleapfalse
7590 \fi}}
7591 \def\bbl@hebrelapsedmonths#1#2{%
7592 {\countdef\tmpa=0
7593 \countdef\tmpb=1
7594 \countdef\tmpc=2
7595 \tmpa=#1\relax
7596 \advance \tmpa by -1
7597 #2=\tmpa
7598 \divide #2 by 19
7599 \multiply #2 by 235
7600 \bbl@remainder{\tmpa}{19}{\tmpb}% \tmpa=years%19-years this cycle
7601 \tmpc=\tmpb
7602 \multiply \tmpb by 12
7603 \advance #2 by \tmpb
7604 \multiply \tmpc by 7
7605 \advance \tmpc by 1
7606 \divide \tmpc by 19
7607 \advance #2 by \tmpc
7608 \global\bbl@cntcommon=#2}%
7609 #2=\bbl@cntcommon}
7610 \def\bbl@hebrelapseddays#1#2{%
7611 {\countdef\tmpa=0
7612 \countdef\tmpb=1
7613 \countdef\tmpc=2
7614 \bbl@hebrelapsedmonths{#1}{#2}%
7615 \tmpa=#2\relax
7616 \multiply \tmpa by 13753
7617 \advance \tmpa by 5604
7618 \bbl@remainder{\tmpa}{25920}{\tmpc}% \tmpc == ConjunctionParts
7619 \divide \tmpa by 25920
7620 \multiply #2 by 29
7621 \advance #2 by 1
7622 \advance #2 by \tmpa

209
7623 \bbl@remainder{#2}{7}{\tmpa}%
7624 \ifnum \tmpc < 19440
7625 \ifnum \tmpc < 9924
7626 \else
7627 \ifnum \tmpa=2
7628 \bbl@checkleaphebryear{#1}% of a common year
7629 \ifbbl@hebrleap
7630 \else
7631 \advance #2 by 1
7632 \fi
7633 \fi
7634 \fi
7635 \ifnum \tmpc < 16789
7636 \else
7637 \ifnum \tmpa=1
7638 \advance #1 by -1
7639 \bbl@checkleaphebryear{#1}% at the end of leap year
7640 \ifbbl@hebrleap
7641 \advance #2 by 1
7642 \fi
7643 \fi
7644 \fi
7645 \else
7646 \advance #2 by 1
7647 \fi
7648 \bbl@remainder{#2}{7}{\tmpa}%
7649 \ifnum \tmpa=0
7650 \advance #2 by 1
7651 \else
7652 \ifnum \tmpa=3
7653 \advance #2 by 1
7654 \else
7655 \ifnum \tmpa=5
7656 \advance #2 by 1
7657 \fi
7658 \fi
7659 \fi
7660 \global\bbl@cntcommon=#2\relax}%
7661 #2=\bbl@cntcommon}
7662 \def\bbl@daysinhebryear#1#2{%
7663 {\countdef\tmpe=12
7664 \bbl@hebrelapseddays{#1}{\tmpe}%
7665 \advance #1 by 1
7666 \bbl@hebrelapseddays{#1}{#2}%
7667 \advance #2 by -\tmpe
7668 \global\bbl@cntcommon=#2}%
7669 #2=\bbl@cntcommon}
7670 \def\bbl@hebrdayspriormonths#1#2#3{%
7671 {\countdef\tmpf= 14
7672 #3=\ifcase #1\relax
7673 0 \or
7674 0 \or
7675 30 \or
7676 59 \or
7677 89 \or
7678 118 \or
7679 148 \or
7680 148 \or
7681 177 \or
7682 207 \or
7683 236 \or
7684 266 \or
7685 295 \or

210
7686 325 \or
7687 400
7688 \fi
7689 \bbl@checkleaphebryear{#2}%
7690 \ifbbl@hebrleap
7691 \ifnum #1 > 6
7692 \advance #3 by 30
7693 \fi
7694 \fi
7695 \bbl@daysinhebryear{#2}{\tmpf}%
7696 \ifnum #1 > 3
7697 \ifnum \tmpf=353
7698 \advance #3 by -1
7699 \fi
7700 \ifnum \tmpf=383
7701 \advance #3 by -1
7702 \fi
7703 \fi
7704 \ifnum #1 > 2
7705 \ifnum \tmpf=355
7706 \advance #3 by 1
7707 \fi
7708 \ifnum \tmpf=385
7709 \advance #3 by 1
7710 \fi
7711 \fi
7712 \global\bbl@cntcommon=#3\relax}%
7713 #3=\bbl@cntcommon}
7714 \def\bbl@absfromhebr#1#2#3#4{%
7715 {#4=#1\relax
7716 \bbl@hebrdayspriormonths{#2}{#3}{#1}%
7717 \advance #4 by #1\relax
7718 \bbl@hebrelapseddays{#3}{#1}%
7719 \advance #4 by #1\relax
7720 \advance #4 by -1373429
7721 \global\bbl@cntcommon=#4\relax}%
7722 #4=\bbl@cntcommon}
7723 \def\bbl@hebrfromgreg#1#2#3#4#5#6{%
7724 {\countdef\tmpx= 17
7725 \countdef\tmpy= 18
7726 \countdef\tmpz= 19
7727 #6=#3\relax
7728 \global\advance #6 by 3761
7729 \bbl@absfromgreg{#1}{#2}{#3}{#4}%
7730 \tmpz=1 \tmpy=1
7731 \bbl@absfromhebr{\tmpz}{\tmpy}{#6}{\tmpx}%
7732 \ifnum \tmpx > #4\relax
7733 \global\advance #6 by -1
7734 \bbl@absfromhebr{\tmpz}{\tmpy}{#6}{\tmpx}%
7735 \fi
7736 \advance #4 by -\tmpx
7737 \advance #4 by 1
7738 #5=#4\relax
7739 \divide #5 by 30
7740 \loop
7741 \bbl@hebrdayspriormonths{#5}{#6}{\tmpx}%
7742 \ifnum \tmpx < #4\relax
7743 \advance #5 by 1
7744 \tmpy=\tmpx
7745 \repeat
7746 \global\advance #5 by -1
7747 \global\advance #4 by -\tmpy}}
7748 \newcount\bbl@hebrday \newcount\bbl@hebrmonth \newcount\bbl@hebryear

211
7749 \newcount\bbl@gregday \newcount\bbl@gregmonth \newcount\bbl@gregyear
7750 \def\bbl@ca@hebrew#1-#2-#3\@@#4#5#6{%
7751 \bbl@gregday=#3\relax \bbl@gregmonth=#2\relax \bbl@gregyear=#1\relax
7752 \bbl@hebrfromgreg
7753 {\bbl@gregday}{\bbl@gregmonth}{\bbl@gregyear}%
7754 {\bbl@hebrday}{\bbl@hebrmonth}{\bbl@hebryear}%
7755 \edef#4{\the\bbl@hebryear}%
7756 \edef#5{\the\bbl@hebrmonth}%
7757 \edef#6{\the\bbl@hebrday}}
7758 h/ca-hebrewi

17 Persian
There is an algorithm written in TeX by Jabri, Abolhassani, Pournader and Esfahbod, created for the
first versions of the FarsiTeX system (no longer available), but the original license is GPL, so its use
with LPPL is problematic. The code here follows loosely that by John Walker, which is free and
accurate, but sadly very complex, so the relevant data for the years 2013-2050 have been
pre-calculated and stored. Actually, all we need is the first day (either March 20 or March 21).
7759 h∗ca-persiani
7760 \ExplSyntaxOn
7761 hhCompute Julian dayii
7762 \def\bbl@cs@firstjal@xx{2012,2016,2020,2024,2028,2029,% March 20
7763 2032,2033,2036,2037,2040,2041,2044,2045,2048,2049}
7764 \def\bbl@ca@persian#1-#2-#3\@@#4#5#6{%
7765 \edef\bbl@tempa{#1}% 20XX-03-\bbl@tempe = 1 farvardin:
7766 \ifnum\bbl@tempa>2012 \ifnum\bbl@tempa<2051
7767 \bbl@afterfi\expandafter\@gobble
7768 \fi\fi
7769 {\bbl@error{Year~out~of~range}{The~allowed~range~is~2013-2050}}%
7770 \bbl@xin@{\bbl@tempa}{\bbl@cs@firstjal@xx}%
7771 \ifin@\def\bbl@tempe{20}\else\def\bbl@tempe{21}\fi
7772 \edef\bbl@tempc{\fp_eval:n{\bbl@cs@jd{\bbl@tempa}{#2}{#3}+.5}}% current
7773 \edef\bbl@tempb{\fp_eval:n{\bbl@cs@jd{\bbl@tempa}{03}{\bbl@tempe}+.5}}% begin
7774 \ifnum\bbl@tempc<\bbl@tempb
7775 \edef\bbl@tempa{\fp_eval:n{\bbl@tempa-1}}% go back 1 year and redo
7776 \bbl@xin@{\bbl@tempa}{\bbl@cs@firstjal@xx}%
7777 \ifin@\def\bbl@tempe{20}\else\def\bbl@tempe{21}\fi
7778 \edef\bbl@tempb{\fp_eval:n{\bbl@cs@jd{\bbl@tempa}{03}{\bbl@tempe}+.5}}%
7779 \fi
7780 \edef#4{\fp_eval:n{\bbl@tempa-621}}% set Jalali year
7781 \edef#6{\fp_eval:n{\bbl@tempc-\bbl@tempb+1}}% days from 1 farvardin
7782 \edef#5{\fp_eval:n{% set Jalali month
7783 (#6 <= 186) ? ceil(#6 / 31) : ceil((#6 - 6) / 30)}}
7784 \edef#6{\fp_eval:n{% set Jalali day
7785 (#6 - ((#5 <= 7) ? ((#5 - 1) * 31) : (((#5 - 1) * 30) + 6)))}}}
7786 \ExplSyntaxOff
7787 h/ca-persiani

18 Coptic and Ethiopic


Adapted from jquery.calendars.package-1.1.4, written by Keith Wood, 2010. Dual license: GPL
and MIT. The only difference is the epoch.
7788 h∗ca-coptici
7789 \ExplSyntaxOn
7790 hhCompute Julian dayii
7791 \def\bbl@ca@coptic#1-#2-#3\@@#4#5#6{%
7792 \edef\bbl@tempd{\fp_eval:n{floor(\bbl@cs@jd{#1}{#2}{#3}) + 0.5}}%
7793 \edef\bbl@tempc{\fp_eval:n{\bbl@tempd - 1825029.5}}%
7794 \edef#4{\fp_eval:n{%
7795 floor((\bbl@tempc - floor((\bbl@tempc+366) / 1461)) / 365) + 1}}%
7796 \edef\bbl@tempc{\fp_eval:n{%

212
7797 \bbl@tempd - (#4-1) * 365 - floor(#4/4) - 1825029.5}}%
7798 \edef#5{\fp_eval:n{floor(\bbl@tempc / 30) + 1}}%
7799 \edef#6{\fp_eval:n{\bbl@tempc - (#5 - 1) * 30 + 1}}}
7800 \ExplSyntaxOff
7801 h/ca-coptici
7802 h∗ca-ethiopici
7803 \ExplSyntaxOn
7804 hhCompute Julian dayii
7805 \def\bbl@ca@ethiopic#1-#2-#3\@@#4#5#6{%
7806 \edef\bbl@tempd{\fp_eval:n{floor(\bbl@cs@jd{#1}{#2}{#3}) + 0.5}}%
7807 \edef\bbl@tempc{\fp_eval:n{\bbl@tempd - 1724220.5}}%
7808 \edef#4{\fp_eval:n{%
7809 floor((\bbl@tempc - floor((\bbl@tempc+366) / 1461)) / 365) + 1}}%
7810 \edef\bbl@tempc{\fp_eval:n{%
7811 \bbl@tempd - (#4-1) * 365 - floor(#4/4) - 1724220.5}}%
7812 \edef#5{\fp_eval:n{floor(\bbl@tempc / 30) + 1}}%
7813 \edef#6{\fp_eval:n{\bbl@tempc - (#5 - 1) * 30 + 1}}}
7814 \ExplSyntaxOff
7815 h/ca-ethiopici

19 Buddhist
That’s very simple.

7816 h∗ca-buddhisti
7817 \def\bbl@ca@buddhist#1-#2-#3\@@#4#5#6{%
7818 \edef#4{\number\numexpr#1+543\relax}%
7819 \edef#5{#2}%
7820 \edef#6{#3}}
7821 h/ca-buddhisti

20 Support for Plain TEX (plain.def)


20.1 Not renaming hyphen.tex
As Don Knuth has declared that the filename hyphen.tex may only be used to designate his version
of the american English hyphenation patterns, a new solution has to be found in order to be able to
load hyphenation patterns for other languages in a plain-based TEX-format. When asked he
responded:
That file name is “sacred”, and if anybody changes it they will cause severe
upward/downward compatibility headaches.
People can have a file localhyphen.tex or whatever they like, but they mustn’t diddle with
hyphen.tex (or plain.tex except to preload additional fonts).
The files bplain.tex and blplain.tex can be used as replacement wrappers around plain.tex and
lplain.tex to achieve the desired effect, based on the babel package. If you load each of them with
iniTEX, you will get a file called either bplain.fmt or blplain.fmt, which you can use as
replacements for plain.fmt and lplain.fmt.
As these files are going to be read as the first thing iniTEX sees, we need to set some category codes
just to be able to change the definition of \input.

7822 h∗bplain | blplaini


7823 \catcode`\{=1 % left brace is begin-group character
7824 \catcode`\}=2 % right brace is end-group character
7825 \catcode`\#=6 % hash mark is macro parameter character

If a file called hyphen.cfg can be found, we make sure that it will be read instead of the file
hyphen.tex. We do this by first saving the original meaning of \input (and I use a one letter control
sequence for that so as not to waste multi-letter control sequence on this in the format).

7826 \openin 0 hyphen.cfg


7827 \ifeof0
7828 \else
7829 \let\a\input

213
Then \input is defined to forget about its argument and load hyphen.cfg instead. Once that’s done
the original meaning of \input can be restored and the definition of \a can be forgotten.
7830 \def\input #1 {%
7831 \let\input\a
7832 \a hyphen.cfg
7833 \let\a\undefined
7834 }
7835 \fi
7836 h/bplain | blplaini

Now that we have made sure that hyphen.cfg will be loaded at the right moment it is time to load
plain.tex.
7837 hbplaini\a plain.tex
7838 hblplaini\a lplain.tex
Finally we change the contents of \fmtname to indicate that this is not the plain format, but a format
based on plain with the babel package preloaded.
7839 hbplaini\def\fmtname{babel-plain}
7840 hblplaini\def\fmtname{babel-lplain}

When you are using a different format, based on plain.tex you can make a copy of blplain.tex,
rename it and replace plain.tex with the name of your format file.

20.2 Emulating some LATEX features


The file babel.def expects some definitions made in the LATEX 2ε style file. So, in Plain we must
provide at least some predefined values as well some tools to set them (even if not all options are
available). There are no package options, and therefore and alternative mechanism is provided. For
the moment, only \babeloptionstrings and \babeloptionmath are provided, which can be defined
before loading babel. \BabelModifiers can be set too (but not sure it works).
7841 hh∗Emulate LaTeXii ≡
7842 \def\@empty{}
7843 \def\loadlocalcfg#1{%
7844 \openin0#1.cfg
7845 \ifeof0
7846 \closein0
7847 \else
7848 \closein0
7849 {\immediate\write16{*************************************}%
7850 \immediate\write16{* Local config file #1.cfg used}%
7851 \immediate\write16{*}%
7852 }
7853 \input #1.cfg\relax
7854 \fi
7855 \@endofldf}

20.3 General tools


A number of LATEX macro’s that are needed later on.
7856 \long\def\@firstofone#1{#1}
7857 \long\def\@firstoftwo#1#2{#1}
7858 \long\def\@secondoftwo#1#2{#2}
7859 \def\@nnil{\@nil}
7860 \def\@gobbletwo#1#2{}
7861 \def\@ifstar#1{\@ifnextchar *{\@firstoftwo{#1}}}
7862 \def\@star@or@long#1{%
7863 \@ifstar
7864 {\let\l@ngrel@x\relax#1}%
7865 {\let\l@ngrel@x\long#1}}
7866 \let\l@ngrel@x\relax
7867 \def\@car#1#2\@nil{#1}
7868 \def\@cdr#1#2\@nil{#2}
7869 \let\@typeset@protect\relax

214
7870 \let\protected@edef\edef
7871 \long\def\@gobble#1{}
7872 \edef\@backslashchar{\expandafter\@gobble\string\\}
7873 \def\strip@prefix#1>{}
7874 \def\g@addto@macro#1#2{{%
7875 \toks@\expandafter{#1#2}%
7876 \xdef#1{\the\toks@}}}
7877 \def\@namedef#1{\expandafter\def\csname #1\endcsname}
7878 \def\@nameuse#1{\csname #1\endcsname}
7879 \def\@ifundefined#1{%
7880 \expandafter\ifx\csname#1\endcsname\relax
7881 \expandafter\@firstoftwo
7882 \else
7883 \expandafter\@secondoftwo
7884 \fi}
7885 \def\@expandtwoargs#1#2#3{%
7886 \edef\reserved@a{\noexpand#1{#2}{#3}}\reserved@a}
7887 \def\zap@space#1 #2{%
7888 #1%
7889 \ifx#2\@empty\else\expandafter\zap@space\fi
7890 #2}
7891 \let\bbl@trace\@gobble
7892 \def\bbl@error#1#2{%
7893 \begingroup
7894 \newlinechar=`\^^J
7895 \def\\{^^J(babel) }%
7896 \errhelp{#2}\errmessage{\\#1}%
7897 \endgroup}
7898 \def\bbl@warning#1{%
7899 \begingroup
7900 \newlinechar=`\^^J
7901 \def\\{^^J(babel) }%
7902 \message{\\#1}%
7903 \endgroup}
7904 \let\bbl@infowarn\bbl@warning
7905 \def\bbl@info#1{%
7906 \begingroup
7907 \newlinechar=`\^^J
7908 \def\\{^^J}%
7909 \wlog{#1}%
7910 \endgroup}

LATEX 2ε has the command \@onlypreamble which adds commands to a list of commands that are no
longer needed after \begin{document}.

7911 \ifx\@preamblecmds\@undefined
7912 \def\@preamblecmds{}
7913 \fi
7914 \def\@onlypreamble#1{%
7915 \expandafter\gdef\expandafter\@preamblecmds\expandafter{%
7916 \@preamblecmds\do#1}}
7917 \@onlypreamble\@onlypreamble

Mimick LATEX’s \AtBeginDocument; for this to work the user needs to add \begindocument to his file.

7918 \def\begindocument{%
7919 \@begindocumenthook
7920 \global\let\@begindocumenthook\@undefined
7921 \def\do##1{\global\let##1\@undefined}%
7922 \@preamblecmds
7923 \global\let\do\noexpand}

7924 \ifx\@begindocumenthook\@undefined
7925 \def\@begindocumenthook{}
7926 \fi
7927 \@onlypreamble\@begindocumenthook

215
7928 \def\AtBeginDocument{\g@addto@macro\@begindocumenthook}

We also have to mimick LATEX’s \AtEndOfPackage. Our replacement macro is much simpler; it stores
its argument in \@endofldf.

7929 \def\AtEndOfPackage#1{\g@addto@macro\@endofldf{#1}}
7930 \@onlypreamble\AtEndOfPackage
7931 \def\@endofldf{}
7932 \@onlypreamble\@endofldf
7933 \let\bbl@afterlang\@empty
7934 \chardef\bbl@opt@hyphenmap\z@

LATEX needs to be able to switch off writing to its auxiliary files; plain doesn’t have them by default.
There is a trick to hide some conditional commands from the outer \ifx. The same trick is applied
below.

7935 \catcode`\&=\z@
7936 \ifx&if@filesw\@undefined
7937 \expandafter\let\csname if@filesw\expandafter\endcsname
7938 \csname iffalse\endcsname
7939 \fi
7940 \catcode`\&=4

Mimick LATEX’s commands to define control sequences.

7941 \def\newcommand{\@star@or@long\new@command}
7942 \def\new@command#1{%
7943 \@testopt{\@newcommand#1}0}
7944 \def\@newcommand#1[#2]{%
7945 \@ifnextchar [{\@xargdef#1[#2]}%
7946 {\@argdef#1[#2]}}
7947 \long\def\@argdef#1[#2]#3{%
7948 \@yargdef#1\@ne{#2}{#3}}
7949 \long\def\@xargdef#1[#2][#3]#4{%
7950 \expandafter\def\expandafter#1\expandafter{%
7951 \expandafter\@protected@testopt\expandafter #1%
7952 \csname\string#1\expandafter\endcsname{#3}}%
7953 \expandafter\@yargdef \csname\string#1\endcsname
7954 \tw@{#2}{#4}}
7955 \long\def\@yargdef#1#2#3{%
7956 \@tempcnta#3\relax
7957 \advance \@tempcnta \@ne
7958 \let\@hash@\relax
7959 \edef\reserved@a{\ifx#2\tw@ [\@hash@1]\fi}%
7960 \@tempcntb #2%
7961 \@whilenum\@tempcntb <\@tempcnta
7962 \do{%
7963 \edef\reserved@a{\reserved@a\@hash@\the\@tempcntb}%
7964 \advance\@tempcntb \@ne}%
7965 \let\@hash@##%
7966 \l@ngrel@x\expandafter\def\expandafter#1\reserved@a}
7967 \def\providecommand{\@star@or@long\provide@command}
7968 \def\provide@command#1{%
7969 \begingroup
7970 \escapechar\m@ne\xdef\@gtempa{{\string#1}}%
7971 \endgroup
7972 \expandafter\@ifundefined\@gtempa
7973 {\def\reserved@a{\new@command#1}}%
7974 {\let\reserved@a\relax
7975 \def\reserved@a{\new@command\reserved@a}}%
7976 \reserved@a}%

7977 \def\DeclareRobustCommand{\@star@or@long\declare@robustcommand}
7978 \def\declare@robustcommand#1{%
7979 \edef\reserved@a{\string#1}%
7980 \def\reserved@b{#1}%
7981 \edef\reserved@b{\expandafter\strip@prefix\meaning\reserved@b}%

216
7982 \edef#1{%
7983 \ifx\reserved@a\reserved@b
7984 \noexpand\x@protect
7985 \noexpand#1%
7986 \fi
7987 \noexpand\protect
7988 \expandafter\noexpand\csname
7989 \expandafter\@gobble\string#1 \endcsname
7990 }%
7991 \expandafter\new@command\csname
7992 \expandafter\@gobble\string#1 \endcsname
7993 }
7994 \def\x@protect#1{%
7995 \ifx\protect\@typeset@protect\else
7996 \@x@protect#1%
7997 \fi
7998 }
7999 \catcode`\&=\z@ % Trick to hide conditionals
8000 \def\@x@protect#1&fi#2#3{&fi\protect#1}

The following little macro \in@ is taken from latex.ltx; it checks whether its first argument is part
of its second argument. It uses the boolean \in@; allocating a new boolean inside conditionally
executed code is not possible, hence the construct with the temporary definition of \bbl@tempa.

8001 \def\bbl@tempa{\csname newif\endcsname&ifin@}


8002 \catcode`\&=4
8003 \ifx\in@\@undefined
8004 \def\in@#1#2{%
8005 \def\in@@##1#1##2##3\in@@{%
8006 \ifx\in@##2\in@false\else\in@true\fi}%
8007 \in@@#2#1\in@\in@@}
8008 \else
8009 \let\bbl@tempa\@empty
8010 \fi
8011 \bbl@tempa

LATEX has a macro to check whether a certain package was loaded with specific options. The
command has two extra arguments which are code to be executed in either the true or false case.
This is used to detect whether the document needs one of the accents to be activated (activegrave and
activeacute). For plain TEX we assume that the user wants them to be active by default. Therefore the
only thing we do is execute the third argument (the code for the true case).

8012 \def\@ifpackagewith#1#2#3#4{#3}

The LATEX macro \@ifl@aded checks whether a file was loaded. This functionality is not needed for
plain TEX but we need the macro to be defined as a no-op.

8013 \def\@ifl@aded#1#2#3#4{}

For the following code we need to make sure that the commands \newcommand and \providecommand
exist with some sensible definition. They are not fully equivalent to their LATEX 2ε versions; just
enough to make things work in plain TEXenvironments.

8014 \ifx\@tempcnta\@undefined
8015 \csname newcount\endcsname\@tempcnta\relax
8016 \fi
8017 \ifx\@tempcntb\@undefined
8018 \csname newcount\endcsname\@tempcntb\relax
8019 \fi

To prevent wasting two counters in LATEX (because counters with the same name are allocated later
by it) we reset the counter that holds the next free counter (\count10).

8020 \ifx\bye\@undefined
8021 \advance\count10 by -2\relax
8022 \fi
8023 \ifx\@ifnextchar\@undefined
8024 \def\@ifnextchar#1#2#3{%
8025 \let\reserved@d=#1%

217
8026 \def\reserved@a{#2}\def\reserved@b{#3}%
8027 \futurelet\@let@token\@ifnch}
8028 \def\@ifnch{%
8029 \ifx\@let@token\@sptoken
8030 \let\reserved@c\@xifnch
8031 \else
8032 \ifx\@let@token\reserved@d
8033 \let\reserved@c\reserved@a
8034 \else
8035 \let\reserved@c\reserved@b
8036 \fi
8037 \fi
8038 \reserved@c}
8039 \def\:{\let\@sptoken= } \: % this makes \@sptoken a space token
8040 \def\:{\@xifnch} \expandafter\def\: {\futurelet\@let@token\@ifnch}
8041 \fi
8042 \def\@testopt#1#2{%
8043 \@ifnextchar[{#1}{#1[#2]}}
8044 \def\@protected@testopt#1{%
8045 \ifx\protect\@typeset@protect
8046 \expandafter\@testopt
8047 \else
8048 \@x@protect#1%
8049 \fi}
8050 \long\def\@whilenum#1\do #2{\ifnum #1\relax #2\relax\@iwhilenum{#1\relax
8051 #2\relax}\fi}
8052 \long\def\@iwhilenum#1{\ifnum #1\expandafter\@iwhilenum
8053 \else\expandafter\@gobble\fi{#1}}

20.4 Encoding related macros


Code from ltoutenc.dtx, adapted for use in the plain TEX environment.
8054 \def\DeclareTextCommand{%
8055 \@dec@text@cmd\providecommand
8056 }
8057 \def\ProvideTextCommand{%
8058 \@dec@text@cmd\providecommand
8059 }
8060 \def\DeclareTextSymbol#1#2#3{%
8061 \@dec@text@cmd\chardef#1{#2}#3\relax
8062 }
8063 \def\@dec@text@cmd#1#2#3{%
8064 \expandafter\def\expandafter#2%
8065 \expandafter{%
8066 \csname#3-cmd\expandafter\endcsname
8067 \expandafter#2%
8068 \csname#3\string#2\endcsname
8069 }%
8070 % \let\@ifdefinable\@rc@ifdefinable
8071 \expandafter#1\csname#3\string#2\endcsname
8072 }
8073 \def\@current@cmd#1{%
8074 \ifx\protect\@typeset@protect\else
8075 \noexpand#1\expandafter\@gobble
8076 \fi
8077 }
8078 \def\@changed@cmd#1#2{%
8079 \ifx\protect\@typeset@protect
8080 \expandafter\ifx\csname\cf@encoding\string#1\endcsname\relax
8081 \expandafter\ifx\csname ?\string#1\endcsname\relax
8082 \expandafter\def\csname ?\string#1\endcsname{%
8083 \@changed@x@err{#1}%
8084 }%

218
8085 \fi
8086 \global\expandafter\let
8087 \csname\cf@encoding \string#1\expandafter\endcsname
8088 \csname ?\string#1\endcsname
8089 \fi
8090 \csname\cf@encoding\string#1%
8091 \expandafter\endcsname
8092 \else
8093 \noexpand#1%
8094 \fi
8095 }
8096 \def\@changed@x@err#1{%
8097 \errhelp{Your command will be ignored, type <return> to proceed}%
8098 \errmessage{Command \protect#1 undefined in encoding \cf@encoding}}
8099 \def\DeclareTextCommandDefault#1{%
8100 \DeclareTextCommand#1?%
8101 }
8102 \def\ProvideTextCommandDefault#1{%
8103 \ProvideTextCommand#1?%
8104 }
8105 \expandafter\let\csname OT1-cmd\endcsname\@current@cmd
8106 \expandafter\let\csname?-cmd\endcsname\@changed@cmd
8107 \def\DeclareTextAccent#1#2#3{%
8108 \DeclareTextCommand#1{#2}[1]{\accent#3 ##1}
8109 }
8110 \def\DeclareTextCompositeCommand#1#2#3#4{%
8111 \expandafter\let\expandafter\reserved@a\csname#2\string#1\endcsname
8112 \edef\reserved@b{\string##1}%
8113 \edef\reserved@c{%
8114 \expandafter\@strip@args\meaning\reserved@a:-\@strip@args}%
8115 \ifx\reserved@b\reserved@c
8116 \expandafter\expandafter\expandafter\ifx
8117 \expandafter\@car\reserved@a\relax\relax\@nil
8118 \@text@composite
8119 \else
8120 \edef\reserved@b##1{%
8121 \def\expandafter\noexpand
8122 \csname#2\string#1\endcsname####1{%
8123 \noexpand\@text@composite
8124 \expandafter\noexpand\csname#2\string#1\endcsname
8125 ####1\noexpand\@empty\noexpand\@text@composite
8126 {##1}%
8127 }%
8128 }%
8129 \expandafter\reserved@b\expandafter{\reserved@a{##1}}%
8130 \fi
8131 \expandafter\def\csname\expandafter\string\csname
8132 #2\endcsname\string#1-\string#3\endcsname{#4}
8133 \else
8134 \errhelp{Your command will be ignored, type <return> to proceed}%
8135 \errmessage{\string\DeclareTextCompositeCommand\space used on
8136 inappropriate command \protect#1}
8137 \fi
8138 }
8139 \def\@text@composite#1#2#3\@text@composite{%
8140 \expandafter\@text@composite@x
8141 \csname\string#1-\string#2\endcsname
8142 }
8143 \def\@text@composite@x#1#2{%
8144 \ifx#1\relax
8145 #2%
8146 \else
8147 #1%

219
8148 \fi
8149 }
8150 %
8151 \def\@strip@args#1:#2-#3\@strip@args{#2}
8152 \def\DeclareTextComposite#1#2#3#4{%
8153 \def\reserved@a{\DeclareTextCompositeCommand#1{#2}{#3}}%
8154 \bgroup
8155 \lccode`\@=#4%
8156 \lowercase{%
8157 \egroup
8158 \reserved@a @%
8159 }%
8160 }
8161 %
8162 \def\UseTextSymbol#1#2{#2}
8163 \def\UseTextAccent#1#2#3{}
8164 \def\@use@text@encoding#1{}
8165 \def\DeclareTextSymbolDefault#1#2{%
8166 \DeclareTextCommandDefault#1{\UseTextSymbol{#2}#1}%
8167 }
8168 \def\DeclareTextAccentDefault#1#2{%
8169 \DeclareTextCommandDefault#1{\UseTextAccent{#2}#1}%
8170 }
8171 \def\cf@encoding{OT1}

Currently we only use the LATEX 2ε method for accents for those that are known to be made active in
some language definition file.
8172 \DeclareTextAccent{\"}{OT1}{127}
8173 \DeclareTextAccent{\'}{OT1}{19}
8174 \DeclareTextAccent{\^}{OT1}{94}
8175 \DeclareTextAccent{\`}{OT1}{18}
8176 \DeclareTextAccent{\~}{OT1}{126}

The following control sequences are used in babel.def but are not defined for plain TEX.
8177 \DeclareTextSymbol{\textquotedblleft}{OT1}{92}
8178 \DeclareTextSymbol{\textquotedblright}{OT1}{`\"}
8179 \DeclareTextSymbol{\textquoteleft}{OT1}{`\`}
8180 \DeclareTextSymbol{\textquoteright}{OT1}{`\'}
8181 \DeclareTextSymbol{\i}{OT1}{16}
8182 \DeclareTextSymbol{\ss}{OT1}{25}

For a couple of languages we need the LATEX-control sequence \scriptsize to be available. Because
plain TEX doesn’t have such a sofisticated font mechanism as LATEX has, we just \let it to \sevenrm.
8183 \ifx\scriptsize\@undefined
8184 \let\scriptsize\sevenrm
8185 \fi

And a few more “dummy” definitions.


8186 \def\languagename{english}%
8187 \let\bbl@opt@shorthands\@nnil
8188 \def\bbl@ifshorthand#1#2#3{#2}%
8189 \let\bbl@language@opts\@empty
8190 \ifx\babeloptionstrings\@undefined
8191 \let\bbl@opt@strings\@nnil
8192 \else
8193 \let\bbl@opt@strings\babeloptionstrings
8194 \fi
8195 \def\BabelStringsDefault{generic}
8196 \def\bbl@tempa{normal}
8197 \ifx\babeloptionmath\bbl@tempa
8198 \def\bbl@mathnormal{\noexpand\textormath}
8199 \fi
8200 \def\AfterBabelLanguage#1#2{}
8201 \ifx\BabelModifiers\@undefined\let\BabelModifiers\relax\fi

220
8202 \let\bbl@afterlang\relax
8203 \def\bbl@opt@safe{BR}
8204 \ifx\@uclclist\@undefined\let\@uclclist\@empty\fi
8205 \ifx\bbl@trace\@undefined\def\bbl@trace#1{}\fi
8206 \expandafter\newif\csname ifbbl@single\endcsname
8207 \chardef\bbl@bidimode\z@
8208 hh/Emulate LaTeXii

A proxy file:

8209 h∗plaini
8210 \input babel.def
8211 h/plaini

21 Acknowledgements
I would like to thank all who volunteered as β -testers for their time. Michel Goossens supplied
contributions for most of the other languages. Nico Poppelier helped polish the text of the
documentation and supplied parts of the macros for the Dutch language. Paul Wackers and
Werenfried Spit helped find and repair bugs.
During the further development of the babel system I received much help from Bernd Raichle, for
which I am grateful.

References
[1] Huda Smitshuijzen Abifares, Arabic Typography, Saqi, 2001.
[2] Johannes Braams, Victor Eijkhout and Nico Poppelier, The development of national LATEX styles,
TUGboat 10 (1989) #3, p. 401–406.
[3] Yannis Haralambous, Fonts & Encodings, O’Reilly, 2007.
[4] Donald E. Knuth, The TEXbook, Addison-Wesley, 1986.
[5] Jukka K. Korpela, Unicode Explained, O’Reilly, 2006.
[6] Leslie Lamport, LATEX, A document preparation System, Addison-Wesley, 1986.
[7] Leslie Lamport, in: TEXhax Digest, Volume 89, #13, 17 February 1989.
[8] Ken Lunde, CJKV Information Processing, O’Reilly, 2nd ed., 2009.
[9] Edward M. Reingold and Nachum Dershowitz, Calendrical Calculations: The Ultimate Edition,
Cambridge University Press, 2018
[10] Hubert Partl, German TEX, TUGboat 9 (1988) #1, p. 70–72.
[11] Joachim Schrod, International LATEX is ready to use, TUGboat 11 (1990) #1, p. 87–90.
[12] Apostolos Syropoulos, Antonis Tsolomitis and Nick Sofroniu, Digital typography using LATEX,
Springer, 2002, p. 301–373.
[13] K.F. Treebus. Tekstwijzer, een gids voor het grafisch verwerken van tekst, SDU Uitgeverij
(’s-Gravenhage, 1988).

221

You might also like