JHPCichosz Grabowski Pezik

Formulaic language in Old English prose: a corpus-driven functional analysis
Preprint · March 2023
3 authors, including:
Anna Cichosz Piotr Pezik

University of Lodz University of Lodz
All content following this page was uploaded by Anna Cichosz on 17 March 2023.

Formulaic language in Old English prose:
a corpus-driven functional analysis1
ABSTRACT
Although there has been a plethora of research on formulaic language in contemporary English,
conducted with various purposes in mind (descriptive, applied and otherwise), studies of formulaic
phrasings in Old English texts are rare. In this paper, we employ selected corpus linguistics methods to
identify and explore the use and discoursal functions of recurrent multi-word items that contribute the
most to the formulaicity of homilies, chronicles and biblical translations, which are the Old English text
types under scrutiny. The findings of this primarily descriptive and exploratory research provide new
insights into the pragmatic functions of Old English recurrent phraseological units as well as into the
structure and communicative functions of the analysed text varieties. Finally, the study results cast some
new light on the role those formulaic phrasings play in Old English prose.
Keywords: formulaic language, corpus-driven approach, recurrent n-grams, discoursal functions, text
types, Old English prose
1. Introduction
The use of linguistic formulas associated with a number of different discourse functions is a well-known
phenomenon attested in various contemporary languages including English (Pawley & Syder 1983;
Wray 2002; 2008; Kuiper 2009; Schmitt 2010; Wood 2015; Buerki 2020; Sidtis 2021). Nevertheless,
we know quite little about the evolution of formulaic language over time as well as the form and
function of formulas in the earliest stages of English. Even though it is quite logical to assume that the
elements of the language system which undergo change are not only phonemes, morphemes, words and
grammatical structures but also formulaic sequences, the mechanisms involved in the formation,
modification and loss of formulas used for a specific set of discourse functions at the individual stages
of English are largely uncharted territory. English appears to be a very good subject for a historical
study of formulaic language since its textual records cover a wide timespan of over 1200 years, and
currently, thanks to the available electronic corpora, the identification of textual formulas even in the
earliest English texts is technically possible.
In this exploratory and descriptive study, undertaken with the use of both quantitative and qualitative
methods offered by corpus linguistics, we aim to identify the most salient formulaic sequences attested
1
Study funded by NCN SONATA 13 nr 2017/26/D/HS2/00272 “The variation of syntactic and phraseological
constructions in Old English Prose” (2018-2022), grant holder: Anna Cichosz.
in three different genres of Old English (OE) prose, i.e., homilies, chronicles and biblical translations,
and to determine the discourse function(s) associated with the identified formulas in these particular
text types. As such, the study falls within the scope of corpus pragmatics, which is an emerging research
area that “integrates qualitative methodology typical of pragmatics with the quantitative methodology
predominant in corpus linguistics” (Ruehlemann & Clancy 2018: 241). We believe that this study is an
important step in the direction of function-oriented research on OE text types and discourse
organisation, and that the use of new corpus methods may shed some new light on the limited textual
records of OE.
The paper is divided into six sections, including this short introduction. Section 2 presents the basic
information on formulaic language and the relation between formulas and their discourse functions. In
Section 3 we discuss existing studies on OE discourse. Section 4 is a detailed overview of our
methodology, including the typology of discourse functions designed for the study. Results are
presented and discussed in Section 5. Section 6 summarises and concludes the study.
2. Corpus linguistics and formulaic language
Broadly speaking, formulaic language is an umbrella term used by linguists of various schools to
describe the phenomenon of linguistic prefabrication, that is, the use, processing, storage and retrieval
from mental lexicon of the many different types of multi-word items that are used in written texts or
speech as single wholes and with particular purposes in mind (Wray & Perkins 2000; Wray 2002, 2008;
Biber 2009; Wood 2015; Pęzik 2018; Buerki 2020). Apart from instances of creative language use, texts
also contain certain repeated and conventionalised formulaic phrasings, both single- and multi-word
items, performing specific discoursal and pragmatic functions and characteristic of particular speech
acts (Pawley 2007: 3-4). The fact that in order to express a given idea or perform a pragmatic function
language users employ various types of linguistic units – including semantically compositional and non-
compositional ones (e.g. lexical bundles vs. idioms or proverbs) – means that all these multi-word units
have all the hallmarks of formulaicity. The inventory of formulaic language also includes the so-called
pragmatemes, which are defined as “pragmatically constrained lexical entities” (Mel’cuk 2020: 18) or
as cliches “constrained by the speech act situation” (Mel’cuk & Milicevic 2020: 112), which means that
in practice they are “unexchangeable in specific contexts by any other synonymous expression”, e.g.,
best before, sincerely yours (Wanner 1996: 13). Although pragmatemes are typically associated with
spoken language, we believe that our study will reveal some salient formulas that perform important
pragmatic functions in OE discourses, notably in terms of text organization and genre
conventionalization.
Given the different aspects of formulaic language (cf. Wray 2002, 2008), the phenomenon in question
has been studied from various perspectives by sociolinguists, psycholinguists, neurolinguists,
computational linguists, etc. In more linguistically-oriented research, the criteria used for identification
of formulaic language range from frequency of occurrence, grammatical structure, degree of semantic
compositionality, degree of fixedness, phonological form, fluency of use to stress and articulation
patterns (Wray 2002: 25-39). In this study, we adopt a corpus-linguistic perspective for the
identification and exploration of formulaic language in OE prose, which means that criteria such as
frequency of occurrence and pattern variability come to the fore as defining ones. A research paradigm
like this has been described in specialized literature as frequency-driven, data-driven or distributional
phraseology (Granger & Meunier 2008), and the major approaches that have developed over the last
three decades or so within the said paradigm include clusters (Scott & Tribble 2006), lexical bundles
(Biber et al. 1999), Pattern Grammar (Hunston & Francis 2000), phrase frames (Fletcher 2002) and
concgrams (Cheng et al. 2006), some of them being also the names of the units of analysis.
One of the most popular approaches among corpus linguists to study phraseology has been the lexical
bundles approach, originally presented by Biber, Johansson, Leech, Conrad and Finegan (1999) in
Longman Grammar of Spoken and Written English, and refined in later publications (e.g., Biber et al.
2003, 2004, 2006, 2009). It combines identification of recurrent multi-word items in texts with the
qualitative investigation of their discoursal and pragmatic functions. More precisely, lexical bundles
were defined and operationalized as contiguous sequences of three or more wordforms (so-called
n-grams) that occur frequently in natural discourse and constitute lexical building blocks or text chunks
used frequently by language users in different situational and communicative contexts, e.g., I don’t
think, as a result, do you want, the nature of the (Biber et al. 1999: 990-991). As a rule, lexical bundles
are identified using a pre-determined frequency (e.g., 10 or 20 occurrences per million words) and
distribution threshold (e.g., a particular number or percentage of texts from a given register), the latter
employed to exclude idiosyncratic uses typical of a single author. In general, the lexical bundles
approach has enjoyed considerable popularity in studies of academic (e.g., Biber 2006; Hyland 2008;
Chen & Baker 2010; Ädel & Erman 2012; Salazar 2014; Cao 2019) and specialized discourses (e.g.,
Goźdź-Roszkowski 2011; Fuster-Marquez 2014; Grabowski 2015), and the findings showed that the
types and functions of recurrent multi-word items vary across registers, text types and specialized
domains of knowledge. As for the discoursal functions performed by lexical bundles, they largely
correspond to three broad functional categories, namely referential expressions, discourse-organizers and
stance expressions (Biber et al. 1999; Biber 2006).2 In short, referential bundles help identify an entity
or topic as well as their particular attributes as especially important in texts; discourse-organizers help
introduce, focus, clarify and elaborate on the topics signalled in texts and enhance text organization;
stance bundles help express attitudes, assessment and evaluation of information signalled in texts (Biber
2006: 139-145). However, research on particular genres and text types has shown that these three
coarse-grained categories can be further subdivided into more fine-grained subcategories in order to reflect
2 Biber (2006: 174) argues that lexical bundles “provide a kind of frame for expressing stance, discourse organization, or referential
status, associated with a slot for the expression of new information relative to that frame.” This coarse-grained typology has
been further modified by researchers (e.g., Hyland 2008; Simpson-Vlach & Ellis; Chen & Baker 2010; Ädel & Erman 2012).
the specificity of the structure and communicative functions of analysed text varieties (e.g., Hyland
2008; Goźdź-Roszkowski 2011; Grabowski 2015).
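The identification step of the lexical bundles approach, as summarised above, can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions and not the procedure used in any of the cited studies: the function names, the default thresholds and the toy English texts are all hypothetical, and frequency is normalised per million tokens of the whole input.

```python
from collections import Counter, defaultdict


def ngrams(tokens, n):
    """Yield every contiguous n-token sequence as a tuple."""
    for i in range(len(tokens) - n + 1):
        yield tuple(tokens[i:i + n])


def lexical_bundles(texts, n=3, min_per_million=20.0, min_texts=2):
    """Return n-grams that pass a frequency and a distribution threshold.

    texts: dict mapping a text identifier to its list of tokens.
    The result maps each retained n-gram to (raw count, number of texts).
    """
    freq = Counter()
    seen_in = defaultdict(set)
    total_tokens = 0
    for text_id, tokens in texts.items():
        total_tokens += len(tokens)
        for gram in ngrams(tokens, n):
            freq[gram] += 1
            seen_in[gram].add(text_id)
    bundles = {}
    for gram, count in freq.items():
        per_million = count / total_tokens * 1_000_000
        if per_million >= min_per_million and len(seen_in[gram]) >= min_texts:
            bundles[gram] = (count, len(seen_in[gram]))
    return bundles


# Toy data (invented): the third text repeats a sequence that no other text uses.
toy = {
    "t1": "as a result of the study as a result".split(),
    "t2": "this is as a result of chance".split(),
    "t3": "one two three one two three".split(),
}
found = lexical_bundles(toy, n=3, min_per_million=0.0, min_texts=2)
```

In this toy run the sequence "one two three" occurs twice but only in a single text, so the distribution threshold removes it, which is exactly the role of that threshold in filtering out idiosyncratic usage.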
The lexical bundles approach has provided inspiration for us to conduct a study like this one, based on
a custom-designed corpus of OE prose, which will be described in greater detail in the methodological
section. Due to the specificity of the study corpus, notably the varying size of the sub-corpora of particular
text types, we will not apply a distribution threshold for the identification of salient recurrent n-grams, the
term that we use throughout this study – interchangeably with recurrent multi-word items – with
reference to the unit of analysis. Instead, we will use an additional criterion of coverage (Forsyth 2015a,
2015b, 2021), which has already proven useful in fine-tuning the lists of lexical bundles
(Grabowski & Jukneviciene 2016). Furthermore, we refer to such an approach to identification of
recurrent phraseologies as a corpus-driven (or data-driven) one primarily because we start our
exploration of the corpus without adopting any specific theoretical model of language description or
pre-defined grammatical (e.g., morphological, syntactic or semantic) categories. Hence, our findings,
that is, an inventory of recurrent n-grams with the largest coverage in the corpus, and functional
categories based on their textual and discoursal roles, emerge from the data rather than the other way
round. This should enable us to identify those recurrent textual patterns that contribute the most to the
formulaic nature of OE prose and, in turn, fill in the gap in data-driven studies on formulaic language
in OE texts, an area which has been relatively underexplored so far, as we show in the following section.
3. Studies of Old English texts from a discourse perspective
For an early medieval language, OE is relatively well recorded. The texts available for study include
the famous Anglo-Saxon alliterative poetry, the Anglo-Saxon Chronicle, collections of laws and
documents, homilies, religious treatises and numerous translations and glosses of Latin texts. All in all,
the OE textual records amount to circa 4 million words, which is a very good result for a language
spoken more than a thousand years ago. Nevertheless, analysing OE has its typical limitations reflecting
the traditional “bad-data problem” (Labov 1972, 1994). The texts produced in Anglo-Saxon England
were all written down by educated male speakers of OE from a restricted age group and we have no
access to the language of simple farmers or craftsmen, women or children (Hogg 2006: 395).
Furthermore, Anglo-Saxon England was “an oral rather than a literate society” (Lenker 2012: 326),
while all the records of OE are naturally written, so our knowledge of OE is based on very limited
material, which may not represent the main linguistic habits of the speakers. Finally, the prose texts
produced in OE, even though they are not all translations, rely heavily on Latin models (Stanton 2002)
so “we cannot be sure whether the speech conventions recorded there echo actual OE speech interaction
or whether they were typical of the Latin discourse tradition or a hybrid Anglo-Saxon/Latin tradition”
(Lenker 2012: 326).
Regardless of these limitations, OE discourse has received some scholarly attention over the recent
years. Some of the existing studies identify and describe the functions of OE discourse markers such as
hwæt ‘what’, hwæt þa ‘what then’, nu ‘now’, soþlice ‘truly’ and witodlice ‘truly’ (Brinton 2017, Brinton
2010, Lenker 2000, Louviot 2018). Another line of research is focused on the narrative structure of OE
texts, mostly in relation to the functions of the narrative-sequencing adverb þa ‘then’ (Kemenade &
Links 2020, Enkvist 1972, Wårvik 2011, Wårvik 2013a, Wårvik 2013b) and the role of verb-initial
declarative clauses as transition markers (Petrova 2006, Calle-Martin & Miranda-Garcia 2010, Cichosz
2020). In addition, thanks to the inflectional system, OE sentence structure was rather flexible and
allowed for a wide range of possible word order patterns, which makes this language an interesting topic
for the study of information structure. Since Bech (2001) showed that the exact order of the OE clause
reflects an interplay between syntactic constraints and pragmatic tendencies, numerous studies of the
impact of information value on the position of clause constituents have been published (Bech 2012,
Kemenade & Westergaard 2012, Pintzuk & Taylor 2012, Struik & Kemenade 2018, Los 2009).
Finally, let us note that even though some aspects of the OE discourse structure are relatively well
studied, one of them is clearly underexplored, and that is formulaicity. Despite the fact that Magoun
(1953) noted the existence of formulaic expressions in OE poetry more than half a century ago, recurrent
phraseological units in OE prose have not been identified and studied in a comprehensive and systematic
way. In fact, such studies are rare for all the historical stages of English3 and “it is still correct to speak
of a Cinderella status of historical (English) phraseology” (Knappe 2012: 183). Thus, we know nothing
about the nature and function of recurrent multi-word expressions in OE prose, most probably due to
technical limitations for such research. Since OE was an inflected language and its spelling was not
standardised, the identification of formulaic phrases is difficult without a lemmatised corpus where the
morphological and spelling differences could be eliminated, enabling the automatic extraction of
lexically recurrent units. This observation provided motivation to undertake a study like this one,
involving preparation of a suitable corpus for exploration of formulaic language, which will be
described, together with the scope and stages of the study, in the following methodological section.
4. Methodology
4.1. Research material
The study is based on the syntactically annotated YCOE corpus (Taylor et al. 2003), which contains
100 prose texts of different length, i.e. one copy of every existing OE prose text except word-for-word
glosses from Latin. The corpus constitutes the basic research tool for any corpus-based studies of OE
syntax thanks to part of speech tagging and syntactic annotation of phrase and clause types. The texts
3 One of the infrequent exceptions is Kopaczyk (2013).
included in the corpus represent various genres and amount to 1.5 million words in total, as shown in
Table 1. For the purposes of this study, we have decided to analyse homilies, historical texts and biblical
translations (even though lives of saints are very well-represented in the corpus, they seem stylistically
similar to homilies).
Text type              Words (tokens)  Texts
Homilies                      390,925      9
Biography Lives               298,316     18
History                       236,165      6
Bible                         136,948      4
Religious treatise            129,993     18
Handbooks medicine             68,315      4
Philosophy                     50,623      2
Rule                           38,490      2
Laws                           20,807     10
Apocrypha                      19,867      5
Science                        15,738      2
Charters and Wills             11,906      6
Ecclesiastical laws            11,309      4
Travelogue                      7,271      1
Fiction                         6,545      1
Preface                         4,302      6
Geography                       1,891      1
Epilogue                          965      1
Total                       1,450,376    100
Table 1. The structure of the YCOE corpus.
Since the corpus is not lemmatised, it was necessary to align every word to its basic dictionary form
(nominative singular for nouns and their modifiers, infinitives for verbs, one basic spelling form for
uninflected parts of speech). This task was performed by the team working for our research project4.
All the word forms (ca. 82,500 in total) were extracted from YCOE by means of the LEXICON function
in CorpusSearch 2 (Randall et al. 2006). The process was conducted for each part of speech separately
and then every set was manually lemmatised with the help of The Dictionary of Old English: A to I
(diPaolo Healey et al.) and The Bosworth-Toller Anglo-Saxon Dictionary. Proper nouns received
separate treatment: since most of them were not included in either dictionary and they are mostly very
low-frequency items, we decided not to lemmatise them but to divide them into general semantic
categories (names, toponyms, demonyms and other). The process of lemmatisation was automated for
prefixed verbs (we used verbal lemmas as the basis), though extensive manual correction was still
4 NCN SONATA 13 nr 2017/26/D/HS2/00272 “The variation of syntactic and phraseological constructions in
Old English Prose” (2018-2022), grant holder: Anna Cichosz.
necessary. The lemmatised data were then imported into a relational database and processed according
to the procedure described in subsection 4.3.
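The effect of lemmatisation on n-gram extraction can be illustrated with a small Python sketch. It assumes a simple lookup table from word forms to lemmas; the table entries below are illustrative only and are not taken from the project database, and the function name is hypothetical.

```python
def lemmatise(tokens, lemma_table):
    """Replace each word form by its lemma; unknown forms are kept unchanged."""
    return [lemma_table.get(tok.lower(), tok.lower()) for tok in tokens]


# Illustrative entries only (an assumption, not the project's lemma list):
# determiners and adjectives are mapped to the nominative singular form.
LEMMA_TABLE = {
    "se": "se",
    "þam": "se",
    "halga": "halig",
    "halgan": "halig",
    "gast": "gast",
    "gaste": "gast",
}

nominative = lemmatise("se halga gast".split(), LEMMA_TABLE)
dative = lemmatise("þam halgan gaste".split(), LEMMA_TABLE)
```

Once both inflected phrases are reduced to the same lemma sequence, they are counted as occurrences of one recurrent n-gram rather than as two unrelated strings.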
4.2. Research questions
This primarily descriptive and exploratory study on linguistic variation among recurrent text chunks in
Old English prose aims to provide answers to the following research questions:
1) What are the recurrent formulaic phrasings found in the custom-designed corpus of OE prose?
To what extent do they contribute to the formulaicity of the analysed text types?
2) What are the most important discoursal functions of such formulaic phrasings in OE prose? Are
those functions similar or different across the three text types (homilies, chronicles and biblical
translations)?
3) Have the formulaic phrasings had any impact on contemporary English? Are there
any linguistic items that have also been frequently described in corpus linguistic research
on formulaic language in contemporary texts?
In order to find answers to these questions, the study has been conducted following the methodology
detailed in the following section.
4.3. Research procedures and study stages
This primarily descriptive research has been conducted in two stages, namely identification of recurrent
text chunks with the largest coverage in the study corpus, followed by the functional, largely qualitative,
analysis of their discoursal functions. In order to identify salient formulas, we used the Formulex
method (Forsyth 2015b, 2021) that enables one to identify recurrent n-grams with the largest coverage
in the corpus. Earlier research (e.g., Grabowski & Jukneviciene 2016; Grabowski 2019) showed that
the Formulex method may provide complementary data with respect to the more conservative n-gram or lexical
bundle (Biber et al. 1999) approaches and that it helps one deal with overlapping n-grams or with
shorter n-grams that constitute fragments of longer ones. From a technical point of view, the Formulex
method treats coverage as a binary category. This implies that the number of n-grams that match a
particular text sequence is irrelevant: only the distinction between some and no coverage is considered.
Such binarization of coverage counts enables one to determine what lengths of n-grams are attested in
the data (Forsyth 2021: 33). In fact, what the program verifies is whether the text sequence is covered
or not (Forsyth 2015b: 13-14). The system first compiles an inventory of the most frequent n-grams of
a range of sizes (3- to 7-grams in the present case). Then it attempts to cover the main text with them.
For example, assume that the n-gram inventory contains both 4-grams below:
• hym andswarode and cwæð ‘him answered and said’

• se hælend hym andswarode ‘the Saviour him answered’
Then, when processing a sequence in the scrutinized text, such as se hælend hym andswarode and
cwæð ‘the Saviour him answered and said’ the first four and the last four words in the sequence of six
will be marked as covered. The middle 2 words (hym andswarode) are marked twice, by both 4-
grams, but this fact is ignored for the purpose of identifying a covered sequence. In this example a
covered sequence of six elements will be identified.
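The worked example above can be reproduced with a few lines of Python. This is a minimal sketch of binary coverage marking as described in the text, not the Formulib code itself; the function names and the placeholder tokens added before and after the sequence are assumptions made for illustration.

```python
def mark_covered(tokens, inventory):
    """Return one boolean per token: True if any inventory n-gram spans it."""
    covered = [False] * len(tokens)
    for n in {len(gram) for gram in inventory}:
        for i in range(len(tokens) - n + 1):
            if tuple(tokens[i:i + n]) in inventory:
                for j in range(i, i + n):
                    covered[j] = True  # binary: a second match adds nothing
    return covered


def covered_runs(covered):
    """Return (start, length) for each maximal run of covered tokens."""
    runs, start = [], None
    for i, flag in enumerate(covered + [False]):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            runs.append((start, i - start))
            start = None
    return runs


inventory = {
    ("hym", "andswarode", "and", "cwæð"),
    ("se", "hælend", "hym", "andswarode"),
}
# Placeholder tokens stand in for the surrounding, uncovered text.
tokens = ["<before>"] + "se hælend hym andswarode and cwæð".split() + ["<after>"]
flags = mark_covered(tokens, inventory)
runs = covered_runs(flags)
```

The two overlapping 4-grams yield a single covered run of six tokens starting at position 1, and the double match on hym andswarode leaves the result unchanged.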
Based on this approach, the proportion of covered to uncovered word tokens (or characters, if chosen
as an option) in each text is calculated and, subsequently, the relative coverage for each text category,
in this study – the individual sub-corpora of homilies, chronicles and Bible respectively – is aggregated
(Forsyth 2015b: 13-14). The Formulex method was implemented in the Formulib software (Forsyth
2015a), a tailor-made collection of Python scripts, which we used to identify recurrent text chunks,
based on n-grams of 3 to 7 words, with the largest coverage in the study corpus as well as in the three
sub-corpora. In summary, the program compiles an inventory of n-grams of specified lengths, a so-
called formulexicon, and then uses overall coverage by elements of that inventory as an index of a text’s
formulaicity (Forsyth 2021: 33).
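The two steps summarised here, compiling an inventory and using coverage as an index of formulaicity, can be sketched as follows. The sketch is written under several assumptions and should not be read as Formulib's implementation: the parameters top_k and min_freq, the function names and the toy texts are hypothetical, and coverage is computed as the share of covered tokens among all tokens, aggregated over the texts of one category.

```python
from collections import Counter


def build_formulexicon(texts, sizes=range(3, 8), top_k=100, min_freq=2):
    """Collect the most frequent n-grams of 3 to 7 tokens (hypothetical cut-offs)."""
    counts = Counter()
    for tokens in texts:
        for n in sizes:
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    return {gram for gram, c in counts.most_common(top_k) if c >= min_freq}


def covered_tokens(tokens, formulexicon):
    """Count tokens lying inside at least one formulexicon n-gram."""
    covered = [False] * len(tokens)
    for n in {len(gram) for gram in formulexicon}:
        for i in range(len(tokens) - n + 1):
            if tuple(tokens[i:i + n]) in formulexicon:
                covered[i:i + n] = [True] * n
    return sum(covered)


def coverage(texts, formulexicon):
    """Token-weighted share of covered tokens across all texts of a category."""
    total = sum(len(tokens) for tokens in texts)
    hits = sum(covered_tokens(tokens, formulexicon) for tokens in texts)
    return hits / total if total else 0.0


# Toy category of two invented texts.
texts = ["a b c d e a b c".split(), "x y z a b c q".split()]
formulexicon = build_formulexicon(texts)
share = coverage(texts, formulexicon)
```

In the toy data only the sequence a b c recurs, so it is the sole entry in the inventory, and it covers nine of the fifteen tokens.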
In our study, we will first produce an inventory of text chunks with the largest coverage in the entire
study corpus of OE prose, followed by a separate procedure for individual text types. The latter
formulexicons will be analysed in greater detail in terms of their discourse functions in each text type
under scrutiny in order to further explore the roles those formulaic phrasings play in the formation of
OE prose.
In view of the above, we will apply a synthetic functional typology of recurrent multi-word items known
in specialized literature as lexical bundles (Biber et al. 1999), which have been studied in a variety of
discourses found in contemporary English, e.g., academic (Biber et al. 2003, 2004; Biber 2006; Hyland
2008), legal (Goźdź-Roszkowski 2011; Kopaczyk 2012; Breeze 2013; Lehto 2018) and pharmaceutical
(Grabowski 2015). These and other similar studies showed that recurrent multi-word items, due to their
unusually high frequency and dense distribution, may account for the centres of units of meaning in
texts, perform important discoursal functions and contribute to the texts’ formulaicity. Since the
functional distinctions have been shown to be genre- or register-specific, we modified the functional
typology used in this study so that it better reflects the specificity of OE text types, notably their generic,
structural and communicative features. Another rationale behind this modification was the fact that
no study like this one, i.e., a functional characterisation of recurrent OE multi-word items, has been
conducted before.
In general, the formulaic phrasings have been divided into three coarse-grained categories of referential,
discourse-organizing and stance expressions, the last one found to be rather marginal among the most
distinctive phrasings identified in this study. Thus, the two main categories of discoursal functions are
referential expressions and discourse-organizers. The former ones, which refer to key ideas, concepts,
actions etc. described in texts, have been further divided into the following, more fine-grained
subcategories performing specific discoursal functions:
- concept-related (e.g., seo soþe lufu ‘the true love’, se halga wer ‘the holy man’), which refer to
ideas, concepts and themes broached in texts and which are largely noun phrases
and nominalizations;
- process-related (e.g., þa for he ‘then travelled he’, þa geseah he ‘then saw he’), which refer to
activities, actions and processes described in texts and which are typically verb-based
constructions;
- attributive markers (e.g., to þam halgan ‘to the holy’, to þam ecan ‘to the eternal’), which
identify specific attributes of the following head nouns;
- topic elaboration markers (e.g., mid þysum wordum ‘with these words’, and þæt folc ‘and the
people’), which are used to elaborate on a theme signalled earlier in the text;
- location markers (e.g., and be norþan ‘and to the north’, on heofenan rice ‘in heavenly
kingdom’), which are typically adjuncts that refer to specific locations described in texts;
- temporal markers (e.g., þy ilcan geare ‘the same year’, on þæm dagum ‘on the days’), which
are typically adjuncts that refer to time specifications.
The second category includes discourse-organizing phrasings, which have strictly pragmatic functions
of signalling coherence, cohesion or deixis as well as cause-and-effect relations. In other words, they
help signal relationships between different text segments, as argued by Biber (2006: 142), and they can
be provisionally divided into the following functional subcategories:
- reported-speech signals (e.g., he cwæð to ‘he said to’), which are markers of reported speech in
texts;
- focus markers (e.g., þe was gehatan ‘who was called’, þæt is on englisc ‘which is in English’),
which are used to provide more specific information on a theme signalled earlier in the text and hence
further narrow down the focus of information;
- transition markers (e.g., and eac se ‘and also the’), which are used to signal additive or
contrastive links between arguments presented in texts;
- interactional markers (e.g., ic eow secge ‘I say unto you’), which are used to emphasize what a
person is saying.
The assignment of discourse functions to formulaic phrasings with the largest coverage in the corpus
will be completed using qualitative analysis of concordances illustrating the use of the said phrasings
in their larger co-text and context. For each text type we analysed 50 recurrent n-grams with the greatest
coverage, so the proportions between discourse-organizers and referential units vary from genre to
genre. We have excluded the phrasings with the absolute frequency lower than 10 and we have only
taken into consideration sequences containing at least one content word (ignoring strictly functional
units such as complex conjunctions).
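The selection criteria described above can be expressed as a simple filter. The sketch below assumes that the frequency and content-word filters are applied before the 50 items with the greatest coverage are taken; that ordering, the function name, the function-word list and all the figures in the toy data are assumptions made for illustration, with English tokens standing in for lemmatised OE forms.

```python
def select_candidates(candidates, function_words, min_freq=10, top_n=50):
    """Keep n-grams with freq >= min_freq and at least one content word,
    then return the top_n by coverage.

    candidates: list of dicts with 'ngram' (tuple), 'freq' (int), 'coverage' (float).
    """
    kept = [
        c for c in candidates
        if c["freq"] >= min_freq
        and any(word not in function_words for word in c["ngram"])
    ]
    kept.sort(key=lambda c: c["coverage"], reverse=True)
    return kept[:top_n]


# Invented toy figures; the function-word list is hypothetical.
function_words = {"and", "the", "to", "in", "he", "this"}
candidates = [
    {"ngram": ("the", "holy", "ghost"), "freq": 40, "coverage": 0.05},
    {"ngram": ("and", "to", "the"), "freq": 60, "coverage": 0.09},
    {"ngram": ("in", "this", "life"), "freq": 7, "coverage": 0.01},
    {"ngram": ("he", "said", "to"), "freq": 25, "coverage": 0.07},
]
selected = [c["ngram"] for c in select_candidates(candidates, function_words)]
```

Here the purely functional sequence is dropped despite its high coverage, and the low-frequency item is dropped despite containing a content word.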
5. Empirical part: results
5.1. Homilies
The YCOE corpus contains a few sizeable collections of homilies: Ælfric's Catholic Homilies I and II
and his supplemental homilies (267,425 words in total), the anonymous Vercelli Homilies (45,674
words) and Blickling Homilies (42,506 words) as well as the homilies of Wulfstan (28,768 words). The
texts are classified as homilies in YCOE but in the medieval context the term is interchangeable with
sermon (Kohnen 2008: 142): while the main aim of a homily is to provide commentary to the scripture,
the sermon is meant to give instructions to the congregation with “no obligatory connection to a
liturgical event” (Heffernan 1984: 179 in Kohnen 2008: 142). This division is not clear-cut in such early
texts and when Ellison (1998: 14) calls sermons “oral literature”, his observations may easily be
transferred to the OE homilies. It is plausible to assume that the homilies were meant to be delivered to
an actual audience, though of course it is impossible to determine if they were delivered prior to the
production of their written record or afterwards, how close the oral delivery was to the written form,
and whether they were ever actually delivered or only written down. For instance, Ælfric's Catholic
Homilies were carefully edited “to provide clergy with orthodox preaching material in the vernacular”
(Kleist 2001: 114), so they were planned as exemplary texts which priests could use as a model or
inspiration for their own homilies. Regardless of the actual mode of delivery, the homilies were directed
at a church audience, and the aim of this analysis is to check whether any formulaic phrases were used
by the preacher to draw his listeners’ attention to the message.
The data reveal that the recurrent lexical combinations with the largest coverage in the homilies
(0.8954%) are referential in function. The analysis identified 37 of them (15 with the highest coverage
are listed in Table 2), with a total frequency of 1,361. The coverage of discourse-organising bundles
is lower (0.3429%), and so are their type (13) and token (468) frequencies, cf. Table 3. As expected,
the recurrent concept-related units are usually centred around the nouns describing God, the holy trinity,
saints and the devil (se ælmihtiga god ‘the almighty god’, se halga gast ‘the holy ghost’, se halga wer
‘the holy man’, se ælmihtiga fæder ‘the almighty father’, seo halige þrynnys ‘the holy trinity’, se
awyrgda gast ‘the accursed spirit’) as in (1)-(2).
(1) se ælmihtiga god þurh his gife eow gescylde

the almighty god through his glory you protected
‘The almighty god protected you with his glory’ (cocathom2,+ACHom_II,_9:78.183.1567)
(2) se halga stephanus wearð ða afylled mid ðam halgum gaste
the holy Stephan became then filled with the holy ghost
‘St. Stephen was then filled with the holy spirit’ (cocathom1,+ACHom_I,_3:199.46.509)
Such concept-related phrases usually play the role of syntactic subjects or objects, or they appear in
prepositional phrases functioning as prepositional objects or adverbials of various kinds. A less typical
example is a very frequent recurrent phrase men þa leofestan ‘men the dearest’ presented in (3)-(4). Its
use is restricted to vocatives, which are clearly aimed at establishing some relation with the audience.
The phrase seems to have been treated by preachers as the basic form of address to the congregation,
though Ælfric also used an alternative vocative mine gebroðra þa leofostan ‘my brethren the dearest’
shown in (5).
(3) men þa leofestan, her sagaþ matheus se godspellere þætte

men the dearest here says Matthew the evangelist that
se hælend wære læded on westen
the saviour was led on desert
‘Men the dearest, here Matthew the Evangelist says that Jesus was led to the desert’
(coblick,HomS_10_[BlHom_3]:27.1.352)
(4) magon we nu ongitan, men þa leofestan, þætte ure ealra
may we now understand men the dearest that our all
ende swiðe mislice toweard nealæceð
end very swiftly towards approaches
‘Let us understand, men the dearest, that the end of us all is swiftly approaching’
(coverhom,HomS_36_[ScraggVerc_11]:89.1650)
(5) mine gebroðra þa leofostan ge gehyrdon on
my brethren the dearest you heard on
ðyssere godspellican rædinge. þæt ða synfullan
this gospel reading that the sinful
genealæhton to ðæs hælendes spræce
approached to the Saviour’s speech
‘My brethren the dearest, you have heard in the gospel reading that the sinful were drawn to the
words of Jesus’ (cocathom1,+ACHom_I,_2:191.42.340)
Since the homilies may be one of the few glimpses into the spoken language that the OE corpus offers,
it is interesting to note how often such interactive phrases were used in homilies in order to draw the
attention of listeners to an especially important fragment of the text. The occurrences of the phrase seem
to coincide with the places where the story from the Bible comes to an end and the homily moves on to
interpretation, so it is crucial for the hearers to focus on the message.
As far as location phrases are concerned, their main function is to signal contrast between life in this
world (on þysse worulde ‘on this world’, her on worulde ‘here on world’, on þisum middangearde ‘on
this world’, on þisum life ‘on this life’) and the afterlife (to þam ecan life ‘to the eternal life’, on heofenan
rice ‘in heavenly kingdom’, to heofena rice ‘to heavenly kingdom’) as in (6) and (7) respectively. The
message is more than clear: getting to the kingdom of heaven should be one’s main aim in life, and the
alternative is eternal punishment, cf. (7).
(6) þa ðing þe we geseoð on þisum life: þa sind

the things that we see on this life these are
ateorgendlice
fleeting
‘The things we see in this life are fleeting’ (cocathom1,+ACHom_I,_18:321.126.3518)
(7) ða rihtwisan nahwar syððan ne wuniað buton mid gode
the righteous nowhere later not dwell but with god
on heofenan rice & þa arleasan nahwar
on heavenly kingdom and the wicked nowhere
buton mid deofle on hellesuslum
but with devil in hell-house
‘The righteous then live nowhere but in heaven with God and the wicked in hell with the devil’
(cocathom1,+ACHom_I,_40:529.153.8009)
Finally, there is also a highly recurrent concept-related phrase seo ealde æ ‘the old law’, which in this
corpus is used almost exclusively by Ælfric, who very often emphasised the contrast between the old
(Jewish) law of the Old Testament and the new law described in the gospels as in (8). This need to
signal what belongs specifically to the old law may have been related to Ælfric’s reluctance to
translate the Scripture into OE, based on the risk of incorrect interpretation of the Old
Testament by an unlearned reader (Fox & Sharma 2012: 8).
(8) for þan ðe seo ealde .æ. wæs swilce scadu. & seo
because the old law was also shadow and the
niwe gecydnys. is soðfæstnyss: þurh hælendes gife
new testament is truth through saviour’s grace
‘Because the old law was darkness and the New Testament is the truth by the grace of Jesus’
(cocathom1,+ACHom_I,_25:382.96.4877)
N-gram | Translation | Coverage | N | Sub-type
se ælmihtiga god | the almighty god | 0.0666 | 76 | concept-related
men þa leofestan | men the dearest | 0.0520 | 89 | concept-related
se halga gast | the holy ghost | 0.0470 | 69 | concept-related
on þysse worulde | on this world | 0.0452 | 62 | location
se man þe | the man who | 0.0423 | 87 | focus bundle
se halga wer | the holy man | 0.0386 | 61 | concept-related
þæt ece lif | the eternal life | 0.0294 | 55 | concept-related
her on worulde | here on world | 0.0292 | 40 | location
on ðisum dæge | on this day | 0.0273 | 51 | temporal
on þisum middangearde | on this world | 0.0254 | 29 | location
on þisum life | on this life | 0.0219 | 41 | location
to þam ecan life | to the eternal life | 0.0218 | 32 | location
and mid miclum | and with great | 0.0218 | 32 | topic elaboration
geond ealne middaneard | throughout all world | 0.0215 | 21 | location
rihtum geleafan and | right faith and | 0.0207 | 25 | concept-related
seo ealde æ | the old law | 0.0199 | 41 | concept-related
Table 2. Referential n-grams in homilies.

Moving on to the discourse-organising phrases used in the homilies (cf. Table 3), it should be noted that
they are mostly reported speech signals (and þus cwæð ‘and thus said’, þa cwæð se ‘then said the’, þa
cwæð he ‘then said he’, etc.). Quoting appears in two contexts: the more frequent one is telling a story
where speech is reported, very often in the form of a longer dialogue with multiple exchanges as in (9).
The other context is providing commentary to a quote by an authority (God, Jesus, pope, a saint or a
prophet) as in (10).
(9) þa cwæð se hælend hyre to, gang clypa þinne wer,

then said the saviour her to go call your man
and cum hider þonne. heo cwæð him to andsware,
and come here then she said him to answer
næbbe ic nanne wer. þa cwæð se hælend …
not-have I no man then said the saviour
‘Then Jesus said to her: Go call your husband and come here again. She answered him: I have
no husband. Then Jesus said: …’ (coaelhom,+AHom_5:33.699- 5:35.703)
(10) þa cwæð he, ure hælend crist: nis min rice
then said he our saviour Christ not-is my kingdom
hionon of þyssum middangearde. wære hit, þonne campudun mine
hence of this world were it then fought my
þegnas for me, þæt ic nære iudeum seald. ac nu
thanes for me that I not-were Jews sold but now
nis min rice heonon. men þa leofostan, hwæt
not-is my kingdom hence men the dearest what
mænde ure dryhten mid þam wordum, þa he
meant our lord with these words when he
cwæð þæt his rice heonon ne wære of
said that his kingdom hence not were of
ðyssum middangearde?
this world
‘Then Jesus Christ said: “My kingdom is not from this world. If it were, then my thanes
would fight for me so that I would not be sold to the Jews, but my kingdom is not from
here.” Men the dearest, what did our lord mean with these words when he said that his
kingdom was not from this world?’
(coverhom,HomS_24_[ScraggVerc_1]:108.117- 111.119)
In addition, an interesting discourse-organising focus bundle with a clearly instructive function is þæt
is on englisc ‘that is in English’, which is used after a quote or phrase from Latin in the homilies, cf.
(11)-(12). Interestingly, this phrase is used only by Ælfric and Wulfstan, which shows their deeply
didactic approach encompassing not only theological but also linguistic issues.
(11) regnum dei intra uos est: þæt is on englisc, godes

[Latin] that is on English God’s
rice is betweox eow
kingdom is between you
‘Regnum dei intra uos est, that is in English: God’s kingdom is among you’
(coaelhom,+AHom_4:175.616)
(12) anticristus is on læden contrarius cristo, þæt is on
antichrist is on Latin [Latin] that is on
englisc, godes wiðersaca
English God’s enemy
‘Antichrist is contrarius cristo in Latin, that is in English God’s enemy’
(cowulf,WHom_1b:7.4)
By contrast, when a Latin phrase or sentence is introduced into the anonymous homilies, the OE
translation follows the source text without this introductory phrase, cf. (13).
(13) Þa cwæð he, Pilatus, to Iudeum: Ecce rex uester,

then said he Pilate to Jews [Latin]
þis is eower cyning
this is your king
‘Then Pilate said to the Jews: Ecce rex uester, this is your king’
(coverhomE,HomS_24.1_[Scragg]:290.271)
Thus, while the anonymous homilies simply provide a translation, Ælfric and Wulfstan regularly name
the source and the target language, thereby increasing the linguistic awareness of their audiences.
N-gram | Translation | Coverage | N | Sub-type
and þus cwæð | and thus said | 0.0431 | 59 | reported-speech signal
þa cwæð se | then said the X | 0.0424 | 67 | reported-speech signal
þæt is on englisc | that is in English | 0.0292 | 19 | focus bundle
þe is gehaten | which is called | 0.0248 | 30 | focus bundle
þa cwæð he | then said he | 0.0247 | 39 | reported-speech signal
and het he | and ordered he | 0.0247 | 39 | reported-speech signal
he cwæð to | he said to | 0.0247 | 39 | reported-speech signal
swa swa se witega cwæð | just as the prophet said | 0.0219 | 18 | reported-speech signal
his nama wæs | his name was | 0.0212 | 29 | focus bundle
þa andwyrde se | then answered the X | 0.0210 | 27 | reported-speech signal
cwæð to þam | said to the X | 0.0183 | 29 | reported-speech signal
and eac se | and also the X | 0.0177 | 33 | transition marker
Table 3. Discourse organising n-grams in homilies.
5.2. Chronicles
The YCOE texts classified as “history” include the OE translation of Venerable Bede's Historia
Ecclesiastica Gentis Anglorum (80,767 words), the adaptation of the Latin Historiae adversus paganos
by Paulus Orosius into OE (51,020 words) and four manuscripts of the Anglo-Saxon Chronicle: A
(14,583 words), C (22,463 words), D (26,691 words) and E (40,641 words).5
5 Since there is some overlap between the manuscripts, the token frequency of some recurrent phrases may be inflated.
As in the OE homilies, the more frequent recurrent phrases are referential in function: there are
42 types and 1,011 tokens in the analysed part of the corpus with a total coverage of 1.2130% (Table
4 shows the most frequent ones). The corresponding results for the discourse-organising phrases are
much lower: 8 types, 197 tokens and coverage at 0.2236% (cf. Table 5). Quite expectedly, chronicles
rely on numerous recurrent temporal phrases (and þy ilcan geare ‘and the same year’, her on þissum
geare ‘here on this year’, on þæm dagum ‘on the days’, etc.), which specify when a given historical
event took place as in (14)-(15).
(14) her on þissum geare willelm cyngc geaf rodbearde eorle þone
here on this year William king gave Rodbeard earl the
ealdordom ofer norðhymbraland
earldom over Northumbria
‘This year king William gave the earldom of Northumbria to Rodbeard’
(cochronD,ChronD_[Classen-Harm]:1068.1.2317)
(15) þy ilcan geare swilce se halga godes wer Ecgbreht,
the same year also the holy God’s man Egbert
swa swa we beforan gemyngedon, þy seolfan eastordæge
so as we before reminded the self Easter
forþferde to drihtne
forth-went to Lord
‘Also the same year Egbert, the holy man of God, just like we said before, went forth
to the Lord on Easter itself’ (cobede,Bede_5:21.476.20.4782)
Interestingly, the adverb her ‘here’, whose prototypical function in OE is locative, in the Anglo-Saxon
Chronicle is used with a clearly temporal reference, cf. (14) above as well as (16) below.
What is more, the concept-related bundles are based on the nouns cyning ‘king’
and bisceop ‘bishop’, which denote the most frequent participants in various historical events (se arwurþ
biscop ‘the noble bishop’, se cyning and ‘the king and’, se cyning þone ‘the king the’ or se cyning him
‘the king him/them’). Interestingly, however, chronicles reveal a much higher proportion of process-
related lexical bundles built around verbs such as sendan ‘send’, cuman ‘come’, fon ‘take’ and forðfaran
‘die’, such as þa sende he ‘then sent he’, oð he com to ‘until he came to’ or cyning forðferde and ‘king
died and’ shown in (16). Not a single representative of this type has been identified in the homilies.
(16) her aldferþ norðanhymbra cyning forþferde & seaxuulf

here Aldferth Northumbrian king died and Seaxwulf
biscep
bishop
‘Here (this year) Aldferth the king of Northumbria died, and so did Seaxwulf the
bishop’ (cochronA-1,ChronA_[Plummer]:705.1.425)

N-gram | Translation | Coverage | N | Sub-type
and se cyning | and the king | 0.0812 | 74 | topic elaboration
and þy ilcan geare | and the same year | 0.0626 | 47 | temporal
her on þissum geare | here on this year | 0.0451 | 36 | temporal
on þæm dagum | on the days | 0.0423 | 52 | temporal
þa sende he | then sent he | 0.0357 | 35 | process-related
to biscope gehalgod | to bishop consecrated | 0.0346 | 21 | process-related
oð he com to | until he came to | 0.0341 | 29 | process-related
and he siþþan | and he afterwards | 0.0340 | 31 | topic elaboration
and þæt folc | and the people | 0.0338 | 36 | topic elaboration
cyning forðferde and | king died and | 0.0329 | 20 | process-related
to romana onwalde and hine hæfde | to Roman power and it had | 0.0316 | 13 | process-related
and se arcebiscop | and the archbishop | 0.0298 | 20 | topic elaboration
se arwurþ biscop | the noble bishop | 0.0296 | 21 | concept-related
and þær ofsloh | and there killed | 0.0276 | 22 | topic elaboration
her on þissum geare com | here on this year came | 0.0276 | 16 | process-related
Table 4. Referential n-grams in chronicles.
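The share of each sub-type among the n-grams listed in Table 4 can be checked by aggregating the rows. The following Python sketch is only an illustration: the names TABLE_4 and totals_by_subtype are our own assumptions, the figures are copied from the table as printed, and the result covers only the listed n-grams rather than all 42 referential types.

```python
from collections import defaultdict

# Rows copied from Table 4: (n-gram, coverage as printed, token frequency N, sub-type).
TABLE_4 = [
    ("and se cyning", 0.0812, 74, "topic elaboration"),
    ("and þy ilcan geare", 0.0626, 47, "temporal"),
    ("her on þissum geare", 0.0451, 36, "temporal"),
    ("on þæm dagum", 0.0423, 52, "temporal"),
    ("þa sende he", 0.0357, 35, "process-related"),
    ("to biscope gehalgod", 0.0346, 21, "process-related"),
    ("oð he com to", 0.0341, 29, "process-related"),
    ("and he siþþan", 0.0340, 31, "topic elaboration"),
    ("and þæt folc", 0.0338, 36, "topic elaboration"),
    ("cyning forðferde and", 0.0329, 20, "process-related"),
    ("to romana onwalde and hine hæfde", 0.0316, 13, "process-related"),
    ("and se arcebiscop", 0.0298, 20, "topic elaboration"),
    ("se arwurþ biscop", 0.0296, 21, "concept-related"),
    ("and þær ofsloh", 0.0276, 22, "topic elaboration"),
    ("her on þissum geare com", 0.0276, 16, "process-related"),
]


def totals_by_subtype(rows):
    """Sum token frequency and coverage separately for each sub-type."""
    tokens = defaultdict(int)
    coverage = defaultdict(float)
    for _ngram, cov, n, subtype in rows:
        tokens[subtype] += n
        coverage[subtype] += cov
    return dict(tokens), dict(coverage)


tokens, coverage = totals_by_subtype(TABLE_4)
# Print the sub-types from the most to the least frequent.
for subtype in sorted(tokens, key=tokens.get, reverse=True):
    print(f"{subtype:18s} N={tokens[subtype]:4d} coverage={coverage[subtype]:.4f}")
```

For the rows listed, process-related n-grams account for 134 of the 473 tokens, while the single concept-related n-gram contributes only 21.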
The discourse-organising bundles, infrequent as they are (cf. Table 5), are not dominated by
reported speech signals as they are in the homilies; the most frequent ones are focus bundles which specify the
name of a given person or place as in (17) and (18). Interestingly, þæs noma wæs ‘whose name was’ is
used almost exclusively in Bede’s Historia (we have identified only 3 uses in the ASC), while þe man
hæt ‘which one calls’ appears predominantly in Orosius. The former phrase is used to describe people
as in (17), while the latter is used for inanimate objects as in (18). Apparently, even though both texts
are classified as “history”, Bede is more focused on people, while Orosius pays considerable attention
to geography as well.
(17) wæs sum arwyrðe mæssepreost, þæs noma wæs utta

was some honourable priest whose name was Utta
‘There was an honourable priest whose name was Utta’
(cobede,Bede_3:13.198.21.2012)
(18) & be eastan þæm lande is se wendelsæ, þe
and by east the land is the Mediterranean which
man hæt adriaticum
one calls Adriatic
‘And to the east of the land there is the Mediterranean sea, which is called the Adriatic’
(coorosiu,Or_1:1.20.3.395)
Finally, historical texts also rely on some recurrent transition signals with eac swylce ‘likewise’ and
reported speech signals, though to a much lesser extent than homilies.
N-gram | Translation | Coverage | N | Sub-type
þæs noma wæs | whose name was | 0.0406 | 37 | focus bundles
þe man hæt | which one calls | 0.0336 | 33 | focus bundles
se wæs biscop | this/who was bishop | 0.0293 | 22 | focus bundles
and cwæð þæt he | and said that he | 0.0268 | 19 | reported speech signals
and þus cwæð | and thus said | 0.0247 | 21 | reported speech signals
þa het he | then ordered he | 0.0244 | 26 | reported speech signals
and eac swilce | and likewise | 0.0223 | 19 | transition signals
eac swilce se | likewise the | 0.0219 | 20 | transition signals
Table 5. Discourse organising n-grams in chronicles.
5.4. The Bible
The biblical texts included in YCOE are the West-Saxon Gospels (71,104 words), which are a free
translation of the four Gospels of the New Testament, the Heptateuch (59,524 words), which is a free translation of the first
seven books of the Old Testament, and two independent translations of the Book of Genesis (5,224
words) and Exodus (1,096 words). Glosses are not included in the corpus because of their heavy reliance on
the Latin source text.
In the biblical texts, discourse organising bundles are more frequent than the referential ones. There are
25 types and 582 tokens of the former category (total coverage at 1.2816%) as well as 23 types and only
358 tokens of the latter (total coverage at 0.6873%). As in the homilies, the discourse organising
bundles are dominated by reported speech signals (their coverage is 1.1483%), built mostly around the
basic OE quoting verb cweþan ‘say’ and, to a lesser extent, secgan ‘say’ and andswarian ‘answer’.
Interestingly, many examples of direct speech are isolated quotes, mostly from Jesus as in (19), where
the quoted statement is an element of the narration. Quoted dialogues, even though they do appear in
stories as in (20), are not as dominant as in the homilies.
(19) þa cwæð se hælend to him, fylig me & læt

then said the Saviour to him follow me and let
deade bebyrigean hyra deadan
dead bury their dead
‘Then Jesus said to him: Follow me and let the dead bury their dead’
(cowsgosp,Mt_[WSCp]:8.22.464)
(20) þa cwæð he, hu fela hlafa hæbbe ge; þa cwædon
then said he how many loaves have you then said
hig, seofon & feawa fixa
they seven and few fish
‘Then he said: How many loaves of bread do you have? Then they said: Seven, and
some fish’ (cowsgosp,Mt_[WSCp]:15.34.1053-1054)
There are also some recurrent focus n-grams aimed at specifying a certain discourse referent: þa þing
þe ‘the things which’, se man þe ‘the man who’ and þe is genemned ‘which is called’, but the most
interesting discourse organising phrase appearing only in the OE New Testament is ic eow secge ‘I say
unto you’, together with its word order variant ic secge eow, which are often a part of a longer recurrent
unit soðlice ic secge eow ‘verily I say unto you’, soð ic eow secge ‘verily I say unto you’ and nu ic
secge eow ‘now I say unto you’. The function of these expressions is clearly to focus the attention of
hearers and the crucial fact is that Jesus is the only discourse participant using the phrases exemplified
in (21)-(22). The effect achieved by ic secge eow is, first of all, to emphasise the importance of the
message, and to give credibility to the statement, using the authority of the son of God. It is noteworthy
that this phrase is the only recurrent unit which contains the pronoun ic ‘I’, generally avoided by the
predominantly anonymous, early medieval authors who preferred to remain in the background of the
texts. Expressing personal opinions was not a frequent phenomenon, and only God would be deemed
worthy of using such a personal phrase as ic secge eow. Let us recall that lexically recurrent phrases
with ic ‘I’ are completely absent from the OE homilies.
(21) nu ic eow secge, hebbað upp eowre eagan

now I you say lift up your eyes
‘Now I say unto you, lift up your eyes’ (cowsgosp,Jn_[WSCp]:4.35.6015)
(22) soðlice ic secge eow gif hwa mine spræce gehealt ne
truly I say you if who my speech keeps not
gesyhþ he deað næfre
sees he death never
‘Verily I say unto you, if someone heeds my words, he shall never see death’
(cowsgosp,Jn_[WSCp]:8.51.6477)
Finally, we need to remember that the biblical texts are translations from the Latin Vulgate, so ic secge
eow is a phraseological equivalent of the Latin dico vobis ‘I say unto you’. Hence, the use of this phrase
was directly triggered by the source text and the absence of such recurrent multi-word items in non-
translated prose records of OE confirms the reluctance of Anglo-Saxon writers to build recurrent
phrases around the first person singular pronoun.
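The two word order variants ic eow secge and ic secge eow can be retrieved together with a simple keyword-in-context search. The Python sketch below is a minimal illustration, not the procedure used in this study: the pattern, the function name kwic and the context width are our own assumptions, and the toy input is built from examples (21) and (22) rather than read from the YCOE files.

```python
import re

# Both word-order variants discussed above, matched as whole words.
PATTERN = re.compile(r"\bic (?:eow secge|secge eow)\b", re.IGNORECASE)


def kwic(text, pattern=PATTERN, width=30):
    """Return (left context, match, right context) for every hit in text."""
    lines = []
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        lines.append((left, m.group(0), right))
    return lines


# Toy input taken from examples (21) and (22); a real study would read corpus files.
sample = (
    "nu ic eow secge, hebbað upp eowre eagan. "
    "soðlice ic secge eow gif hwa mine spræce gehealt ne gesyhþ he deað næfre"
)

for left, hit, right in kwic(sample):
    print(f"{left:>30} [{hit}] {right}")
```

Inspecting the left context of each hit is one way of spotting the longer recurrent units, such as those opening with nu or soðlice.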
N-gram | Translation | Coverage | N | Sub-type
þa cwæð he | then said he | 0.1534 | 82 | reported speech signal
þa cwæð se hælend | then said the Saviour | 0.0979 | 34 | reported speech signal
and þus cwæð | and thus said | 0.0950 | 44 | reported speech signal
þa cwæð se hælend to him | then said the Saviour to him/them | 0.0823 | 22 | reported speech signal
and cwæð to him | and said to him/them | 0.0783 | 32 | reported speech signal
þa cwæð se | then said the X | 0.0730 | 39 | reported speech signal
and he cwæð | and he said | 0.0665 | 33 | reported speech signal
þa cwæð he to him | then said he to him/them | 0.0574 | 21 | reported speech signal
him and cwæð | him/them and said | 0.0524 | 26 | reported speech signal
and he cwæð to him | and he said to him/them | 0.0461 | 26 | reported speech signal
þa þing þe | the things which | 0.0443 | 28 | focus bundle
þa cwæð dryhten to moyse | then said Lord to Moses | 0.0427 | 11 | reported speech signal
þa sæde he him | then said he him/them | 0.0391 | 17 | reported speech signal
and cwæð ic | and said I | 0.0363 | 18 | reported speech signal
and cwæð dryhten | and said Lord | 0.0355 | 13 | reported speech signal
& secgað him | and tell him/them (IMP) | 0.0322 | 16 | reported speech signal
he cwæð to him | he said to him/her/them | 0.0322 | 14 | reported speech signal
se man þe | the man who | 0.0317 | 22 | focus bundle
þe is genemned | who is called | 0.0311 | 12 | focus bundle
þa cwæð moyses to | then said Moses to | 0.0288 | 10 | reported speech signal
ic eow secge | I say unto you | 0.0262 | 14 | interactional
and cwæð to | and said to | 0.0262 | 13 | reported speech signal
þa andswarode he | then answered he | 0.0245 | 10 | reported speech signal
cwæð to him | said to him/them | 0.0243 | 13 | reported speech signal
cwæð eft to | said again to | 0.0242 | 12 | reported speech signal
Table 6. Discourse organising n-grams in biblical texts.
As far as the referential n-grams are concerned, it is quite interesting to observe that almost all of them
refer to material objects and simple activities. The lexical words appearing in these recurrent multi-
word units are sellan ‘give’, land ‘land’, dæg ‘day’, hælend ‘Jesus’, westen ‘desert’, wæter ‘water’,
hand ‘hand’, etc. The function of the recurrent phrases is to specify time (on þam dæge ‘on the day’, on
ðære tide ‘on the time’) or place (on egypta lande ‘on Egypt land’, on þam westene ‘on the desert’, ofer
eorþan and ‘over earth and’) and they are very simple in structure and transparent in meaning. Some of
them are process-related (e.g., þa com se ‘then came the’, ða genealæhton hig ‘then approached they’),
which makes the biblical texts quite similar to the OE chronicles.
N-gram | Translation | Coverage | N | Sub-type
and sealde him | and gave him/them | 0.0564 | 28 | topic elaboration
on egypta lande | on Egypt land | 0.0540 | 25 | location
on þam dæge | on the day | 0.0461 | 32 | temporal
þa se hælend | then/when the Saviour | 0.0355 | 19 | topic elaboration
and þæt folc | and the people | 0.0345 | 20 | topic elaboration
se hælend him | the Saviour him/her/them | 0.0318 | 17 | concept-related
þa com se | then came the X | 0.0311 | 18 | process-related
his leorningcnihtas him | his disciples him/her/them | 0.0302 | 10 | concept-related
and he eode | and he went | 0.0285 | 18 | topic elaboration
on þam westene | on the desert | 0.0281 | 15 | location
and se hælend | and the Saviour | 0.0262 | 13 | topic elaboration
on ðære tide | on the time | 0.0259 | 18 | temporal
ða genealæhton hig | then approached they | 0.0245 | 10 | process-related
þara sacerda ealdras | the chief priests | 0.0245 | 10 | concept-related
moyses and aaron | Moses and Aaron | 0.0245 | 10 | concept-related
& þæt wæter | and the water | 0.0243 | 13 | topic elaboration
Table 7. Referential n-grams in biblical texts.
Finally, there are also some attitudinal phrases centred around the sentence adverb soþlice ‘truly’, which
is very frequent in the biblical texts. As noted for (21)-(22), the anonymous authors of the OE translation of
the Bible do not seem to consider themselves worthy of expressing their own opinion in the holy text, but in the
fragments where Jesus Christ is quoted, attitudinal n-grams appear, giving us some idea about the
possible ways of expressing stance in OE, as illustrated in (23).
(23) soþlice þam þe hæfþ him byþ geseald & he hæfð,

truly him who has him is given and he has
soþlice se þe næfð & þæt þe he hæfð him bið
truly he who not-has and that what he has him is
ætbroden
taken
‘Truly, to the one who has it will be given and he shall have. Truly, from the one who
does not have what he has will be taken’ (cowsgosp,Mt_[WSCp]:13.12.821)
N-gram | Translation | Coverage | N
soþlice hit is/wæs | truly it is/was | 0.0391 | 16
soþlice se þe | truly this who | 0.0242 | 12
Table 8. Attitudinal n-grams in OE biblical texts
6. Conclusions
In this paper we addressed three research questions that – in short – pertain to the degree to which the OE texts
under scrutiny are permeated by formulaic language and to the discoursal functions that the most salient
formulaic phrasings perform in the analysed OE text types. We were also interested in the impact of the
Old English formulaic phrasings on contemporary English.
The study findings revealed a high degree of variability among recurrent multi-word units in each of
the analysed text types of OE prose since the differences between their form and function in particular
text types are quite substantial. Referential expressions are predominant in homilies and chronicles,
while biblical translations are dominated by discourse-organizers, mostly signals of reported speech.
Interestingly, homilies proved to be the richest source of recurrent n-grams, giving us a lot of valuable
information on the textual characteristics of this OE text type. Next, it should be noted that the
referential expressions in homilies are mostly concept-related entities, whereas in chronicles we observe
a wide array of process-related units. What is more, it turned out that referential n-grams, whose
pragmatic potential is by nature rather limited, may perform some interesting discourse functions, as
does the vocative phrase men þa leofestan in homilies. Most importantly, however, the main observation
based on these results is that while some OE referential expressions identified in this study (e.g. the
almighty God) are still used in English, the OE discourse-organising n-grams differ substantially from
lexical bundles of the same function identified in contemporary studies of English (e.g., Biber et al.
2003, 2004; Biber 2006, 2009). This means that OE discourse was organised by means of completely
different linguistic items. We have identified no recurrent story-openers, sequencing elements or
concluding markers resembling Present-Day English on the other hand, what is more, etc., even
though many of such units are composed of lexical elements which existed in the language already in
the OE period. The only recurrent n-grams identified in the OE texts under scrutiny which have survived
represent biblical English, and the most notable example would be I say unto you, which is
unambiguously associated with the language of the Scripture. Other examples involve quotation
formulas (he said to him/them), though most of them were lost because they involve subject-verb
inversion (then said Jesus). This result suggests that in the history of English, discourse organisation is
less diachronically stable than other elements of the language system.
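The contrast between text types can be made explicit by putting the totals reported in Section 5 side by side. The Python sketch below is illustrative only: the names RESULTS and dominant_category are our own, and only the chronicle and biblical figures stated in the running text are included.

```python
# Totals reported in Section 5: (types, tokens, total coverage in %).
RESULTS = {
    "chronicles": {
        "referential": (42, 1011, 1.2130),
        "discourse-organising": (8, 197, 0.2236),
    },
    "biblical texts": {
        "referential": (23, 358, 0.6873),
        "discourse-organising": (25, 582, 1.2816),
    },
}


def dominant_category(by_category):
    """Return the functional category with the highest total coverage."""
    return max(by_category, key=lambda cat: by_category[cat][2])


for text_type, by_category in RESULTS.items():
    print(text_type, "->", dominant_category(by_category))
    for category, (types, tokens, cov) in by_category.items():
        print(f"  {category}: {tokens / types:.1f} tokens per type, {cov:.4f}% coverage")

# Share of the biblical discourse-organising coverage (1.2816%) that is due to
# reported speech signals (1.1483%), both figures as reported in Section 5.
reported_speech_share = 1.1483 / 1.2816
print(f"reported speech share: {reported_speech_share:.1%}")
```

On these figures, reported speech signals make up roughly nine tenths of the discourse-organising coverage in the biblical texts.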
This study has a number of limitations. First, it is primarily descriptive and, given the scope of
research methods used in it, we do not posit any explanatory hypotheses as to the text-external factors
that govern the use of the identified formulaic phrasings. For the same reason, the study is monofactorial
in nature as the very identification of salient recurrent n-grams is based on the analysis of their frequency
distributions in the analysed texts. Second, the inventory of formulaic phrasings with the largest
coverage in the study corpus of OE texts is based on the source set of n-grams with 3-7 words only.
Also, the corpus analysis of the recurrent phrasings represents a bottom-up concordance-based approach
to the study of their pragmatic aspects and roles in OE written texts, which would not be discernible
otherwise, in a top-down approach.
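For readers unfamiliar with frequency-based identification of recurrent n-grams, the general idea can be sketched in a few lines of Python. This is not the software used in the present study: the function name recurrent_ngrams, the whitespace tokenisation and the frequency threshold min_freq are assumptions made purely for illustration, and only the 3-7 word range is taken from the text above.

```python
from collections import Counter


def recurrent_ngrams(tokens, min_n=3, max_n=7, min_freq=2):
    """Count contiguous sequences of min_n..max_n words and keep those
    that occur at least min_freq times (min_freq is an assumed threshold)."""
    counts = Counter()
    for n in range(min_n, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return {ngram: c for ngram, c in counts.items() if c >= min_freq}


# Toy input; splitting on whitespace is a simplification of real tokenisation.
text = "þa cwæð he to him and þa cwæð he to hire"
hits = recurrent_ngrams(text.split())

# List the longest recurrent sequences first.
for ngram, freq in sorted(hits.items(), key=lambda kv: (-len(kv[0]), kv[0])):
    print(freq, " ".join(ngram))
```

In this toy input, shorter recurrent sequences (þa cwæð he, cwæð he to) are contained in a longer one (þa cwæð he to), which illustrates the kind of overlap that any frequency-based inventory has to deal with.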
All in all, the study findings show that non-referential formulaic language has changed considerably
over the history of English and in the future it would be very interesting to see when the discourse
organising strategies used in contemporary English started to emerge. On the basis of the findings, we
may only conclude that they came into existence in later stages of English. Other possible directions for
further study include the analysis of other genres, diachronic studies of particular lexical bundles and
text types across various historical periods as well as the analysis of the influence of Latin on the
emergence of some formulas in translated texts.
References
Ädel, Annelie, and Britt Erman. 2012. Recurrent word combinations in academic writing by native and
non-native speakers of English: A lexical bundles approach. English for Specific Purposes 31.
81–92.
Bech, Kristin. 2012. Word order, information structure, and discourse relations: A study of Old and
Middle English verb-final clauses. In Anneli Meurman-Solin, Maria Lopez-Couso, & Bettelou
Los (eds.), Information structure and syntactic change in the history of English, 66-86. New
York: Oxford University Press.
Bech, Kristin. 2001. Word Order Patterns in Old and Middle English: A Syntactic and Pragmatic Study.
Bergen: University of Bergen Ph.D. dissertation.
Biber, Douglas. 2006. University Language. A corpus-based study of spoken and written registers.
Amsterdam/Philadelphia: John Benjamins.
Biber, Douglas. 2009. A corpus-driven approach to formulaic language in English: multi-word patterns
in speech and writing. International Journal of Corpus Linguistics 14 (3): 275–311.
Biber, Douglas, Stig Johansson, Geoffrey Leech, Susan Conrad, and Edward Finegan. 1999. The
Longman Grammar of Spoken and Written English. London: Longman.
Biber, Douglas, Susan Conrad, and Viviana Cortes. 2003. Lexical bundles in speech and writing: An
initial taxonomy. In: Corpus Linguistics by the Lune: A Festschrift for Geoffrey Leech, ed. by
Andrew Wilson, Paul Rayson, and Anthony McEnery, 71–92. Frankfurt/Main: Peter Lang.
Biber, Douglas, Susan Conrad, and Viviana Cortes. 2004. If you look at…: Lexical bundles in university
teaching and textbooks. Applied Linguistics 25 (3). 371–405.
Breeze, Ruth. 2013. Lexical bundles across four legal genres. International Journal of Corpus
Linguistics 18 (2). 229–253.
Brinton, Laurel. 2017. The Evolution of Pragmatic Markers in English: Pathways of Change.
Cambridge: CUP.
Brinton, Laurel. 2010. Discourse Markers. In Historical Pragmatics, ed. by Andreas Jucker & Irma
Taavitsainen, 285–314. Berlin: Mouton de Gruyter.
Buerki, Andreas. 2020. Formulaic Language and Linguistic Change. A Data-Led Approach.
Cambridge: Cambridge University Press.
Calle-Martín, Javier & Antonio Miranda-García. 2010. ‘Gehyrdon ge þæt gecweden wæs’ – a corpus-
based approach to verb-initial constructions in Old English. Studia Neophilologica 82. 49-57.
Cao, Feng. 2019. A comparative study of lexical bundles across paradigms and disciplines. Corpora 16
(1). 97–128.
Chen, Yu-Hua, and Paul Baker. 2010. Lexical bundles in L1 and L2 academic writing. Language
Learning and Technology 14 (2). 30–49.
Cheng, Winnie, Chris Greaves, and Martin Warren. 2006. From n-gram to skipgram to concgrams.
International Journal of Corpus Linguistics 11 (4). 411–433.
Cichosz, Anna. 2020. Negation and Verb-initial Order in Old English Main Clauses. Journal of English
Linguistics 48 (4). 355-381.
Ellison, Robert. 1998. The Victorian Pulpit: Spoken and Written Sermons in Nineteenth-Century
Britain. Selinsgrove: Susquehanna University Press.
Enkvist, Nils Erik. 1972. Old English adverbial þa– an action marker? Neuphilologische Mitteilungen
73. 90–96.
Fletcher, William. 2002. KfNgram. Annapolis, MD: USNA. Available at:
http://www.kwicfinder.com/kfNgram/kfNgramHelp.html
Forsyth, Richard. 2015a. Formulib: Formulaic Language Software Library. Available at:
http://www.richardsandesforsyth.net/zips/formulib.zip
Forsyth, Richard. 2015b. Formulib: Formulaic Language Software Library. User notes
http://www.richardsandesforsyth.net/docs/formulib.pdf
Forsyth, Richard. 2021. Cascading collocations: Collocades as correlates of formulaic language. In:
Formulaic Language: Theories and Methods, ed. by Aleksandar Trklja and Łukasz Grabowski,
31–52. Berlin: Language Science Press.
Fox, Michael & Manish Sharma. 2012. Old English Literature and the Old Testament. Toronto:
University of Toronto Press.
Fuster-Marquez, Miguel. 2014. Lexical bundles and phrase frames in the language of hotel websites.
English Text Construction 7 (1): 84–121.
Goźdź-Roszkowski, Stanisław. 2011. Patterns of Linguistic Variation in American Legal English. A
Corpus-Based Study. Frankfurt am Main: Peter Lang Verlag.
Grabowski, Łukasz. 2015. Keywords and lexical bundles within English pharmaceutical discourse: a
corpus-driven description. English for Specific Purposes 38. 23-33.
Grabowski, Łukasz, and Rita Jukneviciene. 2016. Towards a refined inventory of lexical bundles: an
experiment in the Formulex method. Kalbu Studijos/Studies About Languages 29. 58-73.
Grabowski, Łukasz. 2019. Distinctive lexical patterns in Russian patient information leaflets: a corpus-
driven study. Russian Journal of Linguistics 23 (3). 659-680.
Granger, Sylviane, and Fanny Meunier. 2008. Introduction: The many faces of phraseology. In:
Phraseology: An interdisciplinary perspective, ed. by Sylviane Granger & Fanny Meunier, xix–
xxx. Amsterdam: John Benjamins.
Hogg, Richard. 2006. Old English Dialectology. In: The Handbook of the History of English, ed. by
Ans van Kemenade & Bettelou Los, 395-416. Oxford: Blackwell.
Hunston, Susan, and Gill Francis. 2000. Pattern Grammar: a corpus-driven approach to the lexical
grammar of English. Amsterdam: John Benjamins.
Hyland, Ken. 2008. As can be seen: Lexical bundles and disciplinary variation. English for Specific
Purposes 27: 4–21.
Kemenade, Ans van & Marit Westergaard. 2012. Syntax and Information Structure: Verb-Second
Variation in Middle English. In Anneli Meurman-Solin, Maria Jose Lopez-Couso & Bettelou Los
(eds.), Information structure and syntactic change in the history of English, 87-118. New York:
Oxford University Press.
Kemenade, Ans van & Meta Links. 2020. Discourse particles in early English: clause structure,
pragmatics and discourse management. Glossa: A Journal of General Linguistics 5 (1).
Kleist, Aaron J. 2001. Ælfric’s Corpus: A Conspectus. Florilegium 18(2). 113-164.
Knappe, Gabriele. 2012. Idioms and fixed expressions. In Historical Linguistics of English: An
International Handbook, ed. by Alexander Bergs & Laurel Brinton, 177-196. Berlin: de Gruyter.
Kohnen, Thomas. 2008. Directives in Old English: Beyond politeness? In Speech Acts in the History of
English, ed. by Andreas Jucker & Irma Taavitsainen, 27-44. Amsterdam/Philadelphia:
Benjamins.
Kopaczyk, Joanna. 2013. The Legal Language of Scottish Burghs: Standardization and Lexical Bundles
(1380-1560). Oxford: Oxford University Press.
Kopaczyk, Joanna. 2012. Long lexical bundles and standardisation in historical legal texts. Studia
Anglica Posnaniensia: International Review of English Studies 47 (2-3). 3–25.
Kuiper, Koenraad. 2009. Formulaic Genres. Berlin: Springer.
Labov, William. 1994. Principles of linguistic change. Oxford: Blackwell.
Labov, William. 1972. Sociolinguistic patterns. Philadelphia: University of Pennsylvania Press.
Lehto, Artu. 2018. Lexical bundles in Early Modern and Present-day English Acts of Parliament. In:
Applications of Pattern-Driven Methods in Corpus Linguistics, ed. by Joanna Kopaczyk and
Jukka Tyrkkö, 159–188. Amsterdam: John Benjamins.
Lenker, Ursula. 2012. Old English: Pragmatics and Discourse. In Historical Linguistics of English: An
International Handbook, ed. by Alexander Bergs & Laurel Brinton, 325–340. Berlin: de Gruyter.
Lenker, Ursula. 2000. Soþlice and witodlice: Discourse markers in Old English. In Pathways of
Change: Grammaticalization in English, ed. by Olga Fischer, Anette Rosenbach, & Dieter Stein,
229–249. Amsterdam: Benjamins.
Los, Bettelou. 2009. The consequences of the loss of verb-second in English: information structure and
syntax in interaction. English Language and Linguistics 13 (1). 97-125.
Louviot, Elise. 2018. Pragmatic uses of nu in Old Saxon and Old English. In New Trends in
Grammaticalisation and Language Change, ed. by Sylvie Hancil, Tine Breban and José Vicente
Lozano. Amsterdam: Benjamins.
Magoun, Francis Peabody. 1953. Oral-Formulaic Character of Anglo-Saxon Narrative Poetry.
Speculum 28. 446-484.
Mel’čuk, Igor. 2020. Clichés and pragmatemes. Neophilologica 32. 9-20.
Mel’čuk, Igor, and Jasmina Milićević. 2020. An Advanced Introduction to Semantics: A Meaning-Text
Approach. Cambridge: Cambridge University Press.
Pawley, Andrew. 2007. Developments in the study of formulaic language since 1970: A personal view.
In: Phraseology and culture in English, ed. by Paul Skandera, 3–34. Berlin: Mouton de Gruyter.
Pawley, Andrew, and Frances Hodgetts Syder. 1983. Two puzzles for linguistic theory: nativelike
selection and nativelike fluency. In: Language and Communication, ed. by Jack Richards &
Richard Schmidt, 191–226. New York: Longman.
Petrova, Svetlana. 2006. A discourse-based approach to verb placement in early West-Germanic. In
Ishihara, S., Schmitz, M. & Schwarz, A. (eds.), Working Papers of the SFB632, Interdisciplinary
studies on information structure 5, 153-182. Potsdam: Universitätsverlag.
Pęzik, Piotr. 2018. Facets of prefabrication. Perspectives on modelling and detecting phraseological
units. Łódź: Wydawnictwo UŁ.
Pintzuk, Susan & Ann Taylor. 2012. The effect of information structure on object position in Old
English: a pilot study. In Information Structure and Syntactic Change in the History of English,
ed. by Anneli Meurman-Solin, María José López-Couso & Bettelou Los, 47–65. New York:
Oxford University Press.
Ruehlemann, Christoph, and Brian Clancy. 2018. Corpus Linguistics and Pragmatics. In Pragmatics
and its Interfaces, ed. by Cornelia Ilie and Neal R. Norrick, 241–266. Amsterdam: John Benjamins.
Salazar, Danica. 2014. Lexical Bundles in Native and Non-native scientific writing. Amsterdam: John
Benjamins.
Schmitt, Norbert. 2010. Formulaic Language. In: Researching Vocabulary: A Vocabulary Research
Manual. 117–146.
Schmitt, Norbert, and Ronald Carter. 2004. Formulaic sequences in action: An introduction. In:
Formulaic Sequences: Acquisition, Processing and Use, ed. by Norbert Schmitt, 1–22.
Amsterdam: John Benjamins.
Scott, Mike, and Chris Tribble. 2006. Textual Patterns: keyword and corpus analysis in language
education. Amsterdam: Benjamins.
Sidtis, Diana. 2021 (in print). Foundations of Familiar Language: Formulaic Expressions, Lexical
Bundles, and Collocations at Work and Play. New York: Wiley Blackwell.
Simpson-Vlach, Rita, and Nick Ellis. 2010. An Academic Formulas List: New Methods in Phraseology
Research. Applied Linguistics 31 (4). 487–512.
Stanton, Robert. 2002. The culture of translation in Anglo-Saxon England. Cambridge: D.S. Brewer.
Struik, Tara & Ans van Kemenade. 2018. On the givenness of OV word order: a (re)examination of
OV/VO variation in Old English. English Language and Linguistics 19 (1): 49-81.
Taylor, Ann, Anthony Warner, Susan Pintzuk & Frank Beths. 2003. The York-Toronto-Helsinki Parsed
Corpus of Old English Prose (YCOE).
Wanner, Leo. 1996. Introduction. In Lexical Functions in lexicography and natural language
processing, ed. by Leo Wanner. Amsterdam: John Benjamins, 1-36.
Wårvik, Brita. 2013a. Participant continuity and narrative structure: Defining discourse marker
functions in Old English. Folia Linguistica Historica 34: 1-34.
Wårvik, Brita. 2013b. Peak-marking strategies in Old English narrative prose. Style 47.2: 168-184.
Wårvik, Brita. 2011. Connective or 'disconnective' discourse marker? Old English þa,
multifunctionality and narrative structuring. In Connectives in Synchrony and Diachrony in
European Languages, ed. by Anneli Meurman-Solin & Ursula Lenker.
http://www.helsinki.fi/varieng/journal/volumes/05/warvik/
Wood, David. 2015. Fundamentals of Formulaic Language. London: Bloomsbury.
Wray, Alison. 2002. Formulaic language and the lexicon. Cambridge: Cambridge University Press.
Wray, Alison. 2008. Formulaic language. Pushing the boundaries. Oxford: Oxford University Press.
Wray, Alison & Perkins, Michael. 2000. The functions of formulaic language: an integrated model.
Language and Communication 20. 1–28.