A Typology of English Texts : Douglas Biber
DOUGLAS BIBER
Abstract
1. Introduction
Over the last several years, numerous studies have attempted to document
the nature and extent of linguistic similarities and differences among
various kinds of texts. A major goal of such research is to develop an
overall typology of texts, to provide a theoretical and empirical founda-
tion for comparative discourse research. This need has been emphasized
by Tannen (1982: 1):
Linguistic research too often focuses on one or another kind of data, without
specifying its relationship to other kinds. In order to determine which texts are
appropriate for proposed research, and to determine the significance of past and
nouns
word length
prepositions
type/token ratio
attributive adjectives
place adverbials
The label 'Abstract versus nonabstract style' can thus be proposed for
dimension 5.
In the same way that the frequency of nouns in a text might be called the
'noun score' of that text, 'dimension scores' can be computed to
characterize each text with respect to each dimension. First, the frequen-
cies of all linguistic features are normalized to a text length of 1,000 words
and standardized to a mean of 0.0 and a standard deviation of 1.0. On
such a scale, a score of 1.0 marks a value that is one standard deviation
higher than the overall mean score; a score of -1.0 marks a value that is
one standard deviation below the mean.1 Standardized scores are used
because they set frequency counts to a single scale, making the frequencies
directly comparable across features.
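The two steps just described, normalization to 1,000 words and standardization, can be sketched as follows. This is a minimal illustration rather than the procedure actually used in the study: the function names are hypothetical, and the use of the population standard deviation in `standardize_all` is an assumption. The past-tense figures (mean 40.1, standard deviation 30.4, frequency 113) are those given in note 1.

```python
from statistics import mean, pstdev


def normalize_per_1000(raw_count, text_length):
    """Scale a raw feature count to a rate per 1,000 words of text."""
    return raw_count * 1000.0 / text_length


def standardize(value, corpus_mean, corpus_sd):
    """Express a normalized frequency in standard deviations from the corpus mean."""
    return (value - corpus_mean) / corpus_sd


def standardize_all(values):
    """Standardize one feature's normalized frequencies across all texts.

    Assumption: the population standard deviation is used.
    """
    m, sd = mean(values), pstdev(values)
    return [standardize(v, m, sd) for v in values]


# Worked figures from note 1: past tense, frequency 113 per 1,000 words.
z = standardize(113, 40.1, 30.4)
print(round(z, 1))  # 2.4
```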
After standardization, dimension scores are computed by summing, for
each text, the frequencies of the salient defining features of the dimension.
To illustrate, consider dimension 3 as given in the summary on page 8.
The dimension score representing dimension 3 is computed by adding
together the frequencies of WH relative clauses on object and subject
positions, pied-piping relative clauses, phrasal coordination, and nomi-
nalizations (the features with positive loadings), and subtracting the
frequencies of time adverbials, place adverbials, and general adverbs (the
features with negative loadings), for each text.
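The computation for dimension 3 can be sketched as below. The feature keys are hypothetical labels for the features named in the text, and the standardized frequencies of the example text are invented purely for illustration.

```python
# Hypothetical keys for the dimension 3 features named in the text.
DIM3_POSITIVE = [
    "wh_relative_object",
    "wh_relative_subject",
    "pied_piping_relative",
    "phrasal_coordination",
    "nominalizations",
]
DIM3_NEGATIVE = ["time_adverbials", "place_adverbials", "general_adverbs"]


def dimension_score(z_scores, positive, negative):
    """Sum standardized frequencies of positive features, subtract the negative ones."""
    return sum(z_scores[f] for f in positive) - sum(z_scores[f] for f in negative)


# Invented standardized frequencies for one hypothetical text.
example_text = {
    "wh_relative_object": 0.5,
    "wh_relative_subject": 1.0,
    "pied_piping_relative": 0.0,
    "phrasal_coordination": 0.5,
    "nominalizations": 1.5,
    "time_adverbials": -1.0,
    "place_adverbials": -0.5,
    "general_adverbs": -0.5,
}

score = dimension_score(example_text, DIM3_POSITIVE, DIM3_NEGATIVE)
print(score)  # 5.5
```

In this invented case the text scores high on dimension 3 both because the positive features are frequent and because the negative features are less frequent than average.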
The linguistic relations among texts can be considered by comparing
their dimension scores, and the relations among text varieties can be
considered by comparing the mean dimension score of each variety. For
example, Figure 1 plots the mean dimension scores for nine English
genres with respect to dimension 1, 'Involved versus informational
production'. Face-to-face conversation has the highest value, marking it
as extremely involved and interactive; this high score reflects high
frequencies of present-tense verbs, private verbs, first- and second-
person pronouns, contractions, etc., together with markedly low fre-
quencies of nouns, prepositional phrases, long words, etc. Personal
letters and interviews have moderately high scores on dimension 1, while
general fiction and prepared speeches have intermediate values. Genres
like press reportage, academic prose, and official documents have the
lowest values on dimension 1, marking them as quite informational and
noninvolved; these low scores reflect very high frequencies of nouns,
prepositional phrases, etc., plus very low frequencies of private verbs,
contractions, etc.
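Comparing varieties by their mean dimension scores amounts to grouping texts by genre and averaging. The sketch below assumes each text is available as a (genre, score) pair; the function name and the scores are invented for illustration and are not values from the study.

```python
from collections import defaultdict


def mean_scores_by_genre(texts):
    """Average the dimension scores of texts within each genre.

    texts: iterable of (genre, dimension_score) pairs.
    """
    totals = defaultdict(lambda: [0.0, 0])  # genre -> [running sum, count]
    for genre, score in texts:
        totals[genre][0] += score
        totals[genre][1] += 1
    return {genre: total / count for genre, (total, count) in totals.items()}


# Invented dimension 1 scores for four hypothetical texts.
sample = [
    ("face-to-face conversation", 38.0),
    ("face-to-face conversation", 32.0),
    ("official documents", -19.0),
    ("official documents", -17.0),
]
means = mean_scores_by_genre(sample)
print(means["face-to-face conversation"])  # 35.0
print(means["official documents"])  # -18.0
```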
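[Figure 1: plot not reproduced. The vertical scale runs from 35 to -20; the surviving labels, from highest to lowest, are face-to-face conversations (35), personal letters (20), interviews, prepared speeches, general fiction, editorials (-10), and official documents.]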
Figure 1. Mean scores of dimension 1 ('Involved versus informational production') for nine
genres
The overall relations between any two texts or varieties can be analyzed
by consideration of their relative scores on all five dimensions. In previous
studies, I have used the dimensions to examine the relations among
various genre classes (see, for example, Biber 1987, 1988; Biber and Finegan
1988); the present paper develops a typology of English texts with respect
to the five dimensions.
the circled number 5 on Figure 2 locates the text that has a dimension
score of 13 on dimension 1 (the horizontal axis) and a score of -5 on
dimension 3 (the vertical axis), and that belongs to cluster 5; these
dimension scores mark this text as moderately involved in focus and
moderately situated in reference.
Figure 2 shows relatively distinct groupings for clusters 1, 2, 5, 7, and 8,
while the remaining three clusters (3, 4, and 6) are less well distinguished
in terms of dimensions 1 and 3. The texts in cluster 1 (marked in the plot
by the numeral 1) are characterized by quite high scores on dimension 1
and relatively low scores on dimension 3; cluster 2 is similar except it has
lower scores on dimension 1. Clusters 5, 7, and 8 all have unmarked
scores on dimension 1; they differ from one another along dimension 3:
cluster 8 texts have markedly high scores, cluster 7 texts have extremely
low scores, and cluster 5 texts have unmarked scores. Clusters 3, 4, and 6 all
have low scores on dimension 1; on dimension 3, cluster 4 texts tend to
have high scores, cluster 6 texts tend to have relatively low scores, and
cluster 3 texts have unmarked scores.
The asterisks on Figure 2 plot the 'centroids' (the central characteriza-
tions) for each cluster with respect to the two dimensions. An overall
summary of each cluster is given in Table 2, including the number of texts
in the cluster, the nearest cluster, and the 'distance' to the nearest cluster.
The 'distance' measures the cumulative difference between the cluster
centroids with respect to the five dimensions. Table 2 confirms the
impression given by Figure 2 that the types are not equally distinct in their
linguistic characterizations. In particular, this table shows that clusters 3,
4, and 6 are relatively nondistinct: both cluster 3 and cluster 6 have cluster
4 as the nearest cluster, with a distance of only 8.3 between cluster 3 and
cluster 4.
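The 'nearest cluster' and 'distance' columns of Table 2 can be understood through a sketch like the following. The text says only that the distance is a cumulative difference across the five dimensions; Euclidean distance is assumed here. The cluster names and centroid values are invented for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two points of equal dimensionality (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_clusters(centroids):
    """Map each cluster name to (nearest other cluster, distance to it)."""
    result = {}
    for name, centroid in centroids.items():
        candidates = (
            (euclidean(centroid, other_centroid), other)
            for other, other_centroid in centroids.items()
            if other != name
        )
        distance, other = min(candidates)
        result[name] = (other, distance)
    return result


# Invented five-dimensional centroids for three hypothetical clusters.
centroids = {
    "A": (0.0, 0.0, 0.0, 0.0, 0.0),
    "B": (3.0, 4.0, 0.0, 0.0, 0.0),
    "C": (20.0, 0.0, 0.0, 0.0, 0.0),
}
nearest = nearest_clusters(centroids)
print(nearest["A"])  # ('B', 5.0)
```

As in Table 2, the relation need not be symmetric: a cluster can be the nearest neighbor of several others.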
The cluster analysis identifies the 'core' text types in English: the
groupings that contain very high concentrations of texts. There is a group
of core texts and a group of peripheral texts associated with each cluster.
Core texts are very similar to the central linguistic characterization of a
cluster; peripheral texts are relatively dissimilar to the central cluster
characterization, but even more dissimilar to other clusters.4 Out of the
481 texts in this study, 345 are grouped into one of the core text types by
the cluster analysis; Figure 2 plots only these core texts. Peripheral texts,
however, are not aberrant; their existence rather reflects the fact that
textual variation is continuous. Texts do not divide into sharply distinct
'types' — instead there is a continuous range of variation in linguistic
form and use. The notion of 'text type' developed here is based on the
frequent and therefore typical clusterings of texts, which account for the
majority of texts in English. In a sense, these can be considered the text
'prototypes' of English. There are, however, other texts that fall in
between clusters, grading from one type to the next. I return to the
continuous nature of variation among texts in section 5.

Table 2. Summary of the clusters

Cluster  Core texts  Peripheral texts  Nearest cluster  Distance
1        22          1                 2                15.3
2        49          24                1                15.3
3        28          15                4                8.3
4        53          18                3                8.3
5        47          13                8                11.0
6        117         33                4                10.2
7        7           5                 5                15.3
8        22          27                5                11.0
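The distinction between core and peripheral texts can be sketched as follows. The threshold of ten is the one stated in note 4. The use of Euclidean distance, the function names, and all centroid and text values are assumptions made for illustration.

```python
import math

CORE_THRESHOLD = 10.0  # note 4: core texts lie within a distance of ten of their centroid


def distance(a, b):
    """Euclidean distance over all dimensions (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify_text(scores, centroids, threshold=CORE_THRESHOLD):
    """Assign a text to its closest cluster and label it core or peripheral."""
    distances = {name: distance(scores, c) for name, c in centroids.items()}
    cluster = min(distances, key=distances.get)
    status = "core" if distances[cluster] <= threshold else "peripheral"
    return cluster, status


# Invented centroids and texts in a five-dimensional space.
centroids = {
    "X": (0.0, 0.0, 0.0, 0.0, 0.0),
    "Y": (30.0, 0.0, 0.0, 0.0, 0.0),
}
print(classify_text((6.0, 8.0, 0.0, 0.0, 0.0), centroids))   # ('X', 'core')
print(classify_text((12.0, 9.0, 0.0, 0.0, 0.0), centroids))  # ('X', 'peripheral')
```

The second text is peripheral: it is fifteen units from centroid X, yet still closer to X than to Y.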
The grouping of texts into clusters is determined on the basis of their
characterization with respect to all five dimensions. That is, texts that are
similar with respect to one dimension but very different with respect to
other dimensions are likely to be grouped into different clusters. Figure 2
shows the distribution of texts with respect to only two dimensions, but it
can be used as an illustration of the way texts are grouped into clusters.
For example, texts in clusters 1, 2, and 5 are very similar with respect to
their dimension 3 scores (the vertical axis); texts in all three clusters
generally have scores between 0 and -8. With respect to their dimension
1 scores (the horizontal axis), however, the texts in these three clusters are
distinct: texts in cluster 1 have scores ranging from 40 to 54; texts in
cluster 2 range from 22 to 40; texts in cluster 5 range from -3 to 15.
Similar comparisons can be made for clusters 8, 5, and 7: texts in these
clusters are quite similar with respect to their dimension 1 scores (ranging
generally between -3 and 12), but quite distinct with respect to their
dimension 3 scores (cluster 8 ranging from 8 to -1; cluster 5 ranging from
2 to -8; cluster 7 ranging from -10 to -16). The picture given by Figure
2 is incomplete because only two dimensions are considered. When all five
dimensions are considered, it is possible to identify the salient distinguish-
ing characteristics of all eight clusters.
Figures 3 and 4 summarize the distinguishing characteristics of the
eight clusters, plotting the centroid score of each cluster with respect to
each dimension. These two figures present the same information: Figure 3
highlights clusters 1-4, while Figure 4 highlights clusters 5-8. The
information presented in these figures overlaps the information presented
in Figure 2; the centroid values for dimensions 1 and 3, which are given by
asterisks on Figure 2, are repeated on the respective scales of Figures 3
and 4.
On the basis of Figures 3 and 4, it is possible to describe the
distinguishing linguistic characteristics of each of the eight text types.
Using the interpretive dimension labels, cluster 1 is situated, nonabstract,
and extremely involved, but not marked for narrative concerns or
persuasion; cluster 2 is similar to cluster 1, except it is less involved.
Clusters 3 and 4 are also similar to each other: both are extremely
informational, highly elaborated, nonnarrative, and nonpersuasive. These
two clusters differ primarily with respect to dimension 5, where cluster 3 is
extremely abstract in style while cluster 4 is only moderately abstract.
6 + 4 Religion (59%)*
4 + 2 Hobbies (43%)
3 + 2 Nonsports broadcasts (63%)*
3 + 0 Science fiction (50%)*
3 + 1 Adventure fiction (31%)
3 + 0 Mystery fiction (23%)
3 + 0 Popular lore (21%)
2 + 3 Prepared speeches (35%)
2 + 0 Official documents (14%)
1 + 1 Professional letters (20%)
1 + 0 Romance fiction (8%)
0 + 2 Sports broadcasts (20%)
Text type 7. Situated reportage (7 core + 5 peripheral texts)
7 + 1 Sports broadcasts (80%)*
0 + 1 Nonsports broadcasts (13%)
0 + 1 Science fiction (17%)
0 + 1 Mystery fiction (8%)
0 + 1 Hobbies (7%)
Text type 8. Involved persuasion (22 core + 27 peripheral texts)
5 + 4 Interviews (41%)
4 + 3 Spontaneous speeches (44%)
4 + 1 Popular lore (36%)
2 + 2 Professional letters (40%)
2 + 1 Religion (18%)
2 + 0 Prepared speeches (14%)
1 + 0 Telephone conversations - disparates (17%)
1 + 0 Humor (11%)
1 + 2 Editorial letters (11%)
0 + 7 Academic prose (9%)
0 + 3 Hobbies (21%)
0 + 2 Personal letters (33%)
0 + 1 Nonsports broadcasts (13%)
0 + 1 General fiction (3%)
Cluster 2 texts:
Text sample 2 (LL:5.5; panel discussion)
Question: Do you think that there is any chance that the Labour
Party will provide an effective opposition in the forseeable future?
A: Christopher Chataway #
C: I've seldom heard a string of sentences # that I really do believe #
to [pause] contain quite so many # [pause] faulty analyses # of the
present situation # [long pause] I don't believe # that this country is
swinging to unilateralism #
A: Lord Boothby #
B: well # [pause] I don't think you know # that Tony Wedgwood
Benn can seriously say that personalities [pause] don't matter # [long
pause] because I think they do matter tremendously # in [pause] politics
today # and especially in the politics of the Left # [long pause] what has
happened is...
Text sample 3 (LL:1.1; face-to-face conversation between academic
colleagues, concerning student comprehensive exams)
A: well # [pause] may I ask # what goes into that paper now # because
I have to advise # [pause] a couple of people who are doing the [mm]
B: well what you do # is to [long pause] this is sort of between the two
of us # what you do # is to make sure that your own [pause] candidate
[mm] # is [pause] that your [pause] there's something that your own
candidate can handle # [long pause]
A: you mean that the the the papers are more or less set ad hominem
# are they # [pause]
B: [mm] [long pause] they shouldn't be # [long pause] but [mm]
[pause] I mean one # sets [long pause] one question # now I mean this
fellow's doing the language of advertising # [pause] so very well #
A: yeah #
B: give him one on
A: is this a spare paper (change of topic)
B: yeah....
Text sample 4 (LL:11.1; spontaneous speech, specifically a court
examination of a witness)
A: Mr Potter # did you # [long pause] arrive # about two o'clock # on
the [pause] Sunday # [pause] the date the will was [pause] signed #
[pause]
B: yes # [long pause]
A: and [pause] did you [pause] go # and see your mother straight away
Cluster 3:
Text sample 5 (LOB:J.8; physics journal article)
Thus the first few atomic layers deposited during the gettering period
are highly oxidized, and when the chamber has been 'cleaned up' the
deposit is more metallic. After the evaporation ceases, the deposited
film remains open to oxidation. Thus the deposited film is inhomogene-
ous and approximates to a sandwich layer of oxide/metal/oxide, in
which the outer layers are more highly oxidized than the inner layer.
The exact state of oxidation of the deposited film is unknown and a
further effect of oxidation can be observed upon baking in air. ...
Cluster 4:
Text sample 6 (LOB:J.27; sociology text)
Government in Spain continues to rest on the three institutions of an
hereditary monarchy (rejected by two short-lived republics), the parlia-
ment of the old Castilian Cortes, and an extensive Civil Service, with a
permanent staff except for its highest officials. Spain is at the moment a
kingdom without a king. The Franco regime has committed itself to the
maintenance of the monarchy as an institution by the 1947 Law of
Succession and the Referendum of the following year. Meanwhile the
regime, in its own words, is a representative, organic democracy in
which the individual participates in government through the natural
representative organs of the family, the city council and the syndicate.
Both of these samples show the characteristics of extreme informa-
tional production and explicit reference. Both have a very high concentra-
tion of nouns, prepositions, attributive adjectives, long words, and a quite
varied vocabulary — the bottom group of features on dimension 1; both
have essentially none of the top group of features on dimension 1, such as
2, the classification of texts into types 3 and 4 cuts across genre categories.
For example, several social science and humanities academic texts are
grouped into type 3 because they are relatively technical in content and
adopt the abstract and technical style of that type; conversely, a few natural
science and engineering academic texts are grouped into type 4, adopting
an active, nonabstract style in contrast to the norms for their subgenres.
Cluster 5:
Text sample 7 (LOB:L.12; Mystery fiction)
I'd finished making the bed by then. As I pushed it back against the
wall I heard something drop on the floor.
That was when the percolator in the living-room started making
bubbling noises. There was nothing on the floor that I could see. I told
myself it must've fallen down between the bed and the wall.
... Wasn't urgent anyway. Maybe my cigarette-case ... or Sonia's
powder compact ... I'd look for it later.
So I got up from my hands and knees, went into the living room and
fixed myself a cup of coffee.
Text sample 8 (LL:12.4b; prepared speech — court case)
A:
I have to decide in this case # [pause] what # [pause] if any maintenance
# [pause] should be paid # [pause] by the husband as I shall call him #
[pause] to the wife # [long pause] he's in fact # no longer the husband #
[long pause] he was originally petitioner # [pause] because there's been a
decree # [pause] absolute # [long pause] and he has remarried # [pause]
the decree # [long pause] was pronounced in favour # of the respondent
wife # [pause] on the grounds of the husband's admitted adultery #
[pause] his charge # of adultery # [pause] against her # with the main
correspondent # [long pause] failed # after a [pause] somewhat lengthy
[pause] hearing # [pause] her charges of cruelty # against him # [pause]
likewise failed # [long pause]
study. As noted above, there is a total of 150 texts grouped into this
cluster; these texts represent 19 different genres, including press reportage,
press editorials, general fiction, biographies, humor, press reviews, aca-
demic prose, and religion. This is thus a very general type of exposition; it
is not markedly learned or technical, not markedly elaborated in reference
or abstract in style, and it often uses narration as part of its exposition.
Text samples 9, 10, and 11 illustrate the distinctive characteristics of
this cluster. Sample 9 is from an editorial and illustrates the use of
narrative forms to convey expository information. Sample 10 is from
press reportage, in which the information being conveyed comprises a
narration of past events. Finally, sample 11 is from a humor text and is
representative of the fictional and biographical types of writing that use
the features of this cluster for entertainment purposes.
Cluster 6:
Text sample 9 (LOB:B.20; editorial letter)
Communism had little or nothing to do with the riots in South Africa
or the more recent disorders in Rhodesia. In fact, former leaders of the
Communist Party in the Union have left the country. Some are now in
the Rhodesian copper belt and at least one of them is in London.
In contrast, Moscow has embarked upon a special operation in
Ruanda-Urundi, which borders on the Belgian Congo. This state of
some 21,000 square miles and a population of 4,630,000 has been a
United Nations trust territory under the administration of Belgium, but
a few days ago she announced that she was giving up the trusteeship.
Text sample 10 (LOB:A.24; press reportage)
Four hundred angry Soccer fans chanted 'Sack the manager' outside
Newcastle United Football Club's ground yesterday.
United had just been thrashed 4-0 by Everton, and now look certain
to be relegated to the Football League's Division Two. Newcastle's
manager is ex-winger Charlie Mitten.
At half-time, with United two goals down, one disgusted fan climbed
the club's flagpole and hauled the Union Jack to half mast.
It was a riotous day for soccer....
Text sample 11 (LOB:R.2; humor)
He had long sensed injustice in the distinctions drawn between ordinary
wage-earners and those self-employed. By the time his monthly salary
arrived, the Inland Revenue had already taken their share, and there
were precious few reductions in tax except for wives, children, life-
insurances or any of the other normal encumbrances which Cecil had
so far avoided. He read the film star's sorry story and frowned at the
provisions of Schedule D taxation which not only allowed her to claim
relief on the most unlikely purchases, but also postponed demanding
the tax until her financial year was ended, audited and agreed by the
Inspector.
All three of these text samples illustrate the informational features
associated with dimension 1: frequent occurrences of the bottom features
(such as nouns, prepositional phrases, attributive adjectives) plus mark-
edly infrequent use of the upper features (such as private verbs, contrac-
tions). This is true of the editorial (sample 9), which is primarily
informative and expository in purpose, as well as the humor text (sample
11), which is primarily entertaining and narrative in purpose. Further,
despite the different purposes of these texts, they all use narrative forms
associated with dimension 2 (such as past-tense forms, perfect-aspect
verbs, third-person pronouns). This tendency is most pronounced in the
humor text sample, but it is found in all three samples. On the other three
dimensions, these samples illustrate the unmarked characterization of
cluster 6: not markedly 'elaborated' or 'situated' in reference, and not
marked with respect to persuasion or abstract style.
The text type represented by this cluster has a special place in the
present typology: it is the most general and nondistinct of the eight types.
Although the texts in this type share a general linguistic characterization,
having a carefully crafted, informational presentation and making rela-
tively frequent use of narrative forms, the linguistic characterization of
this type tends to be relatively unmarked on all five dimensions. As text
samples 9-11 show, the texts in this type can have different purposes,
although the underlying logical development used to achieve those
purposes seems relatively similar. That is, all of these samples use a
narrative line and careful informational elaboration to achieve their end.
In the case of editorials, that end is analysis of some political or social
situation; in the case of press reportage, that end is informing through a
factual report of events; in the case of humor (as well as fiction and
biography), that end is entertainment through a report of events. These
texts belong to the same type in their surface characterizations and, to a
lesser extent, in their underlying organizations; they show considerable
variation, however, with respect to their specific purposes. I will return to
the discussion of text type 6 relative to the other types in section 5.
Cluster 7:
Text sample 12 (LL:10.2; sports broadcast — soccer)
A:
Dunn # down the line # a bad one # it's Badger that gets it # he's got
time to control it # [pause] he feeds in fact # Tom Curry # one of the
midfield players ahead of him # [pause] Curry has got the ball # on that
far side # chips the ball down the centre # [pause] again # a harmless one
# [pause] no danger # out comes Stepney # [pause] and now left-footed #
his clearance # [pause] is again # a long [pause] high # probing ball #
down centrefield # onto the head of [long pause] Flynn # Flynn to
Badger # Badger on the far side #
Sample 12 illustrates the distinctive characteristics of cluster 7: neither
involved nor informational (that is, relatively few occurrences of either
upper or bottom features from dimension 1), markedly nonnarrative
(no past-tense verbs, perfect-aspect verbs, or other features from dimen-
sion 2), nonpersuasive (none of the features associated with dimension 4),
and markedly nonabstract in style (none of the passive constructions
associated with dimension 5). Although these characterizations all repre-
sent features that are markedly infrequent, they reflect the very specialized
purpose and production situation of these texts: a speech event that
describes events actually in progress to a large audience that is not
present. For example, the distant relationship between broadcaster and
audience results in the lack of involvement features; the rapid on-line
production of text results in relatively few informational features; and the
reportage of events in progress results in the nonnarrative characteriza-
tion of these texts. The most distinctive positive characterization of type 7
texts is the extremely high use of expressions referring directly to the
physical and temporal situation of communication. Thus, sample 12
contains numerous expressions such as down the line, ahead of him, on that
far side, down the centre, which require direct reference to the playing field
for understanding. The very frequent use of these expressions results in
the extremely situated characterization of type 7 on dimension 3.
We might wonder why this text type is much more 'situated' than type
Cluster 8:
Text sample 13 (LL:11.4; spontaneous speech — MPs in Parliament —
interacting with each other and the Secretary of State)
Q: would he not agree that it is essential at the moment # that more
[pause] should be free for exports and less absorbed within our public
sector # [long pause]
A: well # I think I would accept on the latter point # that more of our
resources must go # into [mm] into the balance of payments # ....
Q: would he agree that [mm] # [pause] an absence of such a statement
# [pause] continues to generate uncertainty in the industry # and
perhaps he might like to take this opportunity to [mm] # re-emphasize
his support # for the second force airline # [long pause]
A: well I would certainly # [pause] regret it if # [pause] parts or #
or indeed the whole of the [mm] review # [pause] was to dribble out #
that's not my intention at all # [pause] we shall of course # [pause]
indeed we are # [pause] studying it [pause] very carefully # [pause] ....
Text sample 14 (LL:12.1c; prepared speech — sermon)
A:
we must # [long pause] have our corporate life together # as a church #
[long pause] .... we can fight # [pause] and we must fight # [pause]
against the world # the flesh # and the Devil # [pause] as individuals #
[pause] but we must also fight # [pause] as the whole church of God #
[long pause].... we must have God's guidance # and grace # [pause]....
we must go out realizing # [pause] that without God's grace # [pause]
we are utterly powerless # [long pause]
Finally, it must be emphasized that the text types identified here are in
fact 'prototypes'. That is, these types represent the 'typical' text forms and
functions of English rather than absolute distinctions among texts. The
linguistic variation among texts was studied here in terms of a continuous
five-dimensional space, where the types are dense concentrations of texts
within that space. The types are based primarily on the areas of markedly
high density, the 'core' texts, and secondarily on groupings of 'peripheral'
texts. Because the peripheral texts do not occur in dense concentrations,
they are assigned to the closest type; they are sometimes relatively
dissimilar to that type, although they are even less similar to any other
type.
Even if we limit the discussion to the core text types, the analysis here
shows that the differences among types must be considered in relative
terms. I noted in the discussion of Table 2 in section 4.1 that the types are
not equally distinct. Some text types, like types 1, 2, and 7, are quite
distinct from the other types; others, like types 3 and 4, are relatively
similar to each other. In fact, there is a continuous range of variation
among texts. It is theoretically possible for a text to have any score on
each dimension, defining a continuous, multidimensional space of varia-
tion. It turns out, though, that there are regions that have very high
concentrations of texts within that space, and these regions are identified
as the text prototypes in English. In between these prototypes, there are
particular texts that combine functional emphases and linguistic forms in
complex and relatively idiosyncratic ways. These texts are not aberra-
tions; they rather reflect the fact that speakers and writers exploit the
linguistic resources of English in a continuous manner.
There are thus two complementary perspectives on linguistic variation
among texts. One perspective focuses on the continuous nature of text
variation; the other perspective, which forms the basis of the present
study, identifies the relatively few distinct types that are frequently used in
English. In theory, texts could be evenly distributed across possible
linguistic and functional characterizations. This is not the case, however.
Rather, the majority of texts are distributed across a few sets of linguistic
form/function classes, and these marked concentrations of texts are
interpreted as the major text 'types' of English. These types reflect marked
tendencies of speakers and writers to construct texts around a limited set
of functions and cooccurring linguistic forms. The typology thus gives
structure to the multidimensional space of textual variation, even though
it does not negate the continuous nature of that space.
Additional research on the dimensions of variation in English might
help identify other, more specialized text types. The typology developed
here, however, presents eight basic prototypes of texts in English. As such,
Notes
* I would like to thank Pat Clancy, Ed Finegan, and an anonymous Linguistics reviewer
for their many helpful comments on an earlier draft of this paper. Correspondence
address: Department of Linguistics, University of Southern California, University Park,
Los Angeles, CA 90089-1693, USA.
1. For example, past tense has a mean value of 40.1 and a standard deviation of 30.4 across
all of the texts, and thus an absolute frequency of 113 translates into a standardized
score of 2.4:
(113 - 40.1) / 30.4 = 2.4
That is, a frequency of 113 is 2.4 standard deviations from the mean of 40.1.
2. The same text corpus was used to determine the dimensions of variation and to develop
the present typology. The written texts are taken from the Lancaster-Oslo-Bergen
Corpus of British English (known as the LOB Corpus); the spoken texts are taken from
the London-Lund Corpus of Spoken English. These two corpora are supplemented by
private collections of personal and professional letters.
3. The FASTCLUS procedure from SAS was used for the clustering. Disjoint clusters were
produced since there was no theoretical reason to expect a hierarchical structure. Peaks
in the cubic clustering criterion and the pseudo F statistic, both produced by the
FASTCLUS procedure, were used to determine the number of clusters to extract for
analysis. These statistics provide a measure of the similarities among texts within each
cluster in relation to the differences between the clusters. In the present case, both
measures showed a peak for the eight-cluster solution, indicating that this solution
provided the best fit to the data.
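The logic of choosing the number of clusters from a peak in the pseudo F statistic can be sketched as follows, assuming that pseudo F is the usual ratio of between-cluster to within-cluster variance, each divided by its degrees of freedom. The function is a hypothetical illustration, not the FASTCLUS implementation; the cubic clustering criterion is not sketched, and the points are invented.

```python
def pseudo_f(points, labels):
    """Pseudo F: (between-cluster SS / (k - 1)) / (within-cluster SS / (n - k))."""
    n = len(points)
    clusters = sorted(set(labels))
    k = len(clusters)
    dims = range(len(points[0]))
    grand_mean = [sum(p[d] for p in points) / n for d in dims]
    between = 0.0
    within = 0.0
    for c in clusters:
        members = [p for p, label in zip(points, labels) if label == c]
        centroid = [sum(p[d] for p in members) / len(members) for d in dims]
        # Between-cluster sum of squares, weighted by cluster size.
        between += len(members) * sum((centroid[d] - grand_mean[d]) ** 2 for d in dims)
        # Within-cluster sum of squares around the cluster centroid.
        within += sum((p[d] - centroid[d]) ** 2 for p in members for d in dims)
    return (between / (k - 1)) / (within / (n - k))


# Invented two-dimensional points forming two tight groups.
points = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0), (12.0, 0.0)]
print(pseudo_f(points, [0, 0, 1, 1]))  # 50.0
print(pseudo_f(points, [0, 1, 0, 1]))  # 0.08
```

A grouping that matches the structure of the data yields a far higher value than one that cuts across it; comparing such values across candidate solutions is what locating a peak involves.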
4. Core texts are those that have a distance of ten or less from their cluster centroid;
peripheral texts have distances greater than ten. This distance was chosen because it
excluded the major outliers in each cluster.
5. Clusters can have intermediate mean dimension scores for two reasons: the cluster is
characterized by frequent occurrences of both positive and negative linguistic features
on that dimension, or the cluster is characterized by the marked absence of both positive
and negative features. Either distribution of features results in an unmarked character-
ization with respect to the dimension in question.
6. Text samples are labeled as follows:
CORPUS:GENRE.TEXT-NUMBER
For example, text sample 1 is labeled LL:1.8, because it is from the London-Lund
Corpus, genre type 1 (face-to-face conversation), and text number 8 within that genre.
In the spoken-text samples, # marks intonation unit boundaries.
7. There is considerable overlap in the texts used in these two studies (approximately
60-70%), which biases the results in favor of converging typologies.
References
Besnier, Niko (1986). Register as a sociolinguistic unit: defining formality. In Social and
Cognitive Perspectives on Language, Jeff Connor-Linton, Christopher Hall, and Mary
McGinnis (eds.), 25-63. Los Angeles: University of Southern California.
Biber, Douglas (1986). Spoken and written textual dimensions in English: resolving the
contradictory findings. Language 62, 384-414.
—(1987). A textual comparison of British and American writing. American Speech 62,
99-119.
—(1988). Variation across Speech and Writing. Cambridge: Cambridge University Press.
—, and Finegan, Edward (1986). An initial typology of English text types. In Corpus
Linguistics II, Jan Aarts and Willem Meijs (eds.), 19-46. Amsterdam: Rodopi.
—, and Finegan, Edward (1988). The drift of English genres from the 18th to the 20th
centuries. Paper presented at the Georgetown University Round Table on Languages and
Linguistics, Georgetown. (To appear in conference proceedings, edited by Thomas J.
Walsh.)
Brown, Penelope, and Fraser, Colin (1979). Speech as a marker of situation. In Social
Markers in Speech, Klaus R. Scherer and Howard Giles (eds.), 33-62. Cambridge:
Cambridge University Press.
Chafe, Wallace L. (1982). Integration and involvement in speaking, writing, and oral
literature. In Spoken and Written Language: Exploring Orality and Literacy, Deborah
Tannen (ed.), 35-54. Norwood, N.J.: Ablex.
Connor-Linton, Jeff, Hall, Christopher, and McGinnis, Mary (eds.), (1986). Social and
Cognitive Perspectives on Language. Southern California Occasional Papers in Linguistics
11. Los Angeles: University of Southern California.
Ervin-Tripp, S. M. (1972). On sociolinguistic rules: alternation and co-occurrence. In
Directions in Sociolinguistics, John J. Gumperz and D. Hymes (eds.), 213-250. New York:
Holt, Rinehart, and Winston.
Ferguson, Charles A. (1983). Sports announcer talk: syntactic aspects of register variation.
Language in Society 12, 153-172.
Finegan, Edward, and Biber, Douglas (1986). Two dimensions of linguistic complexity in
English. In Social and Cognitive Perspectives on Language, Jeff Connor-Linton, Christo-
pher Hall, and Mary McGinnis (eds.), 1-24. Los Angeles: University of Southern
California.
Grabe, William (1984). Towards defining expository prose within a theory of text
construction. Unpublished Ph.D. dissertation, University of Southern California.
Hymes, Dell (1972). Foundations of Sociolinguistics: An Ethnographic Approach. Philadel-
phia: University of Pennsylvania Press.
Longacre, Robert (1976). An Anatomy of Speech Notions. Lisse: de Ridder.
Redeker, Gisela (1984). On differences between spoken and written language. Discourse
Processes 7, 43-55.
Smith, Edward L. (1985). Text type and discourse framework. Text 5, 229-247.
Tannen, Deborah (1982). Oral and literate strategies in spoken and written narratives.
Language 58, 1-21.