The document discusses corpus linguistics, highlighting its significance in language studies through the analysis of large, computer-readable text collections. It outlines various types of corpora, methodologies such as corpus-based and corpus-driven linguistics, and the importance of data collection and annotation. Additionally, it addresses the benefits and limitations of corpus analysis, as well as the tools and techniques used for linguistic research.

Lecture 7. Corpus linguistics

© Kushneruk Svetlana Leonidovna
Doctor of Philology, Professor of Chelyabinsk State University

1. The notion of a corpus. Typology of corpus linguistic research

Corpus linguistics revolutionized language studies because it has provided new ways of analyzing and describing the use of language.
• Corpus linguistics is a powerful methodology that can be employed to explore a wide variety of issues related to the use of vocabulary.
• Corpus linguistics is an area which focuses upon a set of procedures, or methods, for studying language.
Corpora can be defined as large, principled and computer-readable collections of texts that allow analysis of patterns of language use across different contexts.

Corpora consist of texts stored in an electronic format, which enables researchers to use special software to conduct automatic searches and gain insights into the structure and regularity of naturally occurring language.
Important features of corpus-based analysis:
• it is empirical, analyzing the actual patterns of use in natural texts
• it utilizes a large collection of natural texts as the basis for analysis
• it makes extensive use of computers for analysis
• it depends on both quantitative and qualitative analytical techniques
Typology of corpus linguistic research
1.1. Mode of communication

• corpora of written language
• corpora of spoken language
www.publications.parliament.uk/pa/cm/cmhansrd.htm
ICE-GB is the British component of the International Corpus of English (ICE)
www.ucl.ac.uk/english-usage/projects/ice-gb/
Video corpora
http://sourceforge.net/projects/thedrs
1.2. Corpus-based versus corpus-driven linguistics

Corpus-based
• corpus linguistics is perceived as a methodology ⇒ corpus data are used to verify the existing theories of language
Corpus-driven
• tends to view corpus linguistics as a theory which offers a new way of looking at the creation of meaning in a narrow sense and different aspects of the use of language in a broader sense

Corpus-based studies use corpus data in order to explore a theory or hypothesis, established in the current literature, in order to validate it, refute it or refine it [McEnery & Hardie 2012: 6].

Corpus-driven linguistics
➢ claims that the corpus itself should be the sole source of hypotheses about language
➢ the corpus itself embodies its own theory of language
[McEnery & Hardie 2012: 6]

1.3. Data collection regime
The monitor corpus approach
• seeks to develop a dataset which grows in size over time and which contains a variety of materials
The Bank of English (BoE)
• many books on corpus linguistics suggested that the BoE could be used as a ‘monitor corpus’ to look at ongoing changes in English
The Web as Corpus
• It takes as its starting point a massive collection of data that is ever-growing, and uses it for the study of language.
• The content of the web is not divided by genre ⇒ the material returned from a web search tends to be an undifferentiated mass, which requires a great deal of processing to sort into meaningful groups of texts.
The sample corpus approach
✓ Sample corpora represent a particular type of language over a specific span of time.
✓ A balanced corpus covers a wide range of text categories which are supposed to be representative of the language variety under consideration.
✓ Representativeness refers to the extent to which a sample includes the full range of variability in a population.
1.4. Annotated versus unannotated corpora

Corpus annotation is largely the process of providing those analyses which a linguist would carry out anyway on whatever data they worked with.

Annotation is an umbrella term that refers to procedures such as tagging and parsing which are carried out to add linguistic information to a corpus.

Tony McEnery & Andrew Hardie distinguish between three types of information that can accompany a corpus:
• metadata: details about a given text such as the name of the author
• textual markup: information about the formatting of the text such as where italics start and end or when a given speaker starts speaking
• linguistic annotation: assigning grammatical categories or tags to all the words within a corpus

Layers of annotation
• part-of-speech (PoS) tagging
• syntactic (grammatical) parsing
• error annotation
• semantic annotation
• phonetic annotation

CLAWS (Constituent Likelihood Automatic Word-tagging System)
http://ucrel.lancs.ac.uk/claws/
1.5. Total accountability versus data selection

The principle of total accountability:
• we must not select a favourable subset of the data
• one way of satisfying falsifiability is to use the entire corpus to test the hypothesis
1.6. Multilingual versus monolingual corpora

Many corpora are monolingual:
➢ may represent a range of varieties and genres of a particular language
➢ limited to that one language
https://www.ucl.ac.uk/english-usage/projects/ice.htm
The English-Norwegian Parallel Corpus (ENPC)
https://www.hf.uio.no/ilos/english/services/knowledge-resources/omc/enpc/
Type A: Source texts in one language plus translations into one or more other languages
• the Canadian Hansard
➢ consisting of debates from the Canadian Parliament published in the country's official languages, English and French

• CRATER
➢ Corpus Resources and Terminology Extraction is a project involving three languages: English, French and Spanish.
➢ consists entirely of technical texts from the International Telecommunications Union ⇒ 5.5 million words
➢ texts are tagged with part-of-speech and morphological annotation
Type B: Pairs or groups of monolingual corpora designed using the same sampling frame
https://www.lancaster.ac.uk/fass/projects/corpus/LCMC/
Type C: A combination of A and B

EMILLE (Enabling Minority Language Engineering) was a 3-year project at Lancaster University and Sheffield University.
Its end product was a 97 million word electronic corpus of South Asian languages, especially those spoken in the UK.
http://www.emille.lancs.ac.uk/about.php
2. Providing data on linguistic phenomena

Lexical
• Frequency and distribution of specific words and phrases
• Lists of all common words in a language or genre

• processes involving word formation: nouns formed with suffixes *ism or *ousness
• contrasts in the use of grammatical alternatives: HAVE + proven/proved, sincerest/most sincere
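A wildcard search such as *ism or *ousness can be approximated with a regular expression. The sketch below is only an illustration: the sample text and the names `SAMPLE_TEXT`, `SUFFIX_PATTERN` and `suffix_frequencies` are invented for this example and do not come from any corpus tool. It matches surface forms only, so a real study would still need to check by hand, or with PoS tags, that each hit is really a noun formed with the suffix.

```python
import re
from collections import Counter

# Hypothetical mini-corpus; a real study would load corpus files instead.
SAMPLE_TEXT = (
    "Criticism of tourism grew, and the seriousness of the problem "
    "was matched only by the nervousness of officials. Optimism returned "
    "slowly, although criticism continued."
)

# Word forms ending in -ism or -ousness (the wildcard searches *ism, *ousness).
SUFFIX_PATTERN = re.compile(r"\b\w+(?:ism|ousness)\b", re.IGNORECASE)


def suffix_frequencies(text: str) -> Counter:
    """Count every word form in `text` that ends in -ism or -ousness."""
    return Counter(match.lower() for match in SUFFIX_PATTERN.findall(text))


if __name__ == "__main__":
    # Print the matching word forms, most frequent first.
    for word, count in suffix_frequencies(SAMPLE_TEXT).most_common():
        print(word, count)
```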
➮ High-frequency grammatical features: modals, passives, perfect or progressive aspect
➮ Less frequent grammatical variation:
John started to walk / walking
She’d like (for) him to stay overnight
Phraseological patterns
• Collocational preferences for specific words: true feelings, true story
• Constructions
[V NP into V-ing] they talked him into staying
[V POSS way PREP] he elbowed his way through the crowd
• Collocates as a guide to meaning and usage: (n) highbrow, (adj) highbrow
• Semantic prosody: the types of words preceding the verb outweigh
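Collocational preferences of the kind listed above (true feelings, true story) can be explored by counting the words that occur next to a node word. The following is a minimal sketch under stated assumptions: the sample sentences, the simple regex tokenizer and the function names are all hypothetical, and it reports raw co-occurrence counts only, without any association measure.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def collocates(tokens: list[str], node: str, window: int = 1) -> Counter:
    """Count words found within `window` tokens of each occurrence of `node`."""
    counts: Counter = Counter()
    for i, token in enumerate(tokens):
        if token != node:
            continue
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        counts.update(left + right)
    return counts


# Invented sample sentences, used only to demonstrate the function.
SAMPLE_TEXT = (
    "It is a true story about true feelings. "
    "She hid her true feelings, but the true story came out."
)

if __name__ == "__main__":
    print(collocates(tokenize(SAMPLE_TEXT), "true", window=1).most_common(5))
```

Widening `window` brings in more distant co-text. In practice, function words such as "the" tend to dominate raw counts, which is one reason corpus tools usually rank collocates with a statistical measure instead.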
3. Corpus design

In 2005, Sinclair proposed a set of principles that should be considered with regard to the process of developing a corpus.

John Sinclair (1933-2007)
• 1. The contents of a corpus should be selected without regard for
the language they contain, but according to their communicative
function in the community in which they arise.
• 2. Corpus builders should strive to make their corpus as
representative as possible of the language from which it is chosen.
• 3. Only those components of corpora which have been designed
to be independently contrastive should be contrasted.
• 4. Criteria for determining the structure of a corpus should be
small in number, clearly separate from each other and efficient as a
group in delineating a corpus that is representative of the language
variety under examination.
• 5. Any information about a text other than the alphanumeric
string of its words and punctuation should be stored separately from
the plain text and merged when required in applications.
• 6. Samples of language for a corpus should consist of entire
documents or transcriptions of complete speech events.
• 7. The design and composition of a corpus should be documented
fully with information about the contents and arguments in
justification of the decisions taken.
• 8. The corpus builder should retain, as target notions,
representativeness and balance. While these are not precisely
definable and attainable goals, they must be used to guide the
design of a corpus and the selection of its components.
• 9. Any control of subject matter in a corpus should be imposed by
the use of external, and not internal, criteria.
• 10. A corpus should aim for homogeneity in its components while
maintaining adequate coverage, and rogue texts should be avoided.
Corpus research: size, balance, representativeness
• Representativeness concerns the issue of
how well a corpus represents a given
language or variety that is under study.
• Balance refers to the structure and type
of data used to build a corpus.
• A well-balanced corpus should consist of
several subsections that represent different
types of language use.
4. Benefits of corpus analysis
• 1. One can use corpus data to explore different aspects of
language.
• 2. Corpus linguistics is an empirical approach which relies on
frequency-based analyses.
• 3. Corpus linguistics focuses on the phraseological nature of
language.
• 4. Corpus investigations highlight different functions of language
and demonstrate the central role of context in the analysis of
linguistic behavior.
• 5. Corpus linguistics presents us with powerful tools for
exploring the distribution of specific linguistic features across a
wide range of domains of language use.
5. Limitations of corpus analysis

1. A corpus can show us only what it contains.

2. A corpus may be too small.
3. A corpus presents language out of its context.
4. A corpus cannot interpret data.
6. Types of corpora
6.1. General and specialized corpora
General corpora consist of a wide range of texts that represent natural language as it is used across a variety of contexts.

Specialized corpora do not aim to comprehensively represent a language as a whole, but only specialized segments of it.
BNC https://www.english-corpora.org/bnc/
Michigan Corpus of Academic Spoken English (MICASE) https://quod.lib.umich.edu/m/micase/
British Academic Written English (BAWE) Corpus of proficient student writing
https://ota.bodleian.ox.ac.uk/repository/xmlui/handle/20.500.12024/2539#
Vienna-Oxford International Corpus of English (VOICE) https://www.univie.ac.at/voice/
English as a Lingua Franca in Academic Settings Corpus (ELFA) https://www.kielipankki.fi/corpora/elfa/
6.2. Written and spoken corpora

The majority of corpora represent written language.

CANCODE (Cambridge and Nottingham Corpus of Discourse in English) is a large collection of spoken British English that has been used as a basis for a number of studies into the specific nature of spoken language.
Santa Barbara Corpus of Spoken American English
https://www.linguistics.ucsb.edu/research/santa-barbara-corpus
Hong Kong Corpus of Spoken English
http://rcpce.engl.polyu.edu.hk/HKCSE/default.htm
6.3. Historical (diachronic) corpora
• Represent data from specific historical periods and are particularly useful if scholars are interested in the process of language change
Corpus of Historical American English (COHA)
https://www.english-corpora.org/coha/
ARCHER, A Representative Corpus of Historical English Registers
https://www.projects.alc.manchester.ac.uk/archer/
6.4. Parallel and comparable corpora
• Often employed by researchers working in the area of translation studies, who use them to make direct comparisons between the same texts written in different languages
Oslo Multilingual Corpus
http://www.hf.uio.no/ilos/english/services/omc/
German, French and Finnish source texts, and their respective translations
Digital Corpus of the European Parliament
https://ec.europa.eu/jrc/en/language-technologies/dcep
6.5. Web as a corpus

• The Internet is constantly growing and the number of websites is increasing.
• A corpus can be regularly updated, which makes it very similar to monitor corpora.
WebCorp Linguist’s Search Engine (http://wse1.webcorp.org.uk/) is an example of an interface that can be used to explore data found on the web.
Mark Davies’s Google Books interface
https://support.google.com/websearch/answer/9523832
https://books.google.com/ngrams/info
An example of a database composed of web-based data is the NOW Corpus
https://www.english-corpora.org/now/
7. Corpus tools and types of analysis
• Corpora are computer-readable collections of texts which enable linguistic analysis by means of special computer programs called concordancers
• The most popular concordancers are WordSmith Tools, Sketch Engine, MonoConc and AntConc
• https://www.lexically.net/wordsmith/
• https://www.laurenceanthony.net/software/antconc/
7.1. Frequency analysis and concordancing
The most basic type of corpus analysis is checking the frequency of occurrence of a given word or a phrase.
A search by means of a web-based interface: https://www.english-corpora.org/
• use the search box located on the left-hand side of the interface
• type in a word or a phrase that you want to explore (a node)
• ‘Chelyabinsk’
• word as a lemma = [chelyabinsk]
Information about the number of occurrences of ‘Chelyabinsk’ in the whole corpus
Lines of text which demonstrate how the word is used in context: concordances
All frequency values for the word ‘Chelyabinsk’ across the different portions of the corpus (Chart option)
Choose a different subcorpus: News on the Web
Examples of concordances for the word ‘Chelyabinsk’
Key Word In Context (KWIC): a standard format for displaying corpus data
• one can easily analyze the co-text of the node, that is, all the words that precede and follow it
• By analyzing the immediate company of words, one can explore patterns of co-occurrence between words and study how words tend to form various kinds of lexical, grammatical and lexico-grammatical combinations
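The KWIC layout can be reproduced in a few lines of code. This is a hedged sketch and not how any particular concordancer is implemented: the function name `kwic`, the `width` parameter and the sample sentences are assumptions made for illustration. It shows the core idea of placing the node in a fixed column with its left and right co-text on either side.

```python
import re


def kwic(text: str, node: str, width: int = 30) -> list[str]:
    """Return concordance lines with `node` placed between its left and right co-text."""
    lines = []
    pattern = re.compile(r"\b" + re.escape(node) + r"\b", re.IGNORECASE)
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left co-text so every node starts in the same column.
        lines.append(f"{left:>{width}} [{match.group()}] {right}")
    return lines


# Invented sample text, used only to demonstrate the layout.
SAMPLE_TEXT = (
    "A meteor exploded over Chelyabinsk in 2013. "
    "Many visitors travel to Chelyabinsk by train from Yekaterinburg."
)

if __name__ == "__main__":
    for line in kwic(SAMPLE_TEXT, "Chelyabinsk"):
        print(line)
```

Because the node is aligned vertically, recurring words immediately to its left or right become easy to spot when many lines are read together.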
Frequency of words across different sections
Frequency of the word ‘Chelyabinsk’ by country
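Raw counts from sections of different sizes are not directly comparable, so frequencies are usually normalised, for example to occurrences per million words. The sketch below illustrates that arithmetic only: the section names and all figures in `SECTIONS` are hypothetical and are not taken from any real corpus.

```python
def per_million(raw_count: int, section_size: int) -> float:
    """Normalise a raw frequency to occurrences per million words."""
    if section_size <= 0:
        raise ValueError("section_size must be positive")
    return raw_count * 1_000_000 / section_size


# Hypothetical figures: (raw hits for the node, total words in the section).
SECTIONS = {
    "section_a": (120, 40_000_000),
    "section_b": (30, 5_000_000),
}

if __name__ == "__main__":
    for name, (hits, size) in SECTIONS.items():
        print(f"{name}: {hits} hits, {per_million(hits, size):.2f} per million")
```

In this invented example the section with fewer raw hits has the higher normalised frequency, which is the kind of contrast a raw count alone would hide.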
7.2. Wordlists

• Wordlists are lists of words or phrases ranked according to their frequency or the number of their occurrences in a given corpus.
• Wordlists are a powerful tool for making comparisons between corpora that represent different language uses.
• If we assume that the most frequent words are also the most useful ones, language teachers can use this information to decide which words should be addressed first where English is taught as a second/foreign language.
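A frequency-ranked wordlist can be built with the Python standard library. This is a minimal sketch, assuming a very simple tokenizer that lowercases the text and keeps alphabetic strings (with an optional internal apostrophe). The function name `build_wordlist` and the sample text are invented, and real tools make their own decisions about tokenization and lemmatization.

```python
import re
from collections import Counter
from typing import Optional


def build_wordlist(text: str, top_n: Optional[int] = None) -> list[tuple[str, int]]:
    """Return (word, frequency) pairs ranked from most to least frequent."""
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    return Counter(tokens).most_common(top_n)


# Invented sample text, used only to demonstrate the ranking.
SAMPLE_TEXT = "The cat sat on the mat. The dog sat too."

if __name__ == "__main__":
    for rank, (word, freq) in enumerate(build_wordlist(SAMPLE_TEXT, top_n=5), start=1):
        print(rank, word, freq)
```

Running the same function over two corpora and comparing the resulting lists is one simple way to make the kind of comparison described above.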
COCA
Corpus of Contemporary American English
search for words by meaning ‘industrial’
7.3. Word combinations and n-gram analysis / cluster analysis

chunks, n-grams, lexical bundles

Words tend to co-occur and form collocations, colligations and other examples of word combinations.

N-gram is a technical term used to denote word combinations which consist of two or more words that repeatedly occur consecutively in a corpus.
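The definition above translates directly into code: slide a window of n tokens over the text, count each sequence, and keep those that recur. The sketch below is an illustration under assumptions, with an invented sample sentence, a simple regex tokenizer and hypothetical function names. It is not a description of how AntConc or any other tool computes clusters.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def ngram_counts(tokens: list[str], n: int) -> Counter:
    """Count every run of n consecutive tokens."""
    return Counter(zip(*(tokens[i:] for i in range(n))))


def recurring_ngrams(tokens: list[str], n: int, min_freq: int = 2) -> list[tuple[str, int]]:
    """Return the n-grams that occur at least `min_freq` times, most frequent first."""
    counts = ngram_counts(tokens, n)
    return [(" ".join(gram), freq) for gram, freq in counts.most_common() if freq >= min_freq]


# Invented sample sentence, used only to demonstrate the counting.
SAMPLE_TEXT = "On the other hand we agree, but on the other hand we doubt it."

if __name__ == "__main__":
    for gram, freq in recurring_ngrams(tokenize(SAMPLE_TEXT), n=3):
        print(freq, gram)
```

The `min_freq` threshold reflects the "repeatedly occur" part of the definition: sequences seen only once are discarded.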
Corpus software AntConc
https://www.laurenceanthony.net/software/antconc/
7.4. Keyness analysis and keywords

• Keyword: a word which occurs with unusual frequency in a given text.
• Such words are useful because they provide information about the keyness or specificity of a given corpus in terms of what it is about.
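"Unusual frequency" is normally judged against a reference corpus. The lecture does not name a statistic, so the sketch below uses log-likelihood, one measure commonly used for keyness. This is an assumption for illustration and not necessarily the method applied by the tools mentioned in this lecture. All counts in the usage example are invented.

```python
import math
from collections import Counter


def log_likelihood(freq_study: int, size_study: int, freq_ref: int, size_ref: int) -> float:
    """Log-likelihood score comparing a word's frequency in two corpora."""
    total = freq_study + freq_ref
    expected_study = size_study * total / (size_study + size_ref)
    expected_ref = size_ref * total / (size_study + size_ref)
    score = 0.0
    # A zero observed frequency contributes nothing to the sum.
    if freq_study > 0:
        score += freq_study * math.log(freq_study / expected_study)
    if freq_ref > 0:
        score += freq_ref * math.log(freq_ref / expected_ref)
    return 2 * score


def keywords(study: Counter, reference: Counter) -> list[tuple[str, float]]:
    """Rank words that are relatively more frequent in the study corpus."""
    size_study = sum(study.values())
    size_ref = sum(reference.values())
    scored = []
    for word, freq in study.items():
        ref_freq = reference.get(word, 0)
        if freq / size_study > ref_freq / size_ref:
            scored.append((word, log_likelihood(freq, size_study, ref_freq, size_ref)))
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical word counts for a study text and a reference corpus.
    study = Counter({"america": 20, "the": 80})
    reference = Counter({"america": 2, "the": 98})
    for word, score in keywords(study, reference):
        print(word, round(score, 2))
```

A higher score means the difference in relative frequency is less likely to be due to chance. The resulting list is a starting point for qualitative reading of concordance lines, not a conclusion in itself.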
Keywords tool on Lextutor
http://www.lextutor.ca/key/
Example .txt format files: Trump inaugural address
Lists of keywords might be a starting point for a qualitative analysis.

Concordance lines from the inaugural address corpus ➮ investigate how the words ‘america’ and ‘prosper’ are used in specific contexts
