Summary of Using Corpora in Discourse Analysis by Paul Baker
1. INTRODUCTION
This book is about using corpora and corpus processes in order to uncover linguistic patterns which can enable us to make sense of the ways that language is used in the construction of discourses. Some people may know a lot about discourse analysis but not about corpus linguistics; for others the opposite may be the case; for others still, both areas might be equally opaque. We will begin by giving a description of corpus linguistics and discourse.
Corpus linguistics
Corpus linguistics is the study of language based on examples of real-life language use. Corpora are generally large (consisting of thousands or even millions of words), representative samples of a particular type of naturally occurring language, and they can therefore be used as a standard reference against which claims about language can be measured. Electronic corpora are often annotated with additional linguistic information. Other types of information can also be encoded within corpora; for example, in spoken corpora (containing transcripts of dialogue), attributes such as sex, age, socio-economic group and region can be encoded for each participant. This allows comparisons to be made between the language of different types of speakers. Up until the 1970s only a small number of studies utilized corpus-based approaches, and it was in the 1980s that corpus linguistics became popular as a methodology. Between 1976 and 1991 corpus linguistics was employed in a number of areas of linguistics, including dictionary creation, as an aid to the interpretation of literary texts, forensic linguistics, language description, language variation studies and language teaching materials.
Discourse
The term discourse is used in social and linguistic research in a number of inter-related yet different ways. In traditional linguistics it is defined as language above the sentence or above the clause. The term discourse is also sometimes applied to different types of language use or topic; for example, we can talk about political discourse, colonial discourse, media discourse and environmental discourse. A number of researchers have used corpora to examine the discourse styles of learners of English. Discourse can also be defined as practices which systematically form the objects of which they speak. To expand: a discourse is a system of statements which constructs an object, a set of meanings, metaphors, representations, images, stories, statements and so on that in some way together produce a particular version of events. Therefore, discourses are not valid descriptions of people's beliefs or opinions, and they cannot be taken as representing an inner aspect of identity such as personality or attitude. They are connected to practices and structures that are lived out in society from day to day. Discourses can therefore be difficult to pin down or describe: they are constantly changing, interacting with each other, breaking off and merging. One way that discourses are constructed is via language. Language is not the same as discourse, but we can carry out analyses of language in texts in order to uncover traces of discourses.
The shift to post-structuralism
Discourse analysts have used corpora in order to analyse data such as political texts, teaching materials, scientific writing and newspaper articles. Such studies have shown how corpus analysis can uncover ideologies and evidence of disadvantage. Corpus-based techniques have also been employed in studies which have attempted to analyse differences in language usage based on identity. There are still only a small number of researchers applying corpus methodologies to discourse analysis; this is a cross-disciplinary field which is somewhat under-subscribed, and it appears to be subject to some resistance. All methods of research have associated problems which need to be addressed, and all are limited in terms of what they can and cannot achieve. One criticism of corpus-based approaches is that they are too broad. Some researchers have problematized corpora as constituting "linguistics applied" rather than applied linguistics; for example, Widdowson claims that corpus linguistics only offers a partial account of real language because it does not address the lack of correspondence between corpus findings and native speaker intuitions. Researchers should therefore be encouraged to carry out corpus-based work which takes such potential problems into account, perhaps supplementing their approach with other methodologies. There is no reason why corpus-based research on lexical items should not use diachronic corpora in order to track changes in word meaning and usage over time, and several large-scale corpus-building projects have been carried out with the aim of creating historical corpora from different time periods.
Corpus linguistics also tends to be conceptualized as a quantitative method of analysis. Before the 1980s, corpus linguistics had struggled to make an impact upon linguistic research because computers were not sufficiently powerful or widely available to put its theoretical principles into practice. By the 1980s, an alternative means of producing knowledge had become available, loosely based around the concept of post-modernism and referred to as post-structuralism or social constructionism. Post-structuralists have developed close formulations between the concepts of language, ideology and hegemony, based on the work of a number of writers (for example Gramsci). One area in which corpus linguistics has excelled has been in generating descriptive grammars of languages based on naturally occurring language use, but this focuses on language as an abstract system. The corpus linguistics approach can also be perceived as time consuming: large numbers of texts must first be collected, their analysis often requires learning how to use computer programs to manipulate data, access to corpora is not always easy, and it is often simply less effort to collect a smaller sample of data which can be transcribed and analysed by hand, without the need for computers or mathematical formulae.
Advantages of the corpus-based approach to discourse analysis
As well as helping to restrict bias, corpus linguistics is a useful way to approach discourse analysis because of the incremental effect of discourse. Discourses are circulated via language use, and the task of discourse analysis is to uncover how language is employed so as to reveal the underlying discourses. By becoming more aware of how language is drawn on to construct discourses, or various ways of looking at the world, we should become more resistant to attempts by texts to manipulate us by suggesting what is "common sense" or "accepted wisdom". A single word, phrase or grammatical construction may suggest the existence of a discourse, but it can sometimes be difficult to tell whether such a discourse is typical or not. A word, phrase or construction may trigger a cultural stereotype. A lot of human communication is not a matter of free choice but is constrained by normativities which are determined by patterns of inequality. Consulting a large corpus of general British English, we find that the words "confined" and "wheelchair" have fairly strong patterns of co-occurrence: the phrase "confined to a wheelchair" occurs 45 times in the corpus, while the more neutral term "wheelchair user(s)" occurs 37 times. There are enough cases to suggest that one discourse of wheelchair users constructs them as being deficient in a range of ways. Every time we read or hear a phrase like "wheelchair bound" or "despite being in a wheelchair", our perception of wheelchair users is influenced in a certain way.
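The kind of co-occurrence count described above can be sketched in a few lines of Python. The tiny corpus and the counts below are invented for illustration; they stand in for the large reference corpus of British English that the text mentions.

```python
import re

# A toy stand-in for a large reference corpus (sentences invented).
corpus = " ".join([
    "He was confined to a wheelchair after the accident.",
    "She is a wheelchair user and a campaigner.",
    "The wheelchair user arrived early.",
    "Despite being confined to a wheelchair, he travelled widely.",
])

def phrase_count(text, phrase):
    """Count case-insensitive occurrences of a phrase in running text."""
    return len(re.findall(re.escape(phrase), text, flags=re.IGNORECASE))

print(phrase_count(corpus, "confined to a wheelchair"))  # 2
print(phrase_count(corpus, "wheelchair user"))           # 2
```

Run against a real corpus, the two counts could then be compared, as Baker does with the 45 versus 37 figures.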
Resistant and changing discourses
While repeated patterns of language use demonstrate evidence of particular hegemonic discourses, or majority "common-sense" ways of viewing the world, corpus data can also reveal the opposite. Discourses are not static; they continually shift position. A hegemonic discourse of ten years ago may be viewed as a resistant or unacceptable discourse today, and this can be shown by looking at changing frequencies of word use in a diachronic corpus, or by comparing more than one corpus containing texts from different time periods. If we compare two comparable corpora of British English containing written texts from the early 1960s and the 1990s, we see that in the 1990s corpus there are various types of words which occur much more frequently than in the 1960s corpus. In addition, we can find that certain terms have become less frequent: girl, Mr and Mrs were more popular in the 1960s than in the 1990s, suggesting that perhaps sexist discourses or formal ways of addressing people have become less common. It may also be that a word is no more or less frequent than it used to be, but that its meanings have changed over time. For example, in the early-1960s corpus the word "blind" appears in a literal sense, referring to people or animals who cannot see, while in the 1990s corpus it is used in a range of more metaphorical (and negative) ways.
Triangulation
Tognini-Bonelli (2001) makes a useful distinction between corpus-based and corpus-driven investigations. A corpus-driven analysis proceeds in a more inductive way: the corpus itself is the data, and the patterns in it are noted as a way of expressing regularities in language.
Triangulation is a term coined by Newby in 1977 and is now accepted by most researchers. There are several advantages to triangulation: it facilitates validity checks of hypotheses, it anchors findings in more robust interpretations and explanations, and it allows researchers to respond flexibly to unforeseen problems and aspects of their research.
Some concerns
Although corpus linguistics is a useful method of carrying out discourse analysis, there are still a few concerns which it is necessary to discuss.
First, corpus data is usually only language data (written or transcribed speech), and discourses are not confined to verbal communication. Discourses can be embedded within images; for example, pictures of heterosexual couples often occur in advertising. In many cases discourses are produced via the interaction between verbal and visual texts. The social conditions of production and interpretation of texts are important in helping the researcher understand the discourses surrounding them. Researchers may choose to interpret a corpus-based analysis of language in different ways, depending on their own positions. For example, people from socially disadvantaged groups tend to use more non-standard language and taboo terms than those from more advantaged groups: in this case such terms help to signal group membership and identity.
A corpus-based analysis will tend to place the focus on patterns. However, frequent patterns of language do not always imply underlying hegemonic discourses. The power of individual texts or speakers in a corpus may not be evenly distributed. General corpora are often composed of data from numerous sources (newspapers, novels, letters, etc.). We may be able to annotate the texts in a corpus to take into account aspects of production and reception, such as author occupation/status or readership, but this will not always be possible. A hegemonic discourse can be at its most powerful when it does not even have to be invoked, because it is simply taken for granted.
A corpus-based analysis of language is only one possible analysis out of many, and is open to contestation. It is an analysis which focuses on norms and frequent patterns within language. There can also be analyses of language that go against the norms of the corpus data. Corpus linguistics does not provide a single way of analysing data: there are numerous ways of making sense of linguistic patterns, including collocations, keywords, frequency lists, clusters and dispersion plots. We may decide, for example, to investigate co-occurrences in a corpus in relation to how discourses are formed.
2. CORPUS BUILDING
Introduction
One of the potential problems with using corpora in the analysis of discourse is that the data are decontextualized. The relationship between different texts in a corpus, or between sentences in the same file, may be obscured in quantitative analyses. On the other hand, the process of finding and selecting texts, obtaining permissions, transferring them to electronic format, and checking and annotating files may also provide the researcher with initial hypotheses as certain patterns are noticed, and such hypotheses can form the basis for the first stages of corpus research.
Capturing data
There are good reasons for building a specialized corpus. One of the easiest ways to collect corpus texts is to use data which already exist in electronic format. For example, the United Kingdom Parliament website contains full transcripts of daily debates from the British House of Lords and House of Commons. There are also many internet archives: Bibliomania, the Oxford Text Archive, the Electronic Text Centre, etc. It is possible to save files in formats which retain the images, styles and layout of the page. A problem with saving files as plain text is that we need to assume that all of the language data we are collecting will be recognizable in plain text, which is not always the case. One problem with saving an entire page from a website is that we may end up with unwanted text such as menus, titles or links to other pages. Once a site has been copied, it may still be necessary to strip the files of unwanted text, and some websites are constructed in order to prevent copiers from taking their content in this way.
Scanning and keying in
If texts cannot be obtained from the internet, there may be other ways that they can be collected electronically. For example, some British newspapers (such as The Guardian or The Independent) publish CD-ROM archives of their texts. If existing electronic sources are unavailable, two other options present themselves. The first involves converting paper documents by running them through a scanner with OCR (optical character recognition) software. In general, the texts which respond best to OCRing are those published in a straightforward format. The usual last resort of the corpus builder is to key in the text by hand; there are numerous companies which offer keying-in services.
Spoken texts
Certain types of texts will present problems to the corpus builder. Written data is generally much easier to obtain than spoken data. Conversations and monologues need to be transcribed by hand, and there is a range of information to consider: prosodic information, paralinguistic information, non-linguistic data, pauses, etc. Sometimes archives containing transcripts of spoken data are already in existence. These transcripts may have been cleaned or glossed in order to remove or limit the effects of interruptions such as false starts and hesitations. Scripted data does not always reflect how people really speak.
Online texts
Different types of problems need to be overcome when collecting data from the internet, particularly from archives of written texts that were originally published elsewhere. One growing area of interest is the language which occurs in text messages, emails, chat rooms, bulletin boards and newsgroups. It may be a good idea to save different versions of such a corpus: one which retains everything in the format it originally occurred in, and another which only contains unique "first-time" entries.
Permissions
Before texts are copied into a corpus database, compilers must seek and gain the permission of the authors and publishers who hold copyright for the work, or the informed consent of individuals whose right to privacy must be recognized. Obtaining signed permissions can often be a slow and complex task, as individual permission must be gained for each text that is placed in the corpus. Commercial corpora are often large and contain representative texts from many sources. Different publishers and funding bodies may vary in their attitude towards the necessity of permissions, but in all cases obtaining permission helps to safeguard the researcher.
Annotation
It is usually recommended that corpus builders apply some form of annotation scheme to their text files in order to aid analysis and keep track of the structure of the corpus, because the conventions for representing typographical features in electronic texts can vary depending on the software used to edit the text. One such system is Standard Generalized Markup Language (SGML), created in the 1980s as a standard way of encoding electronic text by using codes to define typeface, page layout, etc. In general the codes are enclosed between less-than and greater-than symbols: < >. The Text Encoding Initiative (TEI) is a related system, developed in the 1990s, which specifies a set of SGML codes to be used for different types of text mark-up. Corpus analysis packages tend to be capable of handling SGML codes, but they may be less equipped to deal with an ad hoc coding system created by a researcher working alone.
Headers can be a useful form of record keeping, particularly if a corpus consists of many files from different sources, created at different times. Headers may also contain information about the author or genre of the file. Other meta-linguistic information that could appear in headers for written files includes publication date, medium of text, level of difficulty, audience size, and the age and sex of the author and target audience.
Grammatical annotation is one procedure that is commonly applied to corpora at some stage towards the end of the building process; it can be useful in that it enables corpus users to make more specific analyses.
The important point is that different forms of annotation are often carried out on corpora and can result in more sophisticated analyses of data, but this is not compulsory.
Obtaining access to a reference corpus can be helpful for two reasons. First, reference corpora are large and representative enough of a particular genre of language that they can themselves be used to uncover evidence of particular discourses. Secondly, a reference corpus acts as a good benchmark of what is normal in language, against which your own data can be compared. We can compare a large reference corpus to a smaller corpus in order to examine which words occur in the smaller corpus more frequently than we would normally expect them to occur by chance alone. Access to a reference corpus is therefore potentially useful for carrying out discourse analysis, even if the corpus itself is not the main focus of the analysis. A perhaps more problematic issue is that of gaining access to corpora at all; some researchers will be at an advantage here. Some corpus builders allow users limited access for a trial period before buying, or offer smaller samples of their corpus.
3. FREQUENCY AND DISPERSION
Introduction
Frequency is one of the most central concepts underpinning the analysis of corpora. Frequency lists can be employed to direct the researcher to investigate various parts of a corpus; measures of dispersion can reveal trends across texts; and frequency data can help to give the user a sociological profile of a given word or phrase, enabling greater understanding of its use in particular contexts. Related to the concept of frequency is that of dispersion.
Join the club
Frequency and dispersion can be employed on a small corpus of data, for example a corpus consisting of 12 leaflets advertising holidays, published in 2005, with the goal of investigating discourses of tourism.
Holiday brochures are an interesting text type to analyse because they are an inherently persuasive form of discourse: their main aim is to ensure that potential customers will be sufficiently impressed to book a holiday.
Frequency counts
Using the corpus analysis software WordSmith, a word list of the 12 text files was obtained. A word list is a list of all of the words in a corpus, along with their frequencies and the percentage contribution that each word makes to the corpus. The most frequent words in the corpus are grammatical words: pronouns, determiners, conjunctions, prepositions. Among the lexical words there are words describing holiday residences (studios, facilities, apartments) and other attractions (beach, pool, club).
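A WordSmith-style word list is essentially a token-frequency table with percentages. A minimal sketch in Python, using an invented one-sentence stand-in for the brochure corpus:

```python
from collections import Counter
import re

# Toy stand-in for the holiday-brochure corpus; a real analysis would
# read the 12 leaflet text files instead.
text = "Relax by the pool, enjoy the beach bar and the club. The studios have facilities."

tokens = re.findall(r"[a-z']+", text.lower())
freq = Counter(tokens)
total = len(tokens)

# Word list: each word, its raw frequency, and its percentage of the corpus.
for word, count in freq.most_common(5):
    print(f"{word:12} {count:3} {100 * count / total:5.2f}%")
```

As in the brochure corpus, the grammatical words ("the" here) dominate the top of the list.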
Considering clusters
We also need to consider frequencies beyond single words. Using WordSmith it is possible to derive frequency lists for clusters of words. BAR and CLUB are the most frequent lexical lemmas in the holiday corpus, and they are also the only lemmas in the top ten that relate to alcohol. We can also consider another class of words, verbs, which play a particular role in tourist discourse: in the holiday corpus the most frequent verbs occur in imperative clusters.
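Cluster lists of the kind WordSmith produces are n-gram frequency counts, which can be sketched as follows (the example tokens are invented, not from the holiday corpus):

```python
from collections import Counter

def clusters(tokens, n):
    """Frequency of every n-word cluster (n-gram) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

tokens = "have fun at the bar have fun at the pool".split()
for cluster, count in clusters(tokens, 3).most_common(2):
    print(" ".join(cluster), count)
```

Sorting the resulting counts then surfaces recurring multi-word patterns such as imperative clusters.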
Dispersion plots
Another way of looking at a word is to think about where it occurs within individual texts and within the corpus as a whole. A dispersion plot gives a visual representation of where a search term occurs in the corpus. The plot can also be standardized so that each file in the corpus appears to be of the same length; this is useful in that it allows us to compare where occurrences of the search term appear across multiple files.
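The standardization step can be sketched by mapping each hit position onto a 0-1 range, so files of different lengths become comparable (the tokens below are invented):

```python
def dispersion(tokens, word):
    """Positions of `word`, standardized to the 0-1 range so that texts of
    different lengths can be plotted side by side."""
    return [i / (len(tokens) - 1) for i, t in enumerate(tokens) if t == word]

tokens = "club night then beach then club again".split()
print(dispersion(tokens, "club"))  # one value near 0.0, one near the end
```

Plotting these values as tick marks on a line, one line per file, reproduces the familiar dispersion-plot display.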
By examining the frequency list, the most frequent informal terms in the corpus were collected and presented in a table; it was necessary to explore the context of some words in detail in order to remove occurrences that were not used in a colloquial or informal way. Most of the terms occurred more often in written rather than spoken British English, although the spoken texts tend to contain the more informal meanings of the words. To argue that the authors of the holiday leaflets use informal language in order to index youthful identities, we need to assume that they believed such language to be typical of this identity and that the target audience would read the leaflets in the same way. By using a form of language which is strongly associated with youthful identities, the audience may feel that they are being spoken to in a narrative voice that they find desirable, or are at least comfortable with. The use of colloquialisms also contributes to the normalization of certain types of youthful identities: it suggests a shared way of speaking for young people, and those who do not use informal language may be alerted to a discrepancy between their linguistic identities and those of the people featured in the brochure.
Conclusion
The analysis of frequent lexical lemmas revealed some of the most important concepts in the corpus, and a more detailed analysis of clusters and individual instances containing these terms revealed some of the ways that holidaymakers were constructed. By investigating how the high-frequency informal language occurred in a reference corpus of spoken British English, we were able to gain evidence with which to create hypotheses about how the readership of the holiday leaflets was constructed.
4. CONCORDANCES
Introduction
A concordance analysis is one of the most effective techniques for allowing researchers to carry out a sort of close examination. A concordance is simply a list of all the occurrences of a particular search term in a corpus, and is also sometimes referred to as key word in context or KWIC, although it should be noted that "key word in context" has a different meaning from the concept of keywords. Here key word simply means the word that is currently under examination, which can be any word that takes the interest of the researcher. In order to demonstrate how concordances can be of use to discourse analysis, we need to carry out an examination of a new set of data: a corpus of newspaper articles, one of the easiest text types to collect. The relative ease with which newspaper data can be appropriated for corpus use suggests that it should be employed with care rather than overused; nevertheless, newspapers are a very fertile site for the production and reproduction of discourses. Journalists are able to influence readers by producing their own discourses or helping to reshape existing ones, although texts can only take on meaning when consumers interact with them. Discourses within newspapers are usually the result of collaboration between multiple contributors, and single articles may express a variety of views on the same subject. When using a corpus of newspaper articles it is therefore important to bear in mind that the processes of production and reception of any particular article are complex and multiple.
Refugees are a particularly interesting subject to analyse in terms of discourse because they constitute one of the most relatively powerless groups in society. One aspect of this conceptualization of discourse, as relating to ways of looking at the world, is that it enables or encourages a critical perspective on language and society. Minority groups are frequent topics of political talk and text, but have very little control over their representation in political discourse. In the media, refugees are rarely able to construct their own identities and discourses; instead they have such identities and discourses constructed for them by more powerful spokespeople.
In order to construct the corpus of newspaper articles, an internet-based archive called NewsBank was used, which contains articles from a large variety of British broadsheet and tabloid newspapers, including the Daily Mail, Daily Mirror, The Guardian and The Times. Only articles published in 2003 which included the words refugee or refugees were considered. We first need to scan the concordance lines, trying to pick out similarities in language use, by looking at the words and phrases which occur to the left- and right-hand sides of the terms refugee and refugees.
Sorting concordances
The concordance lines are first presented to us in the order in which they occur in the corpus. We can then sort the list alphabetically on one or more places to the left or right of the search term.
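A KWIC concordance with a right-hand sort can be sketched as follows; the example sentence is invented, standing in for the refugee corpus:

```python
def concordance(tokens, node, width=4):
    """KWIC lines: the node word with `width` tokens of context on each side."""
    lines = []
    for i, t in enumerate(tokens):
        if t == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, t, right))
    return lines

tokens = "the refugees were housed in camps while more refugees arrived by boat".split()
lines = concordance(tokens, "refugees", width=2)
# Sort on the right-hand context (an R1 sort), grouping similar patterns.
lines.sort(key=lambda line: line[2])
for left, node, right in lines:
    print(f"{left:>15} | {node} | {right}")
```

Sorting in this way brings recurring right-hand patterns next to each other, which is what makes shared phraseology visible in a long concordance.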
Refugees are also constructed in terms of metaphors which present them as transported goods or packages, again as a token of their dehumanization.
Carrying out further sorts on the concordance didn't reveal any more interesting patterns or clues about discourses. One possible avenue of research is simply to consider all of the concordance lines that have not already been used to demonstrate the discourses of refugees. Some of the concordance lines are longer than usual, meaning that more context needed to be taken into account before patterns of meaning could be derived from the concordance.
Our concordance-based analysis of the terms refugee and refugees in the small corpus of newspaper articles has been useful in revealing a range of discourses: refugees as victims, as recipients of official attempts to help, as a natural disaster and as a criminal nuisance. A concordance analysis also elucidates semantic preference, the relation between a lemma or word form and a set of semantically related words. Semantic preference also occurs with multi-word units; it is therefore related to the concept of collocation, but focuses on a lexical set of semantic categories rather than a single word or a related set of grammatical words. Semantic preference is in turn related to the concept of discourse prosody, where patterns in discourse can be found between a word, phrase or lemma and a set of related words that suggest a discourse. The difference between semantic preference and discourse prosody is not always clear-cut. Semantic preference denotes aspects of meaning which are independent of speakers, whereas discourse prosody focuses on the relationship of a word to speakers and hearers, and is more concerned with attitude. Another term, semantic prosody, has been used by other researchers in a way which makes it akin to discourse prosody; we can use these terms when analysing the language used in particular types of phrases. A corpus-based approach is useful in that it helps to give a wider view of the range of possible ways of discussing refugees. Corpus data can help to establish which language strategies are most frequent or popular (for example, the refugees-as-water metaphor was found to be much more frequent than other metaphors).
Points of concern
A concordance analysis is one of the more qualitative forms of analysis associated with corpus linguistics: it is the responsibility of the analyst to recognize linguistic patterns and also to explain why they exist. One aspect of concordance analysis that needs to be considered is that, when carrying out searches on a particular subject, as well as euphemisms and similes for that subject, it may also be the case that the subject is referred to numerous times with determiners or pronouns. However, concordances of pronouns and determiners are likely to include many irrelevant examples.
Step-by-step guide to concordance analysis
5. COLLOCATES
Introduction
Carrying out a close analysis of search terms via a concordance can be helpful in revealing traces of discourses within texts; however, concordances can in some cases consist of hundreds or even thousands of lines. Researchers can rely on sampling methods, which are helpful in reducing the length of time spent on analysis, but a problem is that sampling may also fail to reveal salient aspects of the concordance. Another problem is that patterns are not always as clear-cut in a concordance as we would like them to be. In the British National Corpus all words co-occur with each other to some degree. When a word regularly appears near another word and the relationship is statistically significant, such co-occurrences are referred to as collocates, and the phenomenon of certain words frequently occurring next to or near each other is collocation. Collocation is a way of understanding meanings and associations between words which are otherwise difficult to ascertain from a small-scale analysis of a single text: words take on meaning from the contexts they occur in. The aim here is to explore how discourse analysis can be carried out by focusing primarily on collocation. In order to carry out such an analysis it is useful to examine the usage of words in a corpus; the British National Corpus is a large corpus which we can view as being more or less representative of general British English.
Deriving collocates
There are a number of different procedures by which collocation can be calculated. The simplest is to count the number of times a given word appears within, say, a five-word window to the left or right of a search term. If we use this procedure we get a list of words; one problem with this technique is that the highest-frequency words tend to be function words, which do not always reveal much of interest, particularly in terms of discourse. A number of statistical tests take into account the frequencies of words in a corpus and their relative number of occurrences both next to and away from each other. One such test is called mutual information (MI). Mutual information is calculated by examining all of the places where two potential collocates occur in a text or corpus; an algorithm then computes the expected probability of these two words occurring near to each other, and compares it with their observed co-occurrence.
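A simplified sketch of the MI score follows. This is the basic pointwise formula (real implementations such as WordSmith's also factor in the size of the collocation window), and all the counts in the example are invented for illustration:

```python
import math

def mutual_information(joint, f1, f2, corpus_size):
    """Pointwise mutual information: log2 of the observed co-occurrence
    frequency over the frequency expected if the two words were independent.
    Scores of 3 or above are conventionally taken to suggest collocation."""
    expected = f1 * f2 / corpus_size
    return math.log2(joint / expected)

# Invented counts: in a 1,000,000-word corpus, 'eligible' occurs 50 times,
# 'bachelor' 200 times, and the two co-occur 20 times.
mi = mutual_information(20, 50, 200, 1_000_000)
print(round(mi, 2))  # well above the usual cut-off of 3
```

Because the formula rewards rare-but-exclusive pairings, MI tends to highlight distinctive lexical collocates rather than frequent function words.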
The word bachelor occurs more frequently in the corpus than spinster. Examining concordances which contain bachelor along with some of its strongest collocates, it is clear that many of them relate to having a degree (e.g. bachelor of arts). Here the meaning of bachelor (a type of degree) is different from the meaning we are concerned with (a man who has not married). Homonyms are a rare and accidental phenomenon; polysemy, where two words with the same spelling have interrelated meanings, is much more common. While the collocates of bachelor which suggest a meaning of university education no longer have the same association with bachelor as unmarried man, the two meanings are perhaps due to historical polysemy rather than being accidental homonyms. What we see with the strongest collocates of bachelor is a somewhat dualistic picture of discourse. A young bachelor receives a positive discourse prosody, connected to living a happy, possibly urban existence; this is supported by an analysis of the collocates days, life, eligible and party. The positive discourse prosody is tied to the fact that the bachelor life is expected to be a short-term situation. When bachelorhood becomes a long-term state, it is viewed as more problematic: it is repeatedly characterized in the corpus by poverty, eccentricity, old age and loneliness. There is an implication that there is something wrong or unfortunate about a man who goes through his whole life without marrying.
Resistant discourses
A collocational analysis has shown us some of the most salient discourses around the different ways of referring to bachelors and spinsters. A collocational analysis is useful for two reasons. First, it provides a focus for our initial analysis, which is particularly helpful when a large number of concordance lines need to be sorted multiple times in order to reveal lexical patterns. Secondly, it gives us the most salient lexical patterns surrounding a subject, from which a number of discourses can be obtained. When two words frequently collocate, there is evidence that the discourses surrounding them are particularly powerful, perhaps to the point where even one half of the pair is likely to prime someone who hears or reads that word to think of the other half. Collocates can act as triggers, suggesting unconscious associations which are ways that discourses can be maintained.
Corpus data gives us one way of understanding language, based on what is typical.
Collocates may also contain traces of resistant discourses, which are worth exploring in the
remaining concordance lines.
Collocational networks
6 KEYNESS
Frequency revisited
A frequency list can help to provide researchers with the lexical foci of any given corpus. Investigating the reasons why a particular word appears so frequently in a corpus can help to reveal the presence of discourses, especially those of a hegemonic nature. Compiling a frequency list is an important first step in gaining an idea of what to focus on. Simple frequency, however, has limitations.
The next case study examines political debates on fox hunting in the British House of Commons. Politicians are aware that they are playing a language game with huge consequences, and that they must appear to speak with authority and conviction: they have often developed a style of speaking which is opaque, vague or empty. A corpus of parliamentary debates on the issue was built. The majority of Commons members voted for the ban to go ahead, although in each debate a range of options could be debated and voted upon. The first step is to create a word list of the fox-hunting corpus. The corpus size was 129,798 words. The most frequent words tend to be grammatical items such as determiners, prepositions and conjunctions. Sometimes grammatical items in themselves can be indicative of a particular discourse, for example if a conjunction like "and" is repeatedly used to stress a connection between two objects of discussion. The most frequent lexical words are perhaps more interesting; here we find words that we would have expected or guessed to appear. There are terms of address associated with the context of a parliamentary debate (Mr, friend, right), other words associated with the parliamentary context (house, minister) and words connected with the subject under discussion. One way of finding out which lexical items are interesting in a frequency list is to compare two or more lists. If a word occurs comparatively more often in, say, a corpus of modern English children's stories than in the British National Corpus, we could conclude that such a word has high saliency in the genre of children's stories and is worth investigating in further detail.
The corpus was therefore split into two: the speech of all of the people who voted to ban fox hunting, and the speech of those who voted for hunting to remain legal. Some words appear in both lists, and almost all are connected either to the subject under discussion or to the context where the debates took place: parliament.
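The first step described above, building a frequency word list and setting aside function words to surface lexical items, can be sketched as follows. This is a toy illustration (the sentence and stopword set are invented), not the implementation used by any corpus tool:

```python
from collections import Counter

def word_list(text, stopwords=frozenset()):
    """Build a frequency list, optionally filtering out
    high-frequency grammatical (function) words."""
    tokens = [w.strip(".,;:!?\"'()").lower() for w in text.split()]
    tokens = [w for w in tokens if w and w not in stopwords]
    return Counter(tokens).most_common()

debate = ("the house must decide whether hunting with dogs "
          "should be banned and whether the ban can be enforced")
function_words = {"the", "and", "with", "be", "can", "should", "must", "whether"}
print(word_list(debate))                  # grammatical items dominate the top
print(word_list(debate, function_words))  # lexical words now surface
```

The second call shows why filtering matters: once determiners, conjunctions and modals are removed, subject-related words like hunting, dogs and ban rise to the top of the list.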
Introducing keyness
Using WordSmith it is possible to compare the frequencies in one wordlist against another in order to determine which words occur statistically more often in wordlist A when compared with wordlist B, and vice versa. All of the words that occur more often than expected in one file when compared with the other are then compiled into another list, called a keyword list. This list gives a measure of saliency, whereas a simple word list only provides frequency. WordSmith takes into account the size of each sub-corpus and the frequencies of each word within them. Keyword lists tend to show up three types of words: first, proper nouns; secondly, nouns, verbs, adjectives and adverbs; and finally, high-frequency grammatical words.
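One common statistic for this comparison is Dunning's log-likelihood score (WordSmith's keyness statistic is configurable, so this is a representative formulation rather than its exact internals). The sub-corpus sizes below are invented for illustration; the 38-versus-2 figure for "criminal" comes from the analysis discussed in this chapter:

```python
import math

def log_likelihood(freq_a, freq_b, size_a, size_b):
    """Log-likelihood keyness score for one word, given its raw
    frequency in sub-corpus A and sub-corpus B and the sizes of
    the two sub-corpora. Higher scores mean the word's frequency
    difference is less likely to be due to chance."""
    total = size_a + size_b
    expected_a = size_a * (freq_a + freq_b) / total
    expected_b = size_b * (freq_a + freq_b) / total
    ll = 0.0
    if freq_a:
        ll += freq_a * math.log(freq_a / expected_a)
    if freq_b:
        ll += freq_b * math.log(freq_b / expected_b)
    return 2 * ll

# "criminal": 38 times in pro-hunt speech, twice in anti-hunt speech
# (sub-corpus sizes are invented; only their sum matches the 129,798 total)
print(log_likelihood(38, 2, 60000, 69798))
# an evenly distributed word scores near zero
print(log_likelihood(5, 5, 60000, 69798))
```

Note how the calculation normalizes by sub-corpus size, which is exactly why a keyword list measures saliency rather than raw frequency.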
Analysis of keywords
The majority of the keywords found are of the "aboutness" variety, in both parts of the list. The word "criminal" is used by those who were opposed to a ban on hunting: it occurs 38 times in the collective speech of the pro-hunters and only twice in the speech of the anti-hunters. It is necessary to examine individual keywords in more detail, by carrying out concordances of them and looking at their collocates. When a concordance of "criminal" was carried out on the corpus data, it was found that common phrases containing the word "criminal" included the criminal law, a criminal offence, criminal sanctions, etc.
The lemma MAKE seems to be a relatively important collocate of "criminal". Looking at the concordance of the word "criminal", there are other concordance lines which suggest a similar pattern but do not include MAKE.
Terms like "invoke" or "impose" are rhetorical strategies used to support a particular discourse position. The word "dogs" occurs 182 times in the speech of the anti-hunters and 74 times in the speech of those who want hunting to remain legal. A concordance of the phrase "use of dogs" was carried out for the whole corpus. The keyword list has given us a small number of words to examine, and once the proper nouns have been discounted this leaves us with just 16 words in total. Finally, consider another keyword used by pro-hunt speakers: practices. This word is interesting because it is difficult to determine exactly what it means. It occurs as a plural (veterinary practices, slaughter practices, livestock practices, etc.). The term is therefore used to refer to a multitude of techniques connected to animals, and it also creates an association with non-lethal ways of dealing with animals.
So far our keyword analysis has been based on the idea that there are two sides to the debate, and that by comparing one side against the other we are likely to find a list of keywords which will then act as signposts to the underlying discourses within the debate on fox hunting. Our analysis so far has uncovered some interesting differences between the two sides of the debate. We would need to separate all of the speech in the different debates into different files; the task of creating these files can be off-putting and in any case is not always necessary.
In terms of proportions, taking into account the relative sizes of the sub-corpora, the anti-hunt speakers actually used, for example, the term "cruelty" less than the pro-hunters. Examining this word in more detail, it becomes apparent that it occurs with notable frequency on each side of the debate.
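The proportional comparison behind this observation is a simple normalization: raw counts from sub-corpora of different sizes are converted to a common scale, such as occurrences per 10,000 words. The figures below are invented purely to illustrate how a larger raw count can be proportionally smaller:

```python
def per_10k(freq, corpus_size):
    """Normalize a raw frequency to occurrences per 10,000 words,
    so sub-corpora of unequal size can be compared fairly."""
    return freq * 10000 / corpus_size

# invented figures: a word used 120 times in an 80,000-word sub-corpus
# versus 70 times in a 40,000-word sub-corpus
anti = per_10k(120, 80000)  # 15.0 per 10,000 words
pro = per_10k(70, 40000)    # 17.5 per 10,000 words
print(anti < pro)           # the smaller raw count is proportionally higher
```

This is why raw frequency lists alone can mislead: the side that mentions a word more often in absolute terms may still use it less often relative to how much it speaks.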
Comparing a smaller corpus or set of texts to a larger reference corpus is therefore a useful way of determining key concepts across the smaller corpus as a whole. For many studies where the text or set of texts under scrutiny is relatively uniform, using a reference corpus may be all that is needed. Using a reference corpus may also be useful in revealing those words that are under-represented in the data. When comparing a smaller corpus with a reference corpus, WordSmith also gives a list of all the negative keywords, although this list does not take into account words which appeared zero times in the small corpus. Negative keywords can help to show topics or markers of style which are not favoured in a corpus, which in itself can be illuminating.
Key clusters
Another way of spotting words which occur frequently in two comparable sets of texts but may be used for different purposes is to focus on key clusters of words. Using WordSmith it is possible to derive wordlists of clusters of words, rather than single words. WordSmith allows the user to specify the size of the cluster under examination; generally, the larger the cluster size we specify, the smaller the number of key clusters that are produced. Taking a cluster size of three, a list of key clusters was obtained by comparing the speech of pro-hunters with that of those who were against hunting. This list contained some interesting clusters. When reporting an analysis of keyness, it is worth mentioning dispersion, particularly in cases like this where dispersion brings up something unexpected. This requires a closer analysis of words and phrases in the corpus, rather than simply recounting frequencies from wordlists.
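Deriving a cluster (n-gram) frequency list of the kind described above can be sketched as follows. The sliding-window approach is standard; the example sentence is invented, and this is not WordSmith's implementation:

```python
from collections import Counter

def clusters(tokens, size=3):
    """Frequency list of word clusters (n-grams) of a given size.
    Larger cluster sizes generally yield fewer repeated clusters."""
    grams = [" ".join(tokens[i:i + size])
             for i in range(len(tokens) - size + 1)]
    return Counter(grams)

speech = "the use of dogs in hunting and the use of dogs in racing".split()
print(clusters(speech, size=3).most_common(3))
```

Repeated three-word clusters such as "the use of" and "use of dogs" surface immediately, which is how recurrent phraseology (rather than single keywords) can be compared across the two sides of a debate.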
Key categories
While a simple keyword list will reveal differences between sets of texts or corpora, it is sometimes the case that lower-frequency words will not appear in the list because they do not occur often enough to make a sufficient impact. This may be a problem, as low-frequency synonyms tend to be overlooked in a keyword analysis. Finding key categories could help to point to the existence of particular discourse types, and they would be a useful way of revealing discourse prosodies. In order for such analyses to be carried out, it is necessary to undertake the appropriate forms of annotation. The automatic semantic annotation system used to tag the fox-hunting corpus was USAS (the UCREL Semantic Analysis System). Tags can be assigned a number of plus or minus codes to show where a meaning resides on a binary or linear distinction. Once the semantic annotation had been carried out, word lists of the two sides of the fox-hunting debate were created and compared with each other to create a keyword list. From this list, the relevant key semantic tags were singled out for analysis.
Conclusion
A keyword list is a useful tool for directing researchers to significant lexical differences between texts. Carrying out comparisons between three or more sets of data, grouping infrequent keywords according to discursive similarity, showing awareness of keyword dispersion, and carrying out analyses of key clusters will enable researchers to obtain a more accurate picture of how keywords function in texts. Keywords can reveal a great deal about frequencies in texts which is unlikely to be matched by researchers' intuition. As with all statistical methods, how researchers choose to interpret the data is ultimately the most important aspect of corpus-based research.
7 BEYOND COLLOCATION
Introduction
In this chapter we focus on aspects of discourse analysis which are more concerned with grammatical rather than lexical patterns. We will be considering a number of ways in which a more grammar-based analysis can be of value to researchers looking at discourse via corpora. We look at a single term, the lemma ALLEGE, and its forms. The analysis of this lemma was inspired by reading an article on a news website about an alleged rape. The article acted as a springboard, raising a number of questions about ALLEGE. A corpus analysis would help us to establish whether or not the patterns of language found in the article are typical or atypical of general English usage. The verb allege and its related forms are therefore a key aspect in the discursive construction of stories about rape.
Nominalization
Nominalization involves a process being converted from a verb or adjective into a noun or a multi-noun compound (e.g. discover --> discovery, solve --> solution). Nominalizations often involve reductions or deletions in some way.
In the BNC, the forms allege, alleging, alleged, alleges, allegedly, allegation and allegations are unevenly distributed across domains. They also occur much more often in written-to-be-spoken texts than in written or spoken texts. The lemma ALLEGE is also associated with a variety of forms of news reporting.
It is important that we distinguish between the verb and adjectival forms of the word alleged. The BNC is grammatically tagged: the tag AJ0 stands for adjective, so we can find only the adjectival uses of alleged by carrying out a search on alleged=AJ0; the tag VVD marks the past tense and the tag VVN a past participle. The categorization of grammatical forms of alleged in the BNC is made even more complicated by the presence of portmanteau tags.
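Restricting a search to one grammatical form of a word, as in the alleged=AJ0 query above, can be sketched over a toy word/tag representation. Real BNC queries run through dedicated tools (e.g. BNCweb or CQP); the example sentence here is invented, though the C5 tags used (AJ0, VVD, etc.) are genuine BNC tags:

```python
def tagged_search(tagged_tokens, word, tag):
    """Return the positions where a word form carries a given
    POS tag, e.g. only the adjectival (AJ0) uses of 'alleged'."""
    return [i for i, (w, t) in enumerate(tagged_tokens)
            if w.lower() == word and t == tag]

sentence = [("the", "AT0"), ("alleged", "AJ0"), ("attacker", "NN1"),
            ("alleged", "VVD"), ("that", "CJT"), ("he", "PNP"),
            ("was", "VBD"), ("innocent", "AJ0")]
print(tagged_search(sentence, "alleged", "AJ0"))  # adjectival use only
print(tagged_search(sentence, "alleged", "VVD"))  # past-tense verb use only
```

The same surface form "alleged" is returned or excluded depending on its tag, which is exactly the distinction the verb/adjective analysis depends on.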
The adjectival form alleged is also quite popular in the corpus; this term occurs 1,687 times.
The verb forms of allege usually collocate with the adverbial form (allegedly) but also with verbs suggesting crimes: abducted, tortured, murdered, committed, etc. The adjectival form alleged collocates with words specifying groups of people who are accused (accomplices, perpetrators, collaborators) as well as crimes (infringements, atrocities, etc.). The nominalized form allegation(s) has a discourse prosody for denial which is not found with any of the other forms of allege. Allegations is a word which shows a semantic preference for the concept of denial in general English, coupled with the adjective "ludicrous". Since the word ludicrous was used by the spokeswoman in the article to describe the allegations, its concordances were examined to obtain an idea of what other things are commonly described as ludicrous. Most of the other collocates are adverbs (almost, rather, most, too, so, how, even) which relate to the scale to which something is described as ludicrous.
The pairing of ludicrous and allegations is a particularly powerful language strategy, as both words have strong associations of untruth or unreality embedded within them.
Modality
The analysis of allege in the BNC has shown how it has a strong association with news reporting and how the nominalized form has a strong discourse prosody for denial, while the non-nominalized forms do not. Modality relates to speaker or writer authority and is based around the use of a range of modal auxiliary verbs. A strong presence or absence of certain modal verbs is an indication of power relationships and status: relatively powerful groups tend to be paired with modal verbs which give them more freedom and choice, while more controlling modal verbs are used with less powerful groups. Using the BNC, we carried out searches on the noun, adjective, verb and adverb forms of allege and then counted the number of times modal verbs appeared within three spaces to the left and right of them.
The most popular constructions are:
- Noun forms: allegation would, could or will;
- Verb forms: may, would or could allege;
- Adjective forms: alleged .. should/will;
- Adverb form: would allegedly.
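The counting procedure described above (modal verbs within three spaces of any form of ALLEGE) can be sketched as follows. The sentence and target set are invented; real counts would be run over the whole BNC:

```python
from collections import Counter

MODALS = {"can", "could", "may", "might", "must",
          "shall", "should", "will", "would", "ought"}

def modals_near(tokens, targets, span=3):
    """Count modal verbs appearing within +/- span tokens
    of any target form (e.g. forms of ALLEGE)."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok in targets:
            window = tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span]
            counts.update(w for w in window if w in MODALS)
    return counts

tokens = ("the allegations would be denied and no one could "
          "allege that the minister should resign").split()
print(modals_near(tokens, {"allegations", "allege"}))
```

Note that should in the example falls outside the three-word window of both target forms, so it is not counted; the window size directly controls which modal/target pairings the analysis picks up.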
In general British English, the modal verbs would, will, can and could are the most popular in overall language usage, while modals like shall, ought and need are much less common. Could appears more often than expected in conjunction with allege, while can occurs less often than expected. The adjective form alleged tends to be connected to two of the most certain modal verbs: should and will.
The nominalized forms of allege tended to correlate strongly with words related to denial, and this suggests that these forms of denial are made with more certain modal verbs.
Attribution
The presence (or absence) of different types of actors in narratives of rape can have consequences which relate to both the focus of the story and the way the agency of those involved is represented. In the news article, both the person who is accused of rape and the person who claims to have been raped are mentioned several times and in a variety of ways. If we only carry out a search, in the corpus, on sentences which contain allege and rape, we may miss additional information which comes earlier or later in the text.
Metaphor
We have seen that the word choice allegations is of particular salience to the news article, because it carries a strong association with denial. In the article it was used in a direct quote by a spokeswoman for the person about whom the allegations were being made, but it was not used by the actual narrative voice of the article, with the more neutral term alleged occurring three times instead. We have used a reference corpus to look at the collocations of the word allegations, as well as patterns of modal use and the presence or absence of various types of actor associated with allegations of rape.
Another way of understanding some of the hidden associations of the word allegation is to consider it in terms of metaphor. Metaphors are a particularly revealing way of helping to uncover the discourses surrounding a subject. Looking at the presence of metaphors in a corpus, and noting their frequencies relative to each other, should provide researchers with a different way of focusing on discourse.
There is no simple way of carrying out a metaphor-based analysis on a corpus: the researcher carries out a close reading of a sample of texts in order to identify candidate metaphors, and corpus contexts are then examined to determine whether keywords are metaphoric or literal. In our corpus, abstract concepts are often constructed via metaphors which reference concrete entities, and it is likely that allegation(s) will have metaphors in common with similar terms like "accusation" or "claim".
The corpus not only helps to uncover the possible metaphors surrounding a word or concept, but it can also be useful in revealing how a metaphor works in a range of other cases, enabling researchers to gain a greater understanding of its meaning. We see allegations referred to in terms of weight, violence, penetration, waste, fire, flight and horses. Some of these metaphors appear to be more frequent than others. Since the term allegation is found in a range of general metaphorical patterns in British English, it is not possible to say that any single metaphor dominates the way we think of the term.
Further directions
We could have expanded our analysis of the term allegations to consider other linguistic phenomena, such as a range of lexical, semantic and grammatical features, or co-ordination. There are some techniques in critical discourse analysis which are more difficult to carry out on corpora. At present, a great deal of corpus-based discourse analysis is still focused at the lexical level. The challenge for future researchers is to find ways to make grammar- and semantics-based analysis of corpora a more feasible proposition.
8 CONCLUSION
This book has identified some of the most useful methodological techniques of corpus-based research (frequencies, collocations, keywords, concordances, dispersion) and shown how they can be effectively used in the analysis of discourse. The main points about language and discourse that our corpus-based analyses have revealed are:
- Corpus-based discourse analysis is not simply a quantitative procedure but one which involves a great deal of human choice at every stage: research questions, designing and building corpora, deciding which techniques to use, interpreting the results and framing explanations for them.
- Attitudes and discourses are embedded in language via our cumulative, lifelong exposure to language patterns and choices: collocations, semantic preferences and discourse prosodies.
- We are often unconscious of the patterns of language we encounter across our lifetimes, but corpora are useful in identifying them: they emulate and reveal this cumulative exposure.
Corpus building
The design and availability of corpora are paramount to their analysis. Diachronically, language and society are constantly changing, and discourses are changing as well; there is an urgent need to build more up-to-date corpora in order to reflect this passing of time. Some aspects of language use do not change as rapidly as others. The contents of the BNC are a testament to the way that people wrote and spoke in the early 1990s. Using corpora of texts that were created decades or centuries ago will help researchers to explore the ways that language was once used, shedding light on the reasons behind the current meanings, collocations and discourse prosodies of particular words, phrases or grammatical constructions. Comparing a range of corpora from different historical time periods will give us a series of linguistic "snapshots" which will allow discourses to appear to come to life. An aspect of corpus building which is particularly relevant for discourse analysis is the fact that context is so important. Corpora that include both the annotated electronic text and the original texts would be useful for making sense of individual texts within them. In the case of newspaper or magazine articles it would be useful to make references back to the original page(s), so we could note aspects such as font size and style, colours, layout and visuals.
Corpus analysis
It is important to note that a corpus-based analysis will not simply give researchers a list of the discourses around a subject. The analysis will point to patterns in language which must then be interpreted in order to suggest the existence of discourses. A corpus-based analysis can only show what is in the corpus; although it may be far-reaching, it can never be exhaustive. Because corpora are so large, we may be tempted to think that our analysis has covered every potential discursive construction around a given subject. The wide variety of alternative statistical measures available to the corpus user might mean that data can be subtly massaged in order to produce results that are interesting, controversial, or that confirm our suspicions. When using a general corpus, issues surrounding the varying types of production and reception for all of the texts within it can become highly problematic. One option could be to recognize that the general corpus consists of a multitude of voices, and to use such data sparingly, instead carrying out the analysis of discourses on more specialized corpora, where issues of production and reception can be more easily articulated. Another possibility could simply be to argue from the perspective that society is inter-connected and all texts influence each other. A corpus-based analysis of discourse provides researchers with patterns and trends in language. People are not computers, though, and their ways of interacting with texts are very different, both from computers and from each other. Corpus-based discourse analysis should play an important role in terms of removing bias, testing hypotheses, identifying norms and outliers, and raising new research questions.