Corpus 1
Corpus 2
Theories
Frequency of X (e.g. frequency of word)
Total opportunities for X (e.g. corpus size)
• Imagine, for example, that you are
investigating a word that occurs 52 times in
Corpus 1, which has 50,000 tokens in total,
but occurs 57 times in Corpus 2, which
is 75,000 tokens in size. Obviously, this word is
noticeably rarer, in relative terms, in Corpus 2;
but is the difference significant?
• Enter the figures into the web-form above to
conduct the log-likelihood test of significance!
Don't include any commas in the numbers you
type in.
• You should get results that look like this:
Item   O1   %1     O2   %2     LL
Word   52   0.10   57   0.08   + 2.65
Here's how to interpret this result:
• O1 and O2 are observed frequencies, the
numbers you entered
• %1 and %2 are the observed frequencies in
normalised (percentage) form
• The + sign indicates that the word is more
frequent, on average, in Corpus 1 (a minus sign
would indicate it is more frequent in Corpus 2)
• The LL score is the log-likelihood, which tells us
whether the result can be treated as significant
• The higher the LL is, the less likely it is that the
result is a random fluke. The LL must be above
3.84 for the difference to be significant at
the p < 0.05 level (also called the 95% level).
So this difference, with an LL of 2.65, is not statistically significant.
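The figures in the results row can also be reproduced by hand. The sketch below is a minimal Python illustration, not the web-form's actual code. It assumes the two-cell log-likelihood formula commonly used for comparing corpus frequencies, LL = 2 × Σ O × ln(O / E), where each expected frequency E is the corpus size multiplied by the pooled relative frequency of the word. The function name log_likelihood and its parameter names are invented for this example.

```python
import math

CRITICAL_VALUE_P05 = 3.84  # threshold for p < 0.05, as given in the slide


def log_likelihood(o1, n1, o2, n2):
    """Return (%1, %2, sign, LL) for a word seen o1 times in a corpus of
    n1 tokens and o2 times in a corpus of n2 tokens.

    Hypothetical helper; assumes the two-cell log-likelihood formula.
    """
    # Normalised (percentage) frequencies.
    pct1 = 100 * o1 / n1
    pct2 = 100 * o2 / n2

    # Expected frequencies if the word were equally common in both corpora.
    pooled = (o1 + o2) / (n1 + n2)
    e1 = n1 * pooled
    e2 = n2 * pooled

    ll = 0.0
    for observed, expected in ((o1, e1), (o2, e2)):
        if observed > 0:  # a zero count contributes nothing to the sum
            ll += observed * math.log(observed / expected)
    ll *= 2

    # "+" means relatively more frequent in Corpus 1, "-" in Corpus 2.
    sign = "+" if pct1 >= pct2 else "-"
    return pct1, pct2, sign, ll


pct1, pct2, sign, ll = log_likelihood(52, 50000, 57, 75000)
print(f"Word 52 {pct1:.2f} 57 {pct2:.2f} {sign} {ll:.2f}")
print("Significant at p < 0.05:", ll > CRITICAL_VALUE_P05)
```

For the example word, the expected frequencies are 43.6 and 65.4, and the LL comes out at about 2.65, below the 3.84 threshold.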
• A keyword analysis basically consists of doing
this analysis for every word-type in the
corpus!
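As a rough illustration of that idea, the Python sketch below runs the same comparison for every word-type found in two tokenised corpora and keeps the words whose score passes the 3.84 threshold. It is only a sketch under assumptions: it uses the common two-cell log-likelihood formula, the function names keyness and keywords are hypothetical, and the two toy corpora are invented for the example. Real keyword tools typically add further steps, such as minimum-frequency cut-offs.

```python
import math
from collections import Counter


def keyness(o1, n1, o2, n2):
    """Signed log-likelihood (hypothetical helper): positive if the word is
    relatively more frequent in corpus 1, negative if in corpus 2."""
    pooled = (o1 + o2) / (n1 + n2)
    ll = 0.0
    for observed, size in ((o1, n1), (o2, n2)):
        if observed > 0:  # a zero count contributes nothing to the sum
            ll += observed * math.log(observed / (size * pooled))
    ll *= 2
    return ll if o1 / n1 >= o2 / n2 else -ll


def keywords(tokens1, tokens2, threshold=3.84):
    """Return (word, signed LL) pairs whose absolute LL exceeds the threshold."""
    freq1, freq2 = Counter(tokens1), Counter(tokens2)
    n1, n2 = len(tokens1), len(tokens2)
    results = []
    # Repeat the two-corpus comparison for every word-type in either corpus.
    for word in set(freq1) | set(freq2):
        score = keyness(freq1[word], n1, freq2[word], n2)
        if abs(score) > threshold:
            results.append((word, score))
    # Strongest differences first.
    return sorted(results, key=lambda pair: abs(pair[1]), reverse=True)


# Toy data, invented purely for illustration.
corpus1 = ["tea"] * 30 + ["the"] * 70
corpus2 = ["tea"] * 10 + ["the"] * 90

for word, score in keywords(corpus1, corpus2):
    print(word, round(score, 2))
```

In this toy data only "tea" passes the threshold; "the" differs between the corpora too, but not by enough to count as significant at p < 0.05.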
Feeling????…..