Lec 2
Corpus-Based Work
Text corpora are usually big and often represent
samples of some population of interest. For
example, the Brown Corpus, collected by Kucera
and Francis, was designed as a representative
sample of written American English. Balance across
subtypes (e.g., genre) is often desired.
Corpus work involves collecting a large number
of counts from corpora, and those counts need to be
accessed quickly.
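As a minimal sketch of what "collecting counts" can look like in practice, the snippet below tallies word frequencies in a short string using a hash-based counter, which gives fast lookups. The function name `count_tokens`, the sample sentence, and the very simple regular-expression tokenizer are illustrative assumptions, not part of the lecture; real corpus work would use a more careful tokenizer.

```python
import re
from collections import Counter


def count_tokens(text):
    """Lowercase the text, split it into word tokens, and count them.

    The tokenizer is deliberately crude: runs of letters, optionally
    followed by an apostrophe and more letters (e.g. "don't").
    """
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    return Counter(tokens)


# Hypothetical sample text standing in for a corpus.
sample = "The cat sat on the mat. The mat was flat."
counts = count_tokens(sample)

# Counter is a dict subclass, so lookups by word are constant time on average.
print(counts["the"])
print(counts.most_common(2))
```

Because `Counter` is backed by a hash table, looking up the count for any word stays fast even when the vocabulary grows large, which is the property the slide asks for.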
Some software exists for processing
corpora.
Lebensversicherungsgesellschaftsangestellter
(German: "life insurance company employee")
Mar 16, 2025 Natural Language Processing 13
Morphology: What Should I Put
in My Dictionary?
Speech Corpora
Morphology
Stemming
The idea is to strip affixes from a word to recover
its stem (root), so that related forms such as
"walks" and "walking" map to the same term.
Not that helpful in English (from an information
retrieval (IR) point of view).
Perhaps more useful for other languages or
in other contexts.
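To make the idea concrete, here is a deliberately naive suffix-stripping sketch. It is not the Porter stemmer or any other published algorithm; the suffix list, the function name `naive_stem`, and the minimum stem length are assumptions chosen only for illustration.

```python
# Illustrative suffix list, checked in this order (longest first).
SUFFIXES = ("ing", "ed", "es", "s")


def naive_stem(word, min_stem_len=3):
    """Strip the first matching suffix, keeping at least min_stem_len letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    return word


for w in ["walking", "walked", "walks", "boxes", "sing", "news"]:
    print(w, "->", naive_stem(w))
```

The first three forms all reduce to "walk", which is the intended effect. The last example shows the risk: "news" is reduced to "new", conflating two unrelated words, which is one reason stemming may not help much for English retrieval.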