Hunspell 3
NAME
hunspell - spell checking, stemming, morphological generation and analysis
SYNOPSIS
#include <hunspell/hunspell.hxx> /* or */
#include <hunspell/hunspell.h>
~Hunspell();
char * get_dic_encoding();
2011-02-01 1
hunspell(3) hunspell(3)
Hunspell_destroy constructor and destructor, and an extra HunHandle parameter (the allocated object) in
the wrapper functions (see the C header file hunspell.h).
The basic spelling functions, spell() and suggest(), can also be used for stemming, morphological generation
and analysis when the input text is XML (see XML API).
Constructor and destructor
Hunspell’s constructor takes the paths of the affix and dictionary files. See the hunspell(4) manual page for
the dictionary format. The optional key parameter is for dictionaries encrypted by the hzip tool of the Hunspell
distribution.
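As an illustration of the constructor arguments, the following C++ sketch builds the two paths from a
directory and a dictionary name. The helper dict_paths(), the directory layout and the names used in the
comments are assumptions made for this example; only the order of the affix path, the dictionary path and
the optional key comes from the description above.

```cpp
#include <string>
#include <utility>

// Hypothetical helper (not part of Hunspell): builds the affix (.aff) and
// dictionary (.dic) paths for a dictionary stored as <dir>/<name>.aff and
// <dir>/<name>.dic. This layout is an assumption of the sketch.
std::pair<std::string, std::string> dict_paths(const std::string& dir,
                                               const std::string& name) {
    std::string base = dir;
    if (!base.empty() && base.back() != '/') base += '/';
    base += name;
    return std::make_pair(base + ".aff", base + ".dic");
}

// Intended use with the library (requires <hunspell/hunspell.hxx>):
//   std::pair<std::string, std::string> p =
//       dict_paths("/usr/share/hunspell", "en_US");   // example location
//   Hunspell checker(p.first.c_str(), p.second.c_str());
//   // A third argument would carry the key of an hzip-encrypted dictionary.
```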
Extra dictionaries
The add_dic() function loads an extra dictionary file. The extra dictionaries use the affix file of the allocated
Hunspell object. The maximum number of extra dictionaries is limited in the source code (20).
Spelling and correction
The spell() function returns a non-zero value if the input word is recognised by the spell checker, and zero
if not. Optional reference variables return a bit array (info) and the root word of the input word. The info bits,
checked with the SPELL_COMPOUND, SPELL_FORBIDDEN or SPELL_WARN macros, indicate compound
words, explicitly forbidden words and probably bad words. From version 1.3, the non-zero return value is 2 for
dictionary words with the flag "WARN" (probably bad words).
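The return value and the info bits described above can be combined into a readable verdict. The C++ sketch
below is a minimal example under stated assumptions: classify() is a hypothetical helper, and the fallback
macro values are assumed here only so that the example compiles without the header; with the library, the
definitions from hunspell.hxx apply.

```cpp
#include <string>

// Fallback definitions so the sketch compiles on its own. The values are
// assumptions of this example; include <hunspell/hunspell.hxx> first to use
// the library's own definitions.
#ifndef SPELL_COMPOUND
#define SPELL_COMPOUND (1 << 0)
#endif
#ifndef SPELL_FORBIDDEN
#define SPELL_FORBIDDEN (1 << 1)
#endif
#ifndef SPELL_WARN
#define SPELL_WARN (1 << 6)
#endif

// Hypothetical helper: interprets the return value of spell() together with
// the optional info bit array.
std::string classify(int spell_result, int info) {
    if (info & SPELL_FORBIDDEN) return "forbidden";
    if (spell_result == 0) return "unknown";
    if (spell_result == 2 || (info & SPELL_WARN)) return "probably bad";
    if (info & SPELL_COMPOUND) return "compound";
    return "ok";
}

// Intended use with the library:
//   int info = 0;
//   int rv = checker.spell("word", &info);
//   std::string verdict = classify(rv, info);
```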
The suggest() function has two parameters: a reference variable for the output suggestion list and an
input word. The function returns the number of suggestions. The reference variable will contain the
address of the newly allocated suggestion list, or NULL if the return value of suggest() is zero. The maximum
number of suggestions is limited in the source code.
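Because the suggestion list is allocated by the library, a caller may prefer to copy it into its own container
and release the original at once. The C++ sketch below shows one way to do this. copy_list() is a hypothetical
helper, and the call in the closing comment assumes the free_list(char ***, int) form declared in the header.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: copies a C string list of n entries, such as the one
// returned through the reference variable of suggest(), into a vector.
// A NULL list or a count of zero yields an empty vector.
std::vector<std::string> copy_list(char** list, int n) {
    std::vector<std::string> out;
    if (list == nullptr) return out;
    for (int i = 0; i < n; ++i) {
        if (list[i] != nullptr) out.push_back(list[i]);
    }
    return out;
}

// Intended use with the library:
//   char** list = nullptr;
//   int n = checker.suggest(&list, "wrod");
//   std::vector<std::string> suggestions = copy_list(list, n);
//   if (list != nullptr) checker.free_list(&list, n);
```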
The spell() and suggest() functions can also recognize XML input; see the XML API section.
Morphological functions
The plain stem() and analyze() functions are similar to suggest(), but instead of suggestions they return
stems and the results of the morphological analysis. The plain generate() expects a second word as well. This
extra word and its affixation serve as the model for the morphological generation of the requested forms of the
first word.
The extended stem() and generate() use the results of a morphological analysis:
char ** result, ** result2;
int n1 = analyze(&result, "words");
int n2 = stem(&result2, result, n1);
The morphological annotation of the Hunspell library has fixed field identifiers (two letters and a colon); see
the hunspell(4) manual page.
char ** result;
char * affix = "is:plural"; // the description depends on the dictionary, too
int n = generate(&result, "word", &affix, 1);
for (int i = 0; i < n; i++) printf("%s\n", result[i]);
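The fixed field identifiers make the analysis results easy to take apart. The C++ sketch below splits one
result line into identifier and value pairs. split_fields() is a hypothetical helper; it assumes that fields are
separated by whitespace, and the sample line in the comment is invented for illustration.

```cpp
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: splits one analysis line into (identifier, value)
// pairs. It assumes whitespace-separated fields, each starting with a
// two-letter identifier followed by a colon; other tokens are skipped.
std::vector<std::pair<std::string, std::string> >
split_fields(const std::string& analysis) {
    std::vector<std::pair<std::string, std::string> > fields;
    std::istringstream in(analysis);
    std::string token;
    while (in >> token) {
        if (token.size() >= 3 && token[2] == ':') {
            fields.push_back(
                std::make_pair(token.substr(0, 2), token.substr(3)));
        }
    }
    return fields;
}

// Illustrative input: split_fields("st:word is:plural") yields the pairs
// ("st", "word") and ("is", "plural").
```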
Memory deallocation
The free_list() function frees the memory allocated by the suggest(), analyze(), generate() and stem() functions.
Other functions
The add(), add_with_affix() and remove() functions are helpers for a personal dictionary implementation: they
add words to and remove words from the base dictionary at run time. The add_with_affix() function uses a
second word as a model of the affixation enabled for the new word.
The get_dic_encoding() function returns "ISO8859-1" or the character encoding defined in the affix file
with the "SET" keyword.
The get_csconv() function returns the 8-bit character case table of the encoding of the dictionary.
The get_wordchars() and get_wordchars_utf16() functions return the extra word characters defined in the affix
file for tokenization by the "WORDCHARS" keyword.