@@ -2615,18 +2615,41 @@ SELECT plainto_tsquery('supernova star');
</para>

<para>
- To create an <application>Ispell</> dictionary, use the built-in
- <literal>ispell</literal> template and specify several parameters:
+ To create an <application>Ispell</> dictionary, perform these steps:
</para>
-
+ <itemizedlist spacing="compact" mark="bullet">
+ <listitem>
+ <para>
+ Download the dictionary configuration files. <productname>OpenOffice</>
+ extension files have the <filename>.oxt</> extension. Extract the
+ <filename>.aff</> and <filename>.dic</> files and change their
+ extensions to <filename>.affix</> and <filename>.dict</>. Some
+ dictionary files must also be converted to the UTF-8 encoding,
+ for example, for a Norwegian language dictionary:
<programlisting>
- CREATE TEXT SEARCH DICTIONARY english_ispell (
+ iconv -f ISO_8859-1 -t UTF-8 -o nn_no.affix nn_NO.aff
+ iconv -f ISO_8859-1 -t UTF-8 -o nn_no.dict nn_NO.dic
+ </programlisting>
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Copy the files to the <filename>$SHAREDIR/tsearch_data</> directory.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Load the files into PostgreSQL with the following command:
+ <programlisting>
+ CREATE TEXT SEARCH DICTIONARY english_hunspell (
TEMPLATE = ispell,
- DictFile = english,
- AffFile = english,
- StopWords = english
- );
+ DictFile = en_us,
+ AffFile = en_us,
+ StopWords = english);
</programlisting>
+ </para>
+ </listitem>
+ </itemizedlist>

<para>
Here, <literal>DictFile</>, <literal>AffFile</>, and <literal>StopWords</>
@@ -2642,6 +2665,56 @@ CREATE TEXT SEARCH DICTIONARY english_ispell (
example, a Snowball dictionary, which recognizes everything.
</para>

+ <para>
+ The <filename>.affix</> file of <application>Ispell</> has the following
+ structure:
+ <programlisting>
+ prefixes
+ flag *A:
+ . > RE # As in enter > reenter
+ suffixes
+ flag T:
+ E > ST # As in late > latest
+ [^AEIOU]Y > -Y,IEST # As in dirty > dirtiest
+ [AEIOU]Y > EST # As in gray > grayest
+ [^EY] > EST # As in small > smallest
+ </programlisting>
+ </para>
+ <para>
+ And the <filename>.dict</> file has the following structure:
+ <programlisting>
+ lapse/ADGRS
+ lard/DGRS
+ large/PRTY
+ lark/MRS
+ </programlisting>
+ </para>
+
+ <para>
+ The format of the <filename>.dict</> file is:
+ <programlisting>
+ basic_form/affix_class_name
+ </programlisting>
+ </para>
+
+ <para>
+ In the <filename>.affix</> file, every affix flag is described in the
+ following format:
+ <programlisting>
+ condition > [-stripping_letters,] adding_affix
+ </programlisting>
+ </para>
+
+ <para>
+ Here, condition has a format similar to the format of regular expressions.
+ It can use groupings <literal>[...]</> and <literal>[^...]</>.
+ For example, <literal>[AEIOU]Y</> means that the last letter of the word
+ is <literal>"y"</> and the penultimate letter is <literal>"a"</>,
+ <literal>"e"</>, <literal>"i"</>, <literal>"o"</> or <literal>"u"</>.
+ <literal>[^EY]</> means that the last letter is neither <literal>"e"</>
+ nor <literal>"y"</>.
+ </para>
+
<para>
Ispell dictionaries support splitting compound words;
a useful feature.
@@ -2663,6 +2736,65 @@ SELECT ts_lexize('norwegian_ispell', 'sjokoladefabrikk');
</programlisting>
</para>

+ <para>
+ The <application>MySpell</> format is a subset of <application>Hunspell</>.
+ The <filename>.affix</> file of <application>Hunspell</> has the following
+ structure:
+ <programlisting>
+ PFX A Y 1
+ PFX A 0 re .
+ SFX T N 4
+ SFX T 0 st e
+ SFX T y iest [^aeiou]y
+ SFX T 0 est [aeiou]y
+ SFX T 0 est [^ey]
+ </programlisting>
+ </para>
+
+ <para>
+ The first line of an affix class is the header. The affix rules are listed
+ after the header, and each rule has the following fields:
+ </para>
+ <itemizedlist spacing="compact" mark="bullet">
+ <listitem>
+ <para>
+ parameter name (PFX or SFX)
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ flag (name of the affix class)
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ characters to strip from the beginning (for a prefix) or the end (for a
+ suffix) of the word
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ affix to add
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ condition that has a format similar to the format of regular expressions.
+ </para>
+ </listitem>
+ </itemizedlist>
2786
+
2787
+ <para>
2788
+ The <filename>.dict</> file looks like the <filename>.dict</> file of
2789
+ <application>Ispell</>:
2790
+ <programlisting>
2791
+ larder/M
2792
+ lardy/RT
2793
+ large/RSPMYT
2794
+ largehearted
2795
+ </programlisting>
2796
+ </para>
2797
+
2666
2798
<note>
2667
2799
<para>
2668
2800
<application>MySpell</> does not support compound words.
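
The Ispell rule format documented in this patch (condition > [-stripping_letters,] adding_affix) may be easier to follow with a small worked example. The following Python sketch is not part of the patch and is not how PostgreSQL implements the ispell template. It is an illustration only, and the names FLAG_T_RULES and apply_suffix_rules are hypothetical. It assumes that each condition can be treated as a regular expression anchored at the end of the word, and it applies the rules in the forward direction (base form to inflected form), as the comments in the sample .affix file do. When lexizing, the dictionary works in the opposite direction and reduces an inflected form to its base form.

```python
import re

# Suffix rules for flag T, transcribed from the Ispell .affix sample in the patch.
# Each entry is (condition, letters to strip, letters to add).
FLAG_T_RULES = [
    ("E", "", "ST"),            # E > ST
    ("[^AEIOU]Y", "Y", "IEST"),  # [^AEIOU]Y > -Y,IEST
    ("[AEIOU]Y", "", "EST"),    # [AEIOU]Y > EST
    ("[^EY]", "", "EST"),       # [^EY] > EST
]


def apply_suffix_rules(word, rules):
    """Return the forms produced by every rule whose condition matches the end of word."""
    upper = word.upper()
    forms = []
    for condition, strip, add in rules:
        # Assumption: the condition describes the final letters of the word,
        # so it is anchored at the end of the string.
        if re.search(condition + "$", upper) is None:
            continue
        # Remove the stripping letters (if any) before appending the affix.
        stem = upper[: len(upper) - len(strip)] if strip else upper
        forms.append((stem + add).lower())
    return forms


if __name__ == "__main__":
    for base in ("late", "dirty", "gray", "small"):
        print(base, "->", apply_suffix_rules(base, FLAG_T_RULES))
```

For the four sample words, exactly one rule matches each, which mirrors the "As in" comments in the .affix sample (late > latest, dirty > dirtiest, gray > grayest, small > smallest).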