Re: Fuzzy substring searching with the pg_trgm extension

From	Artur Zakirov
Subject	Re: Fuzzy substring searching with the pg_trgm extension
Date	1 February 2016, 17:12:17
Msg-id	[email protected]
In response to	Re: Fuzzy substring searching with the pg_trgm extension (Artur Zakirov <[email protected]>)
Responses	Re: Fuzzy substring searching with the pg_trgm extension
List	pgsql-hackers

On 29.01.2016 18:58, Artur Zakirov wrote:
> On 29.01.2016 18:39, Alvaro Herrera wrote:
>> Teodor Sigaev wrote:
>>>> The behavior of this function is surprising to me.
>>>>
>>>> select substring_similarity('dog' ,  'hotdogpound') ;
>>>>
>>>>   substring_similarity
>>>> ----------------------
>>>>                   0.25
>>>>
>>> Substring search was designed to find a similar word in a string:
>>> contrib_regression=# select substring_similarity('dog' ,  'hot dogpound') ;
>>>   substring_similarity
>>> ----------------------
>>>                   0.75
>>>
>>> contrib_regression=# select substring_similarity('dog' ,  'hot dog pound') ;
>>>   substring_similarity
>>> ----------------------
>>>                      1
>>
>> Hmm, this behavior looks too much like magic to me.  I mean, a substring
>> is a substring -- why are we treating the space as a special character
>> here?
>>
>
> I think I can rename this function to subword_similarity() and correct
> the documentation.
>
> The current behavior is designed to find the most similar word in a text.
> For example, if we searched for just a substring (not a word), we would
> get the following result:
>
> select substring_similarity('dog', 'dogmatist');
>   substring_similarity
> ---------------------
>                      1
> (1 row)
>
> But I think this is wrong. They are completely different words.
>
> To search for a similar substring (not a word) in a text, maybe another
> function should be added?
>

I have changed the patch:
1 - trgm2.data was corrected; duplicates were deleted.
2 - I have added the operators <<-> and <->> with GiST index support. The
regression test will pass only with the patch
http://www.postgresql.org/message-id/CAPpHfdt19FwQXarYjkzxb3oxmv-KAn3FLuZrooARE_U3H3CV9g@mail.gmail.com
3 - the function substring_similarity() was renamed to subword_similarity().

But there is no substring_similarity_pos() function yet. Implementing it is
not trivial.
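
To make the numbers quoted above easier to follow, here is a minimal Python
sketch of how such scores could arise. It is not the patch's C code. It
assumes pg_trgm's documented padding (two spaces before and one space after
each word), a simplified tokenizer that treats only ASCII letters and digits
as word characters, and a brute-force search over every contiguous run of
trigrams in the text. The names trigrams() and best_extent_similarity() are
hypothetical and exist only for this illustration.

```python
import re


def trigrams(text):
    """Return the ordered list of trigrams for every word in text.

    Assumption: each word is padded with two leading spaces and one
    trailing space, as the pg_trgm documentation describes.
    """
    result = []
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = "  " + word + " "
        result.extend(padded[i:i + 3] for i in range(len(padded) - 2))
    return result


def best_extent_similarity(query, text):
    """Best Jaccard-style score between the query's trigram set and any
    contiguous run of trigrams in text (brute force, illustration only)."""
    query_set = set(trigrams(query))
    text_trigrams = trigrams(text)
    best = 0.0
    for start in range(len(text_trigrams)):
        window = set()
        for end in range(start, len(text_trigrams)):
            window.add(text_trigrams[end])
            shared = len(query_set & window)
            best = max(best, shared / len(query_set | window))
    return best


if __name__ == "__main__":
    for text in ("hotdogpound", "hot dogpound", "hot dog pound", "dogmatist"):
        print(text, best_extent_similarity("dog", text))
```

Under these assumptions the sketch gives 0.25 for 'hotdogpound', 0.75 for
'hot dogpound' and 1 for 'hot dog pound', which agrees with the outputs
quoted above. The space matters because it produces the padded trigrams
"  d", " do" and "og ", which a word glued into a longer string lacks. For
'dogmatist' the same sketch gives 0.75, since the trailing trigram "og " has
no counterpart there.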

--
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
