SphinxSearchTutorial 1
1/5 .molecularsciences.org/book/eport/html/405
Brought to you by molecularsciences.org.
This work is licensed under a Creative Commons Attribution-Share Alike 3.0 License.
This publication may not be redistributed without this notice.
Sphinx Search
Sphinx is a free standalone full-text search engine. It is fast, efficient, and integrates easily with SQL databases
and major programming languages. It is ideal for use with MySQL, PostgreSQL, PHP, Python,
Perl and Ruby. Sphinx also works well with Java, although Lucene is a better option for Java.
Following is a summary of features (copied from http://www.sphinxsearch.com):
high indexing speed (up to 10 MB/sec on modern CPUs)
high search speed (avg query is under 0.1 sec on 2-4 GB text collections)
high scalability (up to 100 GB of text, up to 100 M documents on a single CPU)
provides good relevance ranking through a combination of phrase proximity ranking and statistical (BM25) ranking
provides distributed searching capabilities
provides document excerpts generation
provides searching from within MySQL through a pluggable storage engine
supports boolean, phrase, and word proximity queries
supports multiple full-text fields per document (up to 32 by default)
supports multiple additional attributes per document (e.g. groups, timestamps, etc)
supports stopwords
supports both single-byte encodings and UTF-8
supports English stemming, Russian stemming, and Soundex for morphology
supports MySQL natively (MyISAM and InnoDB tables are both supported)
supports PostgreSQL natively
Download and Install Sphinx
Sphinx can be downloaded from http://www.sphinxsearch.com/. Installing Sphinx is pretty straightforward.
Following are the steps. --prefix defines the installation directory.
$ cd /home/user/src
$ wget http://sphinxsearch.com/downloads/sphinx-0.9.8.tar.gz
$ tar -xzf sphinx-0.9.8.tar.gz
$ cd sphinx-0.9.8
$ mkdir /usr/local/sphinx
$ ./configure --prefix /usr/local/sphinx --with-mysql
$ make
$ make install
Troubleshooting
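If you get an error like:
sphinx.h:54:19: error: mysql.h: No such file or directory
It is because you do not have the MySQL development files installed yet. To fix this you need to install mysql-devel
and mysql-libs, if you don't already have them installed.
If you are using Fedora, use:
$ yum install mysql-devel
If you are using Ubuntu, where the development package is named libmysqlclient-dev:
$ apt-get install libmysqlclient-dev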
Sample Database
The next step is to configure and test Sphinx. Here we create a sample database to walk through configuration
and testing.
mysql> create table phonebook (
-> id int(10) not null auto_increment primary key,
-> name varchar(15) not null,
-> phone varchar(20) not null
-> );
mysql> describe phonebook;
+-------+-------------+------+-----+---------+----------------+
| Field | Type        | Null | Key | Default | Extra          |
+-------+-------------+------+-----+---------+----------------+
| id    | int(10)     | NO   | PRI | NULL    | auto_increment |
| name  | varchar(15) | NO   |     |         |                |
| phone | varchar(20) | NO   |     |         |                |
+-------+-------------+------+-----+---------+----------------+
mysql> insert into phonebook VALUES (NULL,'John','212-123-0987');
mysql> insert into phonebook VALUES (NULL,'Jake','718-123-0987');
mysql> insert into phonebook VALUES (NULL,'Kate','987-123-2322');
mysql> insert into phonebook VALUES (NULL,'Khan','987-893-2322');
mysql> insert into phonebook VALUES (NULL,'Mike','829-893-2322');
mysql> select * from phonebook;
+----+------+--------------+
| id | name | phone        |
+----+------+--------------+
|  1 | John | 212-123-0987 |
|  2 | Jake | 718-123-0987 |
|  3 | Kate | 987-123-2322 |
|  4 | Khan | 987-893-2322 |
|  5 | Mike | 829-893-2322 |
+----+------+--------------+
Note that Sphinx requires a unique integer identifier (primary key) for each row.
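The unique-identifier requirement can be checked before indexing. The following is a minimal sketch, not part of the original tutorial: it uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made so the example runs without a database server), recreates the phonebook table, and verifies that every id is a unique positive integer. The helper name ids_are_valid_document_ids is hypothetical.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE phonebook (
        id    INTEGER PRIMARY KEY AUTOINCREMENT,
        name  VARCHAR(15) NOT NULL,
        phone VARCHAR(20) NOT NULL
    )
    """
)

# Same sample rows as in the tutorial; ids are assigned automatically.
rows = [
    ("John", "212-123-0987"),
    ("Jake", "718-123-0987"),
    ("Kate", "987-123-2322"),
    ("Khan", "987-893-2322"),
    ("Mike", "829-893-2322"),
]
conn.executemany("INSERT INTO phonebook (name, phone) VALUES (?, ?)", rows)


def ids_are_valid_document_ids(connection, table):
    """Return True if every id in `table` is a unique positive integer."""
    ids = [row[0] for row in connection.execute(f"SELECT id FROM {table}")]
    return len(ids) == len(set(ids)) and all(
        isinstance(i, int) and i > 0 for i in ids
    )


print(ids_are_valid_document_ids(conn, "phonebook"))
```

Against a real MySQL server the same check could be run through any MySQL client library; only the connection setup would change.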
Configuring and Testing Sphinx
Once you have installed Sphinx, you need to configure it so that it can access your database and be accessible
from your scripts. Configuration files are stored in the etc directory of the installation directory.
$ cd /usr/local/sphinx/etc
A default copy of the sphinx.conf file we need to edit is called sphinx.conf.dist. Copy this file to sphinx.conf and
start editing.
$ cp sphinx.conf.dist sphinx.conf
# sphinx.conf contains the config info for sphinx
$ vi sphinx.conf
Specifically, we need to specify the data source, index or indices, indexer settings, and search daemon
(searchd) settings.
source defines the database parameters necessary to connect and select
index tells sphinx what to index and in which format
indexer is a utility which creates fulltext indices. We can define parameters such as memory limit.
searchd is a daemon which enables applications to search through fulltext indices. Make sure the port is
correct.
Here is a sample sphinx.conf file
# define your data source here.
source phonesource
{
    type = mysql
    sql_host = localhost
    sql_user = type_your_database_username_here
    sql_pass = type_your_password_here
    sql_db = type_your_database_name_here
    sql_port = 3306 # optional, default is 3306

    # define the primary fetch query. You can define up
    # to 32 full-text fields but the first field must be
    # a unique unsigned positive integer.
    sql_query = select id, name, phone from phonebook;

    # display information on each selected id
    # only used for search CLI
    sql_query_info = SELECT name, phone FROM phonebook WHERE id=$id;
}

# define an index
index phoneindex
{
    source = phonesource
    path = /usr/local/sphinx/var/data/phone
    morphology = none

    # for stemming
    min_word_len = 3
    min_prefix_len = 0
    min_infix_len = 3
    enable_star = 1
    # * means any
}

# indexer settings
indexer
{
    # recommended 256M to 1024M
    mem_limit = 1024M
}

# search daemon settings
searchd
{
    port = 3312
    log = /usr/local/sphinx/var/log/searchd.log
    query_log = /usr/local/sphinx/var/log/query.log
    read_timeout = 5
    max_children = 30
    pid_file = /usr/local/sphinx/var/log/searchd.pid
    max_matches = 1000
    seamless_rotate = 1
    preopen_indexes = 0
    unlink_old = 1
}
Once we have configured sphinx.conf, we need to run the indexer.
$ /usr/local/sphinx/bin/indexer --config /usr/local/sphinx/etc/sphinx.conf --all
Finally we test
$ /usr/local/sphinx/bin/search John
$ /usr/local/sphinx/bin/search 'J*'
$ /usr/local/sphinx/bin/search '*e'
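To see which sample rows the star patterns are intended to match, the wildcard can be approximated outside Sphinx. This is only a rough sketch using Python's fnmatch module, not Sphinx's own matching: it treats * as "any run of characters", compares case-insensitively, and ignores min_word_len, min_prefix_len and min_infix_len, so real search results may differ. The function name star_match is hypothetical.

```python
from fnmatch import fnmatchcase

# Names from the sample phonebook table.
names = ["John", "Jake", "Kate", "Khan", "Mike"]


def star_match(pattern, words):
    """Return the words matched by a pattern where * stands for any characters."""
    return [w for w in words if fnmatchcase(w.lower(), pattern.lower())]


print(star_match("John", names))  # ['John']
print(star_match("J*", names))    # ['John', 'Jake']
print(star_match("*e", names))    # ['Jake', 'Kate', 'Mike']
```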
Error: sql_fetch_row: Lost connection to MySQL server during query
Error
$ /usr/local/sphinx/bin/indexer --config /usr/local/sphinx/etc/sphinx.conf --all --rotate
Sphinx 0.9.8-rc2 (r1234)
Copyright (c) 2001-2008, Andrew Aksyonoff
using config file '/usr/local/sphinx/etc/sphinx.conf'...
indexing index 'hdb'...
ERROR: index 'hdb': sql_fetch_row: Lost connection to MySQL server during query.
total 2087306 docs, 18785754 bytes
total 2462.060 sec, 7630.10 bytes/sec, 847.79 docs/sec
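The two summary lines at the end of the indexer output are worth checking when a run fails, because they show how slowly the data was being fetched. The sketch below is an illustration rather than a Sphinx tool: it assumes the lines read "total N docs, N bytes" and "total N sec, ..." as in the run above, and parse_summary is a hypothetical helper that recomputes the throughput from the totals.

```python
import re

# Summary lines as printed at the end of the failed indexer run.
summary = """\
total 2087306 docs, 18785754 bytes
total 2462.060 sec, 7630.10 bytes/sec, 847.79 docs/sec
"""


def parse_summary(text):
    """Extract (docs, bytes, seconds) from the indexer summary lines."""
    docs, size = re.search(r"total (\d+) docs, (\d+) bytes", text).groups()
    secs = re.search(r"total ([\d.]+) sec", text).group(1)
    return int(docs), int(size), float(secs)


docs, size, secs = parse_summary(summary)
print(f"{size / secs:.2f} bytes/sec, {docs / secs:.2f} docs/sec")
print(f"run time: {secs / 60:.1f} minutes")
```

Here the run lasted about 41 minutes at roughly 7.6 KB/sec, far below the "up to 10 MB/sec" figure quoted in the feature list, which fits a long-running fetch whose connection eventually timed out.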
Solution
To solve this problem you have two choices.
1. Reduce memory limit in sphinx.conf
In the sphinx.conf file, set mem_limit = 256M
2. Increase wait_timeout in my.cnf
Log in to MySQL using the root password, then run:
use mysql;
show variables like '%wait_timeout%';
set global wait_timeout = 30000;
To make the change permanent, set wait_timeout = 30000 under [mysqld] in my.cnf and restart MySQL.