Sphinx + Laravel + sngrl\SphinxSearch: a summary of real-time incremental indexing

I have spent the last two days on a Sphinx full-text indexing project and finally got it working. Here is a summary.

1. Install Sphinx (I am using macOS here; most of the commands below work on Linux as well)

mkdir /usr/local/sphinx

cd /usr/local/sphinx

wget http://sphinxsearch.com/files/sphinx-2.2.11-release.tar.gz

tar -zvxf sphinx-2.2.11-release.tar.gz

cd sphinx-2.2.11-release

./configure

make && sudo make install

Check that the installation succeeded

searchd -h   # success if the usage help is printed

Errors encountered during installation

configuring Sphinx

checking for CFLAGS needed for pthreads… none

checking for LIBS needed for pthreads… -lpthread

checking for pthreads… found

checking whether to compile with MySQL support… yes

checking for mysql_config… mysql_config

checking for mysql_real_connect… no

checking for mysql_real_connect… no

checking MySQL include files… configure: error: missing include files.

**

ERROR: cannot find MySQL include files.

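Fix: sudo apt-get install libmysql++ (Debian/Ubuntu; what configure is missing are the MySQL client development headers, so a package such as libmysqlclient-dev should also satisfy it)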
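
Before re-running ./configure, it can help to confirm that the MySQL headers really are where mysql_config says they are. The following is a minimal sketch and not part of the original setup: the helper name first_include_dir is made up for illustration, and it assumes that mysql_config --include prints compiler flags of the form -I/path.

```shell
#!/bin/sh
# first_include_dir FLAGS: print the directory named by the first -I flag
# in FLAGS (hypothetical helper, for illustration only).
first_include_dir() {
    for flag in $1; do          # unquoted on purpose: split FLAGS on whitespace
        case $flag in
            -I*) printf '%s\n' "${flag#-I}"; return 0 ;;
        esac
    done
    return 1                    # no -I flag was present
}

# Usage (requires mysql_config on PATH):
#   dir=$(first_include_dir "$(mysql_config --include)") &&
#       [ -f "$dir/mysql.h" ] && echo "found $dir/mysql.h"
```

If the usage line prints nothing, the headers are missing or mysql_config points somewhere unexpected, which matches the "missing include files" error above.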

2. sphinx.conf configuration (see the official Sphinx site for details on the configuration parameters)

# Minimal Sphinx configuration sample (clean, simple, functional)
           

#

source main_src

{

type = mysql

sql_host                = 192.168.1.221
    sql_user                = root
    sql_pass                =root.remote
    sql_db                  = caiban
    sql_port                = 3306  # optional, default is 3306
    sql_sock                = /tmp/mysql.sock
    sql_query_pre           =SET NAMES utf8
    sql_query_pre           =SET SESSION query_cache_type=OFF
    sql_query_pre           =replace into sph_counter select 1,max(id) from register_enterprise_extends
    sql_query               = \
    SELECT  id,company_name,trademark,legal_person_name, UNIX_TIMESTAMP(created_at) AS created_at,reg_address,reg_number,business_scope,linkman,reg_organs,operating_period,views,reg_capital FROM register_enterprise_extends where id<=(select max_doc_id from sph_counter where counter_id=1)
    sql_attr_uint           =id
    sql_field_string        =company_name
    sql_field_string        =trademark
    sql_field_string        =legal_person_name
    sql_attr_timestamp      =created_at
    sql_field_string        =reg_address
    sql_field_string        =reg_number
    sql_field_string        =business_scope
    sql_field_string        =reg_organs
    sql_field_string        =operating_period
    sql_field_string        =reg_capital
    sql_field_string        =views
           

}

source delta_src: main_src{

sql_ranged_throttle=100

sql_query_pre=SET NAMES utf8

sql_query_pre=SET SESSION query_cache_type=OFF

sql_query= SELECT id,company_name,trademark,legal_person_name, UNIX_TIMESTAMP(created_at) AS created_at,reg_address,reg_number,business_scope,linkman,reg_organs,operating_period,views,reg_capital FROM register_enterprise_extends where id>(select max_doc_id from sph_counter where counter_id=1)

sql_attr_uint =id

sql_field_string =company_name

sql_field_string =trademark

sql_field_string =legal_person_name

sql_attr_timestamp =created_at

sql_field_string =reg_address

sql_field_string =reg_number

sql_field_string =business_scope

sql_field_string =reg_organs

sql_field_string =operating_period

sql_field_string =reg_capital

sql_field_string =views

}

index main{

source =main_src

path=/usr/local/sphinx/main

docinfo =extern

min_word_len =1

charset_type=utf-8

min_prefix_len=0

min_infix_len =1

ngram_len =1

charset_table = U+FF10..U+FF19->0..9, 0..9, U+FF41..U+FF5A->a..z, U+FF21..U+FF3A->a..z,A..Z->a..z, a..z, U+0149, U+017F, U+0138, U+00DF, U+00FF, U+00C0..U+00D6->U+00E0..U+00F6,U+00E0..U+00F6, U+00D8..U+00DE->U+00F8..U+00FE, U+00F8..U+00FE, U+0100->U+0101, U+0101,U+0102->U+0103, U+0103, U+0104->U+0105, U+0105, U+0106->U+0107, U+0107, U+0108->U+0109,U+0109, U+010A->U+010B, U+010B, U+010C->U+010D, U+010D, U+010E->U+010F, U+010F,U+0110->U+0111, U+0111, U+0112->U+0113, U+0113, U+0114->U+0115, U+0115,U+0116->U+0117,U+0117, U+0118->U+0119, U+0119, U+011A->U+011B, U+011B, U+011C->U+011D,U+011D,U+011E->U+011F, U+011F, U+0130->U+0131, U+0131, U+0132->U+0133, U+0133,U+0134->U+0135,U+0135, U+0136->U+0137, U+0137, U+0139->U+013A, U+013A, U+013B->U+013C,U+013C,U+013D->U+013E, U+013E, U+013F->U+0140, U+0140, U+0141->U+0142, U+0142,U+0143->U+0144,U+0144, U+0145->U+0146, U+0146, U+0147->U+0148, U+0148, U+014A->U+014B,U+014B,U+014C->U+014D, U+014D, U+014E->U+014F, U+014F, U+0150->U+0151, U+0151,U+0152->U+0153,U+0153, U+0154->U+0155, U+0155, U+0156->U+0157, U+0157, U+0158->U+0159,U+0159,U+015A->U+015B, U+015B, U+015C->U+015D, U+015D, U+015E->U+015F, U+015F,U+0160->U+0161,U+0161, U+0162->U+0163, U+0163, U+0164->U+0165, U+0165, U+0166->U+0167,U+0167,U+0168->U+0169, U+0169, U+016A->U+016B, U+016B, U+016C->U+016D, U+016D,U+016E->U+016F,U+016F, U+0170->U+0171, U+0171, U+0172->U+0173, U+0173, U+0174->U+0175,U+0175,U+0176->U+0177, U+0177, U+0178->U+00FF, U+00FF, U+0179->U+017A, U+017A, U+017B->U+017C,U+017C, U+017D->U+017E, U+017E, U+0410..U+042F->U+0430..U+044F,U+0430..U+044F,U+05D0..U+05EA, U+0531..U+0556->U+0561..U+0586, U+0561..U+0587, U+0621..U+063A, U+01B9,U+01BF, U+0640..U+064A, U+0660..U+0669, U+066E, U+066F, U+0671..U+06D3, U+06F0..U+06FF,U+0904..U+0939, U+0958..U+095F, U+0960..U+0963, U+0966..U+096F, U+097B..U+097F,U+0985..U+09B9, U+09CE, U+09DC..U+09E3, U+09E6..U+09EF,U+0A05..U+0A39, U+0A59..U+0A5E,U+0A66..U+0A6F, U+0A85..U+0AB9, U+0AE0..U+0AE3,U+0AE6..U+0AEF, \
U+0B05..U+0B39,U+0B5C..U+0B61, U+0B66..U+0B6F, U+0B71, U+0B85..U+0BB9,U+0BE6..U+0BF2, U+0C05..U+0C39,U+0C66..U+0C6F, U+0C85..U+0CB9, U+0CDE..U+0CE3,U+0CE6..U+0CEF, U+0D05..U+0D39, U+0D60,U+0D61, U+0D66..U+0D6F, U+0D85..U+0DC6,U+1900..U+1938, U+1946..U+194F, U+A800..U+A805,U+A807..U+A822, U+0386->U+03B1,U+03AC->U+03B1, U+0388->U+03B5, U+03AD->U+03B5,U+0389->U+03B7, U+03AE->U+03B7,U+038A->U+03B9, U+0390->U+03B9, U+03AA->U+03B9,U+03AF->U+03B9, U+03CA->U+03B9,U+038C->U+03BF, U+03CC->U+03BF, U+038E->U+03C5,U+03AB->U+03C5, U+03B0->U+03C5,U+03CB->U+03C5, U+03CD->U+03C5, U+038F->U+03C9,U+03CE->U+03C9, U+03C2->U+03C3, U+0391..U+03A1->U+03B1..U+03C1,U+03A3..U+03A9->U+03C3..U+03C9, U+03B1..U+03C1,U+03C3..U+03C9, U+0E01..U+0E2E,U+0E30..U+0E3A, U+0E40..U+0E45, U+0E47, U+0E50..U+0E59, U+A000..U+A48F, U+4E00..U+9FBF,U+3400..U+4DBF, U+20000..U+2A6DF, U+F900..U+FAFF,U+2F800..U+2FA1F, U+2E80..U+2EFF,U+2F00..U+2FDF, U+3100..U+312F, U+31A0..U+31BF,U+3040..U+309F, U+30A0..U+30FF,U+31F0..U+31FF, U+AC00..U+D7AF, U+1100..U+11FF, U+3130..U+318F, U+A000..U+A48F,U+A490..U+A4CF

ngram_chars =U+4E00..U+9FBF, U+3400..U+4DBF, U+20000..U+2A6DF, U+F900..U+FAFF,U+2F800..U+2FA1F, U+2E80..U+2EFF, U+2F00..U+2FDF, U+3100..U+312F, U+31A0..U+31BF,U+3040..U+309F, U+30A0..U+30FF, U+31F0..U+31FF, U+AC00..U+D7AF, U+1100..U+11FF, U+3130..U+318F, U+A000..U+A48F, U+A490..U+A4CF

}

index delta: main{

source=delta_src

path =/usr/local/sphinx/delta

}

indexer

{

mem_limit = 128M

}

searchd{

listen = 9312

listen = 9306:mysql41

log = /usr/local/sphinx/log/searchd.log

query_log = /usr/local/sphinx/log/query.log

read_timeout = 5

max_children = 30

pid_file = /usr/local/sphinx/log/searchd.pid

seamless_rotate = 1

preopen_indexes = 1

unlink_old = 1

max_matches = 1000

workers = threads # for RT to work

binlog_path = /usr/local/sphinx/data

}
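
Both sources read a sph_counter table that the configuration itself does not create. A minimal sketch of creating it from the shell follows, assuming the two-column layout (counter_id, max_doc_id) that the queries above expect. The function name counter_table_sql is illustrative, and the mysql invocation is left as a comment because host and credentials are site-specific.

```shell
#!/bin/sh
# counter_table_sql: print the DDL for the helper table that main_src writes
# to and delta_src reads from (column names match the queries in sphinx.conf).
counter_table_sql() {
    cat <<'SQL'
CREATE TABLE IF NOT EXISTS sph_counter (
    counter_id INTEGER PRIMARY KEY NOT NULL,
    max_doc_id INTEGER NOT NULL
);
SQL
}

# Usage (not run here; prompts for the password):
#   counter_table_sql | mysql -h 192.168.1.221 -u root -p caiban
```

The primary key on counter_id is what lets the "replace into sph_counter" pre-query overwrite row 1 on every main rebuild instead of adding new rows.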

3. Build the indexes and start the Sphinx daemon

/usr/local/bin/indexer -c /usr/local/sphinx/sphinx.conf --all

/usr/local/bin/searchd -c /usr/local/sphinx/sphinx.conf

# Check that the process is running

ps aux|grep searchd
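
Note that ps piped through grep also matches the grep process itself. A small alternative sketch, assuming the pid_file path from the searchd section of the configuration: searchd_running is a hypothetical helper, and kill -0 only tests whether a signal could be delivered, it does not stop anything.

```shell
#!/bin/sh
# searchd_running PIDFILE: succeed if PIDFILE exists and names a live process.
searchd_running() {
    pidfile=$1
    [ -r "$pidfile" ] || return 1          # no pid file: treat as not running
    pid=$(cat "$pidfile")
    [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null
}

# Usage:
#   searchd_running /usr/local/sphinx/log/searchd.pid && echo "searchd is up"
```

Run it as the user that started searchd (or as root), since kill -0 also fails when the process belongs to someone else.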

4. Use cron to run shell scripts that periodically rebuild the delta index (the index of newly added rows) and the main index

vim delta_index.sh

#!/bin/sh
# Rebuild the delta index, then merge it into main. --rotate lets the running searchd pick up the new files; output is appended to the log.
/usr/local/bin/indexer -c /usr/local/sphinx/sphinx.conf delta --rotate >> /usr/local/sphinx/log/deltaindex.log

/usr/local/bin/indexer --merge main delta --rotate -c /usr/local/sphinx/sphinx.conf >> /usr/local/sphinx/log/deltaindex.log
           

:wq

vim main_index.sh

#!/bin/sh
# Rebuild the main index; --rotate swaps it in without stopping the running searchd
/usr/local/bin/indexer -c /usr/local/sphinx/sphinx.conf main --rotate >> /usr/local/sphinx/log/mainindex.log
           

:wq

crontab -e

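# Insert the following lines (script names match the files created above, assumed to be saved in /usr/local/sphinx)
# delta index: every minute
*/1 * * * *  /bin/sh /usr/local/sphinx/delta_index.sh > /dev/null 2>&1
# main index: full rebuild; fill in the minute and hour fields with an off-peak time
  * * * /bin/sh /usr/local/sphinx/main_index.sh > /dev/null 2>&1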
           

:wq
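
Because the delta job fires every minute, a slow run could overlap with the next one. One common guard, sketched here as an assumption rather than something the original setup uses, is a lock directory: mkdir either creates it or fails, so two runs cannot both proceed. The name run_locked and the lock path are hypothetical.

```shell
#!/bin/sh
# run_locked LOCKDIR CMD [ARGS...]: run CMD only if LOCKDIR can be created,
# then remove LOCKDIR and return CMD's exit status.
run_locked() {
    lockdir=$1
    shift
    if ! mkdir "$lockdir" 2>/dev/null; then
        echo "lock $lockdir is held, skipping this run" >&2
        return 1
    fi
    "$@"
    status=$?
    rmdir "$lockdir"
    return "$status"
}

# Usage from a wrapper script that cron calls instead of the index script:
#   run_locked /tmp/sphinx_delta.lock /bin/sh /usr/local/sphinx/delta_index.sh
```

If a run is killed before rmdir executes, the lock directory stays behind and has to be removed by hand; that trade-off keeps the sketch short.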

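One pitfall deserves special mention. While writing these scripts I typed the commands out character for character many times, yet they never ran properly, and the log kept reporting that --merge was an unrecognized argument. I spent a whole day failing to find the mistake. Eventually I looked at the example commands in indexer --help, copied one of them, and changed nothing but the index names at the end. The result floored me: the command I had typed by hand failed, while the visually identical pasted one ran perfectly. At the time I blamed the vim editor on Ubuntu; a more likely explanation is that a look-alike character, such as a typographic dash in place of the two ASCII hyphens, had slipped into the typed command.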
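
One way to catch this kind of problem is to scan the script for bytes outside plain ASCII, for example a typographic dash that looks like a hyphen. This is a sketch under that assumption; find_non_ascii is a made-up helper name, and it relies only on grep options (-a, -n) available in both GNU and BSD grep.

```shell
#!/bin/sh
# find_non_ascii FILE: print, with line numbers, every line of FILE that
# contains a byte that is neither printable ASCII nor whitespace.
# Exit status 0 means at least one such line was found.
find_non_ascii() {
    LC_ALL=C grep -an '[^[:print:][:space:]]' "$1"
}

# Usage:
#   find_non_ascii delta_index.sh && echo "suspicious characters found"
```

In the C locale every byte of a multi-byte UTF-8 character falls outside [:print:], so a pasted en dash is reported, as are stray control characters.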

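5. Using the sngrl package in Laravel (see the sngrl\SphinxSearch project source)

5.1.1 With the steps above done, the environment that Sphinx indexing needs is essentially in place. What follows is the basic usage of the sngrl package.

Add the following to the "require" section of composer.json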

"require": {
        /*** Some others packages ***/
        "sngrl/sphinxsearch": "dev-master",
    },
           

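Run composer install or composer update. Personally I recommend composer install: many dependencies are hosted abroad, so installing through an update can be slow and may be interrupted. Note that if a composer.lock file already exists, composer install installs from the lock file, so a newly added requirement is only picked up by an update; composer update sngrl/sphinxsearch limits the update to this one package.

5.1.2 Alternatively, install it directly from the composer command line

5.2 Add the following to the 'providers' array in config/app.php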

'providers' => array(
        /*** Some others providers ***/
        'sngrl\SphinxSearch\SphinxSearchServiceProvider',
    ),
           

5.3 Generate the configuration file the package needs

5.4 Edit the configuration file

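return array (
    // address of the local Sphinx server
    'host'    => '127.0.0.1',
    // port of the local Sphinx server (the plain "listen = 9312" entry in sphinx.conf)
    'port'    => 9312,
    'indexes' => array (
        // my_index_name is the index name defined in sphinx.conf; with the configuration above it would be main.
        // In the array, 'table' is the database table tied to the index, and 'column' is that table's id column, which the ids in the search results refer to.
        'my_index_name' => array ( 'table' => 'my_keywords_table', 'column' => 'id' ),
        // an index can also be used without a table mapping
        //'my_index_name' => FALSE,
    )
);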
           

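5.5 Basic usage

// Don't forget to import the SphinxSearch class
use sngrl\SphinxSearch\SphinxSearch;
$sphinx = new SphinxSearch();
// search() takes the query keywords first and, second, an index name registered in the configuration file (my_index_name there)
$results = $sphinx->search('my query', 'index_name')->query(); // returns the raw Sphinx result
$results = $sphinx->search('my query', 'index_name')->get();   // returns the wrapped result array
// Search for the keywords in one field (returns the raw Sphinx result array), with a pagination limit:
// limit() takes the page size and the offset of the first row of page $page
$sphinx->limit($limit, ($page - 1) * $limit);
$result = $sphinx->search('@title "my query"', 'index_name')->query();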