Installation
- First, download the source code for version 1.8.1 from the IK project on GitHub. You can download it as a zip file or fetch it with git.
- Unzip the archive. In the download directory, run
unzip elasticsearch-analysis-ik-1.8.1.zip -d ik
- Change into the ik directory
cd ik
- Build and package the plugin with Maven (Maven must already be installed) by running
mvn package
- When packaging finishes, the target/releases directory contains
elasticsearch-analysis-ik-1.8.1.zip
- Unzip that archive and copy its contents into the ES_HOME/plugins/ik directory on every Elasticsearch node
- Restart every node
Note: to install a different version, see https://github.com/medcl/elasticsearch-analysis-ik and select the matching version from the branch list before downloading.
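The unpack-and-copy step has to be repeated on every node, so it can be sketched as a small shell helper. This is only a sketch under assumptions: the default ES_HOME path and the function names plugin_dir and install_ik are hypothetical, and the zip path is whatever your Maven build produced.

```shell
# Assumption: ES_HOME points at the Elasticsearch installation on this node.
# The default below is hypothetical; override it to match your setup.
ES_HOME="${ES_HOME:-/usr/share/elasticsearch}"

# Print the directory the IK plugin is unpacked into.
plugin_dir() {
  printf '%s/plugins/ik\n' "$ES_HOME"
}

# Unpack a built plugin zip (first argument) into the plugin directory.
install_ik() {
  mkdir -p "$(plugin_dir)" && unzip -o "$1" -d "$(plugin_dir)"
}

# Usage (run on each node, then restart the node):
#   install_ik /path/to/elasticsearch-analysis-ik-1.8.1.zip
```

Keeping the target path in one function makes it harder to mistype the plugin directory name on one of the nodes.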
Testing
Create the index
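The post does not show a command for this step. A minimal sketch, assuming the index is named iktest (the name used by the mapping request that follows) and that host stands for one of your nodes; create_index is a hypothetical helper name.

```shell
# Assumption: ES_URL is the base URL of one node; "host" is the placeholder used in this post.
ES_URL="${ES_URL:-http://host:9200}"

# Create an empty index with default settings by issuing a PUT on the index name.
create_index() {
  curl -XPUT "$ES_URL/$1"
}

# Usage:
#   create_index iktest
```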
Configure the mapping
curl -XPOST http://host:9200/iktest/fulltext/_mapping -d'
{
  "fulltext": {
    "_all": {
      "analyzer": "ik_max_word",
      "search_analyzer": "ik_max_word",
      "term_vector": "no",
      "store": "false"
    },
    "properties": {
      "content": {
        "type": "string",
        "store": "no",
        "term_vector": "with_positions_offsets",
        "analyzer": "ik_max_word",
        "search_analyzer": "ik_max_word",
        "include_in_all": "true",
        "boost":
      }
    }
  }
}'
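ik_max_word: splits the text at the finest granularity. For example, it splits “中華人民共和國國歌” into “中華人民共和國,中華人民,中華,華人,人民共和國,人民,人,民,共和國,共和,和,國國,國歌”, exhausting every possible combination.
ik_smart: splits the text at the coarsest granularity. For example, it splits “中華人民共和國國歌” into “中華人民共和國,國歌”.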
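To see the difference on your own cluster, the same text can be run through both analyzers. The sketch below reuses the _analyze request form shown later in this post. The helper names analyze_url and compare_analyzers are hypothetical, and host is a placeholder for one of your nodes.

```shell
# Assumption: ES_URL is the base URL of one node; "host" is the placeholder used in this post.
ES_URL="${ES_URL:-http://host:9200}"

# Print the _analyze URL for an index (first argument) and an analyzer (second argument).
analyze_url() {
  printf '%s/%s/_analyze?analyzer=%s&pretty=true\n' "$ES_URL" "$1" "$2"
}

# Run one text (second argument) through both IK analyzers on an index (first argument).
# The text must not contain double quotes, because it is pasted into the JSON body as-is.
compare_analyzers() {
  for analyzer in ik_max_word ik_smart; do
    echo "== $analyzer =="
    curl "$(analyze_url "$1" "$analyzer")" -d "{\"text\": \"$2\"}"
  done
}

# Usage:
#   compare_analyzers iktest "中華人民共和國國歌"
```

The output lists the tokens from each analyzer one after the other, so the fine-grained and coarse-grained splits can be compared directly.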
Index some documents
curl -XPOST http://host:9200/iktest/fulltext/1 -d'
{"content":"美國留給伊拉克的是個爛攤子嗎"}
'
curl -XPOST http://host:9200/iktest/fulltext/2 -d'
{"content":"公安部:各地校車将享最高路權"}
'
curl -XPOST http://host:9200/iktest/fulltext/3 -d'
{"content":"中韓漁警沖突調查:韓警平均每天扣1艘中國漁船"}
'
curl -XPOST http://host:9200/iktest/fulltext/4 -d'
{"content":"中國駐洛杉矶領事館遭亞裔男子槍擊 嫌犯已自首"}
'
Query
curl -XPOST http://localhost:9200/iktest/fulltext/_search -d'
{
  "query" : { "term" : { "content" : "中國" }},
  "highlight" : {
    "pre_tags" : ["<tag1>", "<tag2>"],
    "post_tags" : ["</tag1>", "</tag2>"],
    "fields" : {
      "content" : {}
    }
  }
}
'
The result is
{
  "took": ,
  "timed_out": false,
  "_shards": {
    "total": ,
    "successful": ,
    "failed":
  },
  "hits": {
    "total": ,
    "max_score": ,
    "hits": [
      {
        "_index": "iktest",
        "_type": "fulltext",
        "_id": "4",
        "_score": ,
        "_source": {
          "content": "中國駐洛杉矶領事館遭亞裔男子槍擊 嫌犯已自首"
        },
        "highlight": {
          "content": [
            "<tag1>中國</tag1>駐洛杉矶領事館遭亞裔男子槍擊 嫌犯已自首"
          ]
        }
      },
      {
        "_index": "iktest",
        "_type": "fulltext",
        "_id": "3",
        "_score": ,
        "_source": {
          "content": "中韓漁警沖突調查:韓警平均每天扣1艘中國漁船"
        },
        "highlight": {
          "content": [
            "中韓漁警沖突調查:韓警平均每天扣1艘<tag1>中國</tag1>漁船"
          ]
        }
      }
    ]
  }
}
Inspecting the tokenization result
curl 'http://host:9200/index/_analyze?analyzer=ik&pretty=true' -d '
{
"text": "别說話,我想靜靜"
}'
Result
{
  "tokens": [
    {
      "token": "别說",
      "start_offset": ,
      "end_offset": ,
      "type": "CN_WORD",
      "position":
    },
    {
      "token": "說話",
      "start_offset": ,
      "end_offset": ,
      "type": "CN_WORD",
      "position":
    },
    {
      "token": "我",
      "start_offset": ,
      "end_offset": ,
      "type": "CN_CHAR",
      "position":
    },
    {
      "token": "想",
      "start_offset": ,
      "end_offset": ,
      "type": "CN_CHAR",
      "position":
    },
    {
      "token": "靜靜",
      "start_offset": ,
      "end_offset": ,
      "type": "CN_WORD",
      "position":
    },
    {
      "token": "靜",
      "start_offset": ,
      "end_offset": ,
      "type": "CN_WORD",
      "position":
    },
    {
      "token": "靜",
      "start_offset": ,
      "end_offset": ,
      "type": "CN_WORD",
      "position":
    }
  ]
}