Elasticsearch: Chinese Word Segmentation with Pinyin

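There are probably plenty of tutorials on this already; I am writing this only as a record of my own learning process, for my own reference.

For Chinese word segmentation, after a quick search I went with the IK analyzer (https://github.com/medcl/elasticsearch-analysis-ik); for pinyin search I used the pinyin analysis plugin (https://github.com/medcl/elasticsearch-analysis-pinyin).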

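The IK analyzer has two segmentation modes: ik_max_word and ik_smart.

1. ik_max_word

Splits the text at the finest granularity. For example, “中華人民共和國人民大會堂” is split into 中華人民共和國, 中華人民, 中華, 華人, 人民共和國, 人民, 共和國, 大會堂, 大會, 會堂 and other terms.

2. ik_smart

Splits the text at the coarsest granularity. For example, “中華人民共和國人民大會堂” is split into 中華人民共和國 and 人民大會堂.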
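One way to compare the two modes on the same text is Elasticsearch's _analyze API. The sketch below only builds the request parameters and prints them as JSON; it does not contact a cluster. The helper name buildAnalyzeParams is made up for illustration, and the commented-out call assumes an elasticsearch-php client in $client with the IK plugin installed on the server.

```php
<?php
// Hypothetical helper (not from the original post): builds the parameter
// array for the _analyze API so the two IK modes can be compared.
function buildAnalyzeParams(string $analyzer, string $text): array
{
    return [
        'body' => [
            'analyzer' => $analyzer, // name of the analyzer to run
            'text'     => $text,     // text to be segmented
        ],
    ];
}

$text = '中華人民共和國人民大會堂';
$requests = [];

foreach (['ik_max_word', 'ik_smart'] as $mode) {
    $requests[$mode] = buildAnalyzeParams($mode, $text);
    // With a real client (assumption): $client->indices()->analyze($requests[$mode]);
    echo $mode, ': ', json_encode($requests[$mode], JSON_UNESCAPED_UNICODE), PHP_EOL;
}
```

The response of each call contains a tokens list, which makes the difference between the fine-grained and coarse-grained modes easy to see side by side.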

Installing the IK analyzer

./bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.4.1/elasticsearch-analysis-ik-7.4.1.zip

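My Elasticsearch is 7.4, so I installed the 7.4 release of the plugin (the plugin version has to match the Elasticsearch version).

Installing the pinyin analyzer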

./bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-pinyin/releases/download/v7.4.1/elasticsearch-analysis-pinyin-7.4.1.zip

Once the analyzers are installed, the analyzer to use has to be configured when the index is created. Below is sample PHP code.

protected function getIndexSettings($indexName)
{
    return [
        'index' => $indexName,
        'body' => [
            'settings' => [
                'number_of_shards' => 1,
                'number_of_replicas' => 1,
                'analysis' => [
                    'analyzer' => [
                        // Custom analyzer: IK segmentation first, then pinyin conversion
                        'ik_pinyin_analyzer' => [
                            'type' => 'custom',
                            'tokenizer' => 'ik_smart',
                            'filter' => ['my_pinyin', 'word_delimiter'],
                        ],
                    ],
                    'filter' => [
                        // Token filter provided by the pinyin plugin
                        'my_pinyin' => [
                            'type' => 'pinyin',
                            'keep_separate_first_letter' => false,
                            'keep_full_pinyin' => true,
                            'keep_original' => false,
                            'limit_first_letter_length' => 10,
                            'lowercase' => true,
                            'remove_duplicated_term' => true,
                        ],
                    ],
                ],
            ],
        ],
    ];
}

public function createTestIndex()
{
    $client = $this->getElasticClient();
    // Get the index settings; the index to create is named "test"
    $settings = $this->getIndexSettings('test');
    // Create the index
    $response = $client->indices()->create($settings);
    return $response;
}

After the index is created, create the mapping.

{
    "index": "test",
    "body": {
        "news": {
            "_source": {
                "enabled": true
            },
            "properties": {
                "title": {
                    "type": "text"
                },
                "content": {
                    "type": "text"
                },
                "author": {
                    "type": "keyword"
                }
            }
        }
    }
}

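Above is the JSON (request body) for creating the mapping; news is the type name. Many tutorials online use this structure, but I tried it repeatedly and it never succeeded. The likely reason is the version: Elasticsearch 7.x deprecates mapping types, so by default the mapping body is no longer wrapped in a type name. In the end I changed it to the structure below:

{
    "index": "test",
    "body": {
        "_source": {
            "enabled": true
        },
        "properties": {
            "title": {
                "type": "text"
            },
            "content": {
                "type": "text"
            },
            "author": {
                "type": "keyword"
            }
        }
    }
}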

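type: text means the field is analyzed and indexed for full-text search (the old "analyzed").

type: keyword means the field is indexed and searchable, but the indexed content is exactly the given value (the old "not_analyzed").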
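The difference between the two field types shows up in how they are usually queried. The sketch below builds a match query for a text field and a term query for a keyword field, and prints both as JSON without sending anything. The helper names buildMatchSearch and buildTermSearch and the author value are made up for illustration; the commented-out call assumes an elasticsearch-php client in $client.

```php
<?php
// Hypothetical helpers (not from the original post) that only build
// search parameter arrays; nothing is sent to Elasticsearch here.

// match: the query string is analyzed first, which suits "text" fields.
function buildMatchSearch(string $index, string $field, string $text): array
{
    return [
        'index' => $index,
        'body'  => ['query' => ['match' => [$field => $text]]],
    ];
}

// term: the value is compared as-is, which suits "keyword" fields.
function buildTermSearch(string $index, string $field, string $value): array
{
    return [
        'index' => $index,
        'body'  => ['query' => ['term' => [$field => $value]]],
    ];
}

$byTitle  = buildMatchSearch('test', 'title', '提前大選');
$byAuthor = buildTermSearch('test', 'author', 'example-author'); // made-up value

echo json_encode($byTitle, JSON_UNESCAPED_UNICODE), PHP_EOL;
echo json_encode($byAuthor, JSON_UNESCAPED_UNICODE), PHP_EOL;
// With a real client (assumption): $client->search($byTitle);
```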

The PHP code for creating this mapping:

public function createMap()
{
    $client = $this->getElasticClient();
    $params = [
        'index' => 'test',
        'body' => [
            '_source' => [
                'enabled' => true,
            ],
            'properties' => [
                'title' => [
                    'type' => 'text',
                ],
                'content' => [
                    'type' => 'text',
                ],
                'author' => [
                    'type' => 'keyword',
                ],
            ],
        ],
    ];
    // Apply the mapping to the existing index
    $response = $client->indices()->putMapping($params);
    return $response;
}

Next, index a document and see whether it can be found with both Chinese and pinyin queries.

public function index()
{
    $client = $this->getElasticClient();
    // Index name comes from the request, defaulting to "test"
    $indexName = $this->request->query('index', 'test');
    $params = [
        'type' => '_doc',
        'index' => $indexName,
        'body' => [
            'title' => '英國通過“12月12日提前大選”的議案',
            'content' => '當地時間29号晚上大約20:20左右、中原標準時間大約30日淩晨4:20，英國議會下院表決通過了首相鮑裡斯·約翰遜提出的12月12日提前大選的簡短議案。由于此前工黨決定改變反對提前大選的立場，不出外界所料，約翰遜提出的“12月12日大選”的“一句話”議案在英國議會在英國議會下院以438票支援、20反對的表決結果，順利獲得通過。',
        ],
    ];
    $response = $client->index($params);
    return $response;
}

Test whether the document can be found:

POST /test/_search
{
    "query": {
        "match": {
            "title": "英國"
        }
    }
}

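Getting a hit does not prove that the text was segmented with the Chinese analyzer I configured: the default analyzer, which indexes each Chinese character separately, would also find this document. I then tried a pinyin query and got no results at all, which means my analyzer was not taking effect. After some thought and a look at the documentation, I found that the analyzer can be specified per field when creating the mapping. So I deleted the index, recreated it, and added the analyzer setting to the mapping:

{
    "index": "test",
    "body": {
        "_source": {
            "enabled": true
        },
        "properties": {
            "title": {
                "type": "text",
                "analyzer": "ik_pinyin_analyzer"
            },
            "content": {
                "type": "text",
                "analyzer": "ik_pinyin_analyzer"
            },
            "author": {
                "type": "keyword"
            }
        }
    }
}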

The PHP code is as follows:

public function createMap()
{
    $client = $this->getElasticClient();
    $params = [
        'index' => 'test',
        'body' => [
            '_source' => [
                'enabled' => true,
            ],
            'properties' => [
                'title' => [
                    'type' => 'text',
                    // Use the custom analyzer defined in the index settings
                    'analyzer' => 'ik_pinyin_analyzer',
                ],
                'content' => [
                    'type' => 'text',
                    'analyzer' => 'ik_pinyin_analyzer',
                ],
                'author' => [
                    'type' => 'keyword',
                ],
            ],
        ],
    ];
    $response = $client->indices()->putMapping($params);
    return $response;
}

Then I indexed an article again and retried the search. Everything worked, and pinyin search now returns results.
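A search hit alone does not show which analyzer a field really uses, as the first attempt demonstrated. A more direct check is to ask the _analyze API to analyze a sample string using the analyzer of a specific field. The sketch below only builds those parameters and prints them; the helper name buildFieldAnalyzeParams is made up for illustration, and the commented-out call assumes an elasticsearch-php client in $client and the test index from this post.

```php
<?php
// Hypothetical helper (not from the original post): builds parameters for an
// index-scoped _analyze request that uses the analyzer mapped to one field.
function buildFieldAnalyzeParams(string $index, string $field, string $text): array
{
    return [
        'index' => $index,
        'body'  => [
            'field' => $field, // Elasticsearch picks the analyzer from this field's mapping
            'text'  => $text,
        ],
    ];
}

$check = buildFieldAnalyzeParams('test', 'title', '英國通過議案');

echo json_encode($check, JSON_UNESCAPED_UNICODE), PHP_EOL;
// With a real client (assumption): $result = $client->indices()->analyze($check);
// The tokens in the response show what was actually put into the index.
```

If the returned tokens are pinyin syllables, the custom analyzer is in effect for that field; if they are single Chinese characters, the field is still using the default analyzer.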
