Elasticsearch: Highlighted Search

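Overview

What is highlighting

Highlighting lets you emphasize the matches from one or more fields in the search results, for example by rendering the matched text in bold or in a color that differs from the surrounding text.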

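Highlighting needs the actual content of the field. If the field is stored, that is, store is set to true in the mapping, the stored value is used. Otherwise Elasticsearch loads the _source field and extracts the relevant field from it.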
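As a minimal sketch of the mapping side, the Python snippet below builds the field fragment that would sit under "properties" in an index mapping. The helper name `stored_text_field` is hypothetical, the `address` field is borrowed from the bank example later in this article, and the field type `text` is an assumption (versions before 5.0 used `string`).

```python
import json


def stored_text_field(store=True):
    """Return a field-mapping fragment for a text field.

    Assumption: the field type is "text"; Elasticsearch versions
    before 5.0 used "string" instead.
    """
    return {"type": "text", "store": store}


# Fragment that would sit under "properties" in the index mapping.
properties = {"address": stored_text_field()}
print(json.dumps(properties, indent=2))
```

With store left at its default of false, the same field can still be highlighted, because the text is then taken from _source.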

Three highlighter types

ES provides three highlighters: the Lucene-based plain highlighter, the fast vector highlighter (fvh), and the postings highlighter.

Plain Highlighter

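The plain highlighter is the default choice and is implemented with the Lucene Highlighter. Its main goal is to reflect the query matching logic as faithfully as possible.

It is not fast when you highlight many fields with complex queries. To reflect the query logic accurately, it builds a tiny in-memory index and re-runs the original query criteria through Lucene's query execution planner to obtain low-level match information for the current document. This is repeated for every field and every document that needs highlighting, so it can become a performance problem. In that case, consider switching to another highlighter type.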
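Switching highlighters can be done per field through the `type` option of the highlight clause. The sketch below only builds the request body in Python. The helper name `highlight_with_type` is hypothetical, and the example assumes the `address` field is mapped with term vectors, which `fvh` requires.

```python
import json


def highlight_with_type(field, highlighter_type):
    """Build a highlight clause that forces one highlighter for a field."""
    return {"fields": {field: {"type": highlighter_type}}}


# Assumption: "address" is mapped with term_vector = with_positions_offsets,
# otherwise a request for the "fvh" type is rejected.
body = {
    "query": {"match": {"address": "mill"}},
    "highlight": highlight_with_type("address", "fvh"),
}
print(json.dumps(body, indent=2))
```

Passing "plain" instead forces the plain highlighter even when the mapping would allow a faster one.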

Fast Vector Highlighter

If a field's mapping sets the term_vector parameter to with_positions_offsets, the fast vector highlighter replaces the plain highlighter as the default highlighter for that field.

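Its main characteristics:

  1. It is faster than the plain highlighter, especially for large fields (over 1 MB)
  2. It requires term_vector to be set to with_positions_offsets, which increases the size of the index
  3. It can combine matches from multiple fields into one result

Postings Highlighter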

If index_options is set to offsets in the mapping, the postings highlighter is used instead of the plain highlighter.

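It needs less disk space than the term vectors that the fast vector highlighter relies on. It splits the text into sentences and highlights those, which tends to work well for natural-language text. It is also faster than the plain highlighter because the text being highlighted does not have to be re-analyzed, and the gain grows with document size.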
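The selection rules from the two sections above can be summarized in a short Python sketch. It mirrors this article's description only and is not Elasticsearch's actual selection code. The function name `default_highlighter` is hypothetical, the field type `text` is an assumption (versions before 5.0 used `string`), and checking term_vector first when both options are set is a choice made for this sketch.

```python
def default_highlighter(field_mapping):
    """Pick the highlighter for a field using the rules described above.

    Not Elasticsearch's real code. Checking term_vector before
    index_options is an assumption of this sketch.
    """
    if field_mapping.get("term_vector") == "with_positions_offsets":
        return "fvh"
    if field_mapping.get("index_options") == "offsets":
        return "postings"
    return "plain"


# Field-mapping fragments; "text" as the field type is an assumption.
plain_field = {"type": "text"}
fvh_field = {"type": "text", "term_vector": "with_positions_offsets"}
postings_field = {"type": "text", "index_options": "offsets"}

for mapping in (plain_field, fvh_field, postings_field):
    print(mapping, "->", default_highlighter(mapping))
```

The three fragments are also the mapping settings a field would need for each highlighter to be chosen by default.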

Example

Find the records whose address contains mill or Court, and highlight the matches.

The query is as follows:

GET /bank/_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "address": "mill" } },
        { "match": { "address": "Court" } }
      ]
    }
  }, 
  "highlight": {
    "fields": {
      "address": {}
    }
  }
}
           

The result is as follows:

{
    "_index" : "bank",
    "_type" : "account",
    "_id" : "472",
    "_score" : 5.4032025,
    "_source" : {
        "account_number" : 472,
        "balance" : 25571,
        "firstname" : "Lee",
        "lastname" : "Long",
        "age" : 32,
        "gender" : "F",
        "address" : "288 Mill Street",
        "employer" : "Comverges",
        "email" : "[email protected]",
        "city" : "Movico",
        "state" : "MT"
    },
    "highlight" : {
        "address" : [
            "288 <em>Mill</em> Street"
        ]
    }
},
{
    "_index" : "bank",
    "_type" : "account",
    "_id" : "18",
    "_score" : 2.1248586,
    "_source" : {
        "account_number" : 18,
        "balance" : 4180,
        "firstname" : "Dale",
        "lastname" : "Adams",
        "age" : 33,
        "gender" : "M",
        "address" : "467 Hutchinson Court",
        "employer" : "Boink",
        "email" : "[email protected]",
        "city" : "Orick",
        "state" : "MD"
    },
    "highlight" : {
        "address" : [
            "467 Hutchinson <em>Court</em>"
        ]
    }
}
           

Notice that the matched terms in the field are automatically wrapped in <em> </em> tags.
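On the client side, the wrapped terms can be pulled back out of a hit. The following is a minimal Python sketch, assuming the hit has already been parsed into a dict. The function name `highlighted_terms` is hypothetical, and the sample hit is a trimmed copy of the first result above.

```python
import re


def highlighted_terms(hit, field, pre_tag="<em>", post_tag="</em>"):
    """Return the text wrapped in highlight tags for one field of a hit."""
    pattern = re.compile(
        re.escape(pre_tag) + r"(.*?)" + re.escape(post_tag), re.DOTALL
    )
    terms = []
    # A hit without a "highlight" section simply yields no terms.
    for fragment in hit.get("highlight", {}).get(field, []):
        terms.extend(pattern.findall(fragment))
    return terms


# Trimmed copy of the first hit shown above.
hit = {
    "_id": "472",
    "_source": {"address": "288 Mill Street"},
    "highlight": {"address": ["288 <em>Mill</em> Street"]},
}
print(highlighted_terms(hit, "address"))  # ['Mill']
```

The tag arguments default to the <em> pair, so the same helper also works when the request overrides the tags, as long as the same strings are passed in.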

Custom highlight tags

The syntax is as follows:

"pre_tags": ["<tag1>"],
"post_tags": ["</tag1>"]
           

The query is as follows:

GET /bank/_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "address": "mill" } },
        { "match": { "address": "Court" } }
      ]
    }
  }, 
  "highlight": {
    "pre_tags": ["<a>"],
    "post_tags": ["</a>"], 
    "fields": {
      "address": {}
    }
  }
}
           

The result is as follows:

{
    "_index" : "bank",
    "_type" : "account",
    "_id" : "472",
    "_score" : 5.4032025,
    "_source" : {
        "account_number" : 472,
        "balance" : 25571,
        "firstname" : "Lee",
        "lastname" : "Long",
        "age" : 32,
        "gender" : "F",
        "address" : "288 Mill Street",
        "employer" : "Comverges",
        "email" : "[email protected]",
        "city" : "Movico",
        "state" : "MT"
    },
    "highlight" : {
        "address" : [
            "288 <a>Mill</a> Street"
        ]
    }
},
{
    "_index" : "bank",
    "_type" : "account",
    "_id" : "18",
    "_score" : 2.1248586,
    "_source" : {
        "account_number" : 18,
        "balance" : 4180,
        "firstname" : "Dale",
        "lastname" : "Adams",
        "age" : 33,
        "gender" : "M",
        "address" : "467 Hutchinson Court",
        "employer" : "Boink",
        "email" : "[email protected]",
        "city" : "Orick",
        "state" : "MD"
    },
    "highlight" : {
        "address" : [
            "467 Hutchinson <a>Court</a>"
        ]
    }
}
           

Notice that the highlight tags have been replaced.
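When the request body is assembled in application code, deriving both tags from one element name keeps the opening and closing tags paired. This is a minimal Python sketch of that idea. The helper name `highlight_clause` is hypothetical, and the query is the same one used in this article.

```python
import json


def highlight_clause(fields, tag="em"):
    """Build a highlight clause whose pre/post tags use the same element."""
    return {
        "pre_tags": [f"<{tag}>"],
        "post_tags": [f"</{tag}>"],
        "fields": {name: {} for name in fields},
    }


body = {
    "query": {
        "bool": {
            "should": [
                {"match": {"address": "mill"}},
                {"match": {"address": "Court"}},
            ]
        }
    },
    "highlight": highlight_clause(["address"], tag="a"),
}
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of GET /bank/_search.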