Search with Kibana: Elastic Stack Hands-On Handbook

Author: 李增勝

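Business background

In the B2B (to-B) sector, product search and display are subject to business rules. For example, only buyers who have a partnership with a supplier can see the products in that supplier's shop; buyers without such a partnership are not shown the products. In addition, some products show one price to customer A and a different price to customer B, so that prices can differ by membership and by customer group.

In one sentence: B2B product sales are, to a degree, closed and customer-specific. All the examples that follow are set against this background, so that you can get familiar with Elasticsearch document, index, and query operations in a realistic business scenario.

This article uses IK as the analyzer. The IK version you download must match your Elasticsearch version.

IK download address:

https://github.com/medcl/elasticsearch-analysis-ik/releases
  1. Create an ik folder under the plugins directory of the Elasticsearch installation directory, then unzip the downloaded IK package into it.
  2. Restart Elasticsearch.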

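Defining the mapping

The product fields are:

  • goodsName: product name
  • skuCode: product SKU code
  • brandName: product brand name
  • channelType: channel type
  • shopCode: shop code
  • publicPrice: selling price (base price, visible to everyone)
  • closeUserCode: codes of the restricted members
  • groupPrice: group prices, stored as a nested type that contains the group price and the group level

Define the product mapping: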

PUT my_goods
{
  "settings": {
    "index": {
      "number_of_shards": 1,
      "number_of_replicas": 1
    }
  },
  "mappings": {
    "properties": {
      "goodsName": {
        "type": "text",
        "analyzer": "ik_smart"
      },
      "skuCode": {
        "type": "keyword"
      },
      "brandName": {
        "type": "keyword"
      },
      "channelType": {
        "type": "keyword"
      },
      "shopCode": {
        "type": "keyword"
      },
      "publicPrice": {
        "type": "float"
      },
      "closeUserCode": {
        "type": "text",
        "analyzer": "standard"
      },
      "boostValue": {
        "type": "keyword"
      },
      "groupPrice": {
        "type": "nested",
        "properties": {
          "boxLevelPrice": {
            "type": "float"
          },
          "level": {
            "type": "text"
          }
        }
      }
    }
  }
}           

Document APIs

The main document operations are covered below.

Index

The following request forms are supported for indexing (adding) documents:

PUT /<target>/_doc/<_id>
POST /<target>/_doc/
PUT /<target>/_create/<_id>
POST /<target>/_create/<_id>           

Taking POST /<target>/_create/<_id> as an example, the following request creates the product document with ID 1:

POST /my_goods/_create/1
{
    "goodsName":"蘋果 51英寸 4K超高清",
    "skuCode":"skuCode1",
    "brandName":"蘋果",
    "closeUserCode":[
        "0"
    ],
    "channelType":"cloudPlatform",
    "shopCode":"sc00001",
    "publicPrice":"8188.88",
    "groupPrice":null,
    "boxPrice":null,
    "boostValue":1.8
}           

Bulk

Elasticsearch supports batch inserts through the _bulk API:

POST my_goods/_bulk
{"index":{"_id":1}}
{"goodsName":"蘋果 51英寸 4K超高清","skuCode":"skuCode1","brandName":"蘋果","closeUserCode":["0"],"channelType":"cloudPlatform","shopCode":"sc00001","publicPrice":"8188.88","groupPrice":null,"boxPrice":null,"boostValue":1.8}
{"index":{"_id":2}}
{"goodsName":"蘋果 55英寸 3K超高清","skuCode":"skuCode2","brandName":"蘋果","closeUserCode":["0"],"channelType":"cloudPlatform","shopCode":"sc00002","publicPrice":"6188.88","groupPrice":null,"boxPrice":null,"boostValue":1.0}
{"index":{"_id":3}}
{"goodsName":"蘋果UA55RU7520JXXZ 53英寸 4K高清","skuCode":"skuCode3","brandName":"美國蘋果","closeUserCode":["0"],"channelType":"cloudPlatform","shopCode":"sc00001","publicPrice":"8388.88","groupPrice":null,"boxPrice":[{"boxType":"box1","boxUserCode":["htd003","uc004"],"boxPriceDetail":4388.88},{"boxType":"box2","boxUserCode":["uc005","uc0010"],"boxPriceDetail":5388.88}],"boostValue":1.2}
{"index":{"_id":4}}
{"goodsName":"山東蘋果UA55RU7520JXXZ 蘋果54英寸 5K超高清","skuCode":"skuCode4","brandName":"山東蘋果","closeUserCode":["uc001","uc002","uc003"],"channelType":"cloudPlatform","shopCode":"sc00001","publicPrice":"8488.88","groupPrice":[{"level":"level1","boxLevelPrice":"2488.88"},{"level":"level2","boxLevelPrice":"3488.88"}],"boxPrice":[{"boxType":"box1","boxUserCode":["uc004","uc005","uc006","uc001"],"boxPriceDetail":4488.88},{"boxType":"box2","boxUserCode":["htd007","htd008","htd009","uc0010"],"boxPriceDetail":5488.88}],"boostValue":1.2}
{"index":{"_id":5}}
{"goodsName":"蘋果UA55R蘋果U7蘋果520JXXZ 55英寸 5K超高清","skuCode":"skuCode5","brandName":"三星蘋果","closeUserCode":["uc001","uc002","uc003"],"channelType":"cloudPlatform","shopCode":"sc00001","publicPrice":"8488.88","groupPrice":[{"level":"level1","boxLevelPrice":"2500"},{"level":"level2","boxLevelPrice":"3500"}],"boxPrice":[{"boxType":"box1","boxUserCode":["uc004","uc005","uc006","uc001"],"boxPriceDetail":3588.88},{"boxType":"box2","boxUserCode":["htd007","htd008","htd009","uc0010"],"boxPriceDetail":5588.88}],"boostValue":1.2}
{"index":{"_id":6}}
{"goodsName":"三星UA55RU7520JXXZ 51英寸 4K超高清","skuCode":"skuCode1","brandName":"三星","closeUserCode":["0"],"channelType":"cmccPlatform","shopCode":"sc00001","publicPrice":"8188.88","groupPrice":null,"boxPrice":null,"boostValue":1.2}
{"index":{"_id":7}}
{"goodsName":"三星UA55RU7520JXXZ 52英寸 4K超高清","skuCode":"skuCode2","brandName":"三星","closeUserCode":["0"],"channelType":"cmccPlatform","shopCode":"sc00001","publicPrice":"8288.88","groupPrice":null,"boxPrice":[{"boxType":"box1","boxUserCode":["htd002"],"boxPriceDetail":4288.88}],"boostValue":1.2}
{"index":{"_id":8}}
{"goodsName":"三星UA55RU7520JXXZ 52英寸 4K超高清","skuCode":"skuCode2","brandName":"三星","closeUserCode":["uc0022"],"channelType":"cloudPlatform","shopCode":"sc00001","publicPrice":"8288.88","groupPrice":null,"boxPrice":[{"boxType":"box1","boxUserCode":["uc0022"],"boxPriceDetail":4288.88}],"boostValue":1.2}
{"index":{"_id":9}}
{"goodsName":"三星UA55RU7520JXXZ 52英寸 4K超高清","skuCode":"skuCode2","brandName":"三星","closeUserCode":["uc0022"],"channelType":"cloudPlatform","shopCode":"sc00001","publicPrice":"8288.88","groupPrice":null,"boxPrice":[{"boxType":"box1","boxUserCode":["uc0022"],"boxPriceDetail":4288.88}],"boostValue":1.2}
{"index":{"_id":10}}
{"goodsName":"三星UA55RU7520JXXZ 52英寸 4K超高清","skuCode":"skuCode2","brandName":"三星","closeUserCode":["uc0022"],"channelType":"cloudPlatform","shopCode":"sc00001","publicPrice":"8288.88","groupPrice":null,"boxPrice":[{"boxType":"box1","boxUserCode":["uc0022"],"boxPriceDetail":4288.88}],"boostValue":1.8}           
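
The _bulk body above is newline-delimited JSON: an action line followed by a source line for each document, with a newline after the last line. As a minimal sketch of how a client might assemble such a body, assuming a hypothetical helper named build_bulk_body and shortened sample documents (sending the request is left out):

```python
import json


def build_bulk_body(docs):
    """Build an NDJSON _bulk body from (doc_id, source) pairs (hypothetical helper)."""
    lines = []
    for doc_id, source in docs:
        # Action line: index the document under the given _id.
        lines.append(json.dumps({"index": {"_id": doc_id}}))
        # Source line: the document itself, kept on a single line.
        lines.append(json.dumps(source, ensure_ascii=False))
    # The bulk body has to end with a newline character.
    return "\n".join(lines) + "\n"


# Shortened versions of the first two sample documents.
docs = [
    (1, {"goodsName": "蘋果 51英寸 4K超高清", "skuCode": "skuCode1", "shopCode": "sc00001"}),
    (2, {"goodsName": "蘋果 55英寸 3K超高清", "skuCode": "skuCode2", "shopCode": "sc00002"}),
]
body = build_bulk_body(docs)
print(body)
```

Generating the body this way avoids hand-editing long single-line JSON documents, which is where bulk requests most often break.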

Delete

The following request form is supported for deleting a document:

DELETE /<index>/_doc/<_id>           

Delete the document with ID 2:

DELETE /my_goods/_doc/2           

Delete by query

Deletion can also be driven by query conditions, using _delete_by_query.

The following request deletes all products whose shop code is sc00002.

POST /my_goods/_delete_by_query
{
  "query": {
    "match": {
      "shopCode": "sc00002"
    }
  }
}           

Update

The following request form is supported for updating a document:

POST /<index>/_update/<_id>           

Update the document with ID 1.

Add a new field:

POST /my_goods/_update/1
{
  "doc": {
    "shopName": "小王店鋪"
  }
}           

Change the shop name to “張三店鋪”:

POST /my_goods/_update/1
{
  "doc": {
    "shopName": "張三店鋪"
  }
}           

The document source after the update:

{
  "goodsName" : "蘋果 51英寸 4K超高清",
  "skuCode" : "skuCode1",
  "brandName" : "蘋果",
  "closeUserCode" : [
    "0"
  ],
  "channelType" : "cloudPlatform",
  "shopCode" : "sc00001",
  "publicPrice" : "8188.88",
  "groupPrice" : null,
  "boxPrice" : null,
  "boostValue" : 1.8,
  "shopName" : "張三店鋪"
}           

You can also update a document with PUT, but then every field has to be listed:

PUT my_goods/_doc/10
{
  "goodsName": "三星UA55RU7520JXXZ 52英寸 4K超高清",
  "skuCode": "skuCode10",
  "brandName": "三星",
  "closeUserCode": [
    "uc0022"
  ],
  "channelType": "cloudPlatform",
  "shopCode": "sc00001",
  "publicPrice": "8288.88",
  "groupPrice": null,
  "boxPrice": [
    {
      "boxType": "box1",
      "boxUserCode": [
        "uc0022"
      ],
      "boxPriceDetail": 4288.88
    }
  ],
  "boostValue": 1.8
}           

An update can also be carried out with a script:

POST my_goods/_update/10
{
  "script": {
    "source": "ctx._source.city=params.channelType",
    "lang": "painless",
    "params": {
      "channelType": "cloudPlatform1"
    }
  }
}           

Update by query

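Updates can also use the _update_by_query API. The example below sets publicPrice to 5888.00 for documents whose shop code is sc00002.

First, insert the product document with ID 2 for that shop: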

POST /my_goods/_create/2
{
  "goodsName": "蘋果 55英寸 3K超高清",
  "skuCode": "skuCode2",
  "brandName": "蘋果",
  "closeUserCode": [
    "0"
  ],
  "channelType": "cloudPlatform",
  "shopCode": "sc00002",
  "publicPrice": "6188.88",
  "groupPrice": null,
  "boxPrice": null,
  "boostValue": 1
}           

Querying it now returns:

{
  "goodsName" : "蘋果 55英寸 3K超高清",
  "skuCode" : "skuCode2",
  "brandName" : "蘋果",
  "closeUserCode" : [
    "0"
  ],
  "channelType" : "cloudPlatform",
  "shopCode" : "sc00002",
  "publicPrice" : "6188.88",
  "groupPrice" : null,
  "boxPrice" : null,
  "boostValue" : 1.0
}           

Set publicPrice to 5888.00 where the shop code is sc00002:

POST /my_goods/_update_by_query
{
  "script": {
    "source": "ctx._source.publicPrice=5888.00",
    "lang": "painless"
  },
  "query": {
    "term": {
      "shopCode": "sc00002"
    }
  }
}           

Query the result again:

GET /my_goods/_source/2           
{
  "shopCode" : "sc00002",
  "brandName" : "蘋果",
  "closeUserCode" : [
    "0"
  ],
  "groupPrice" : null,
  "boxPrice" : null,
  "channelType" : "cloudPlatform",
  "boostValue" : 1.0,
  "publicPrice" : 5888.0,
  "goodsName" : "蘋果 55英寸 3K超高清",
  "skuCode" : "skuCode2"
}           

Reindex

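Use the _reindex API when the business requires an index to be rebuilt.

The source must be an existing index, index alias, or data stream, and the destination must be different from the source. It is advisable to set up the destination's mappings and settings beforehand, because _reindex does not copy them from the source.

You can simply reindex index A into index B, or reindex only the documents that meet a condition.

The following request reindexes the products with skuCode=skuCode2 into the index my_goods_new: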

POST _reindex
{
  "source": {
    "index": "my_goods",
    "query": {
      "match": {
        "skuCode": "skuCode2"
      }
    }
  },
  "dest": {
    "index": "my_goods_new"
  }
}           

Query the data in the my_goods_new index:

GET my_goods_new/_search/           
{
  "took" : 5,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 4,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "my_goods_new",
        "_type" : "_doc",
        "_id" : "7",
        "_score" : 1.0,
        "_source" : {
          "goodsName" : "三星UA55RU7520JXXZ 52英寸 4K超高清",
          "skuCode" : "skuCode2",
          "brandName" : "三星",
          "closeUserCode" : [
            "0"
          ],
          "channelType" : "cmccPlatform",
          "shopCode" : "sc00001",
          "publicPrice" : "8288.88",
          "groupPrice" : null,
          "boxPrice" : [
            {
              "boxType" : "box1",
              "boxUserCode" : [
                "htd002"
              ],
              "boxPriceDetail" : 4288.88
            }
          ],
          "boostValue" : 1.2
        }
      },
      {
        "_index" : "my_goods_new",
        "_type" : "_doc",
        "_id" : "8",
        "_score" : 1.0,
        "_source" : {
          "goodsName" : "三星UA55RU7520JXXZ 52英寸 4K超高清",
          "skuCode" : "skuCode2",
          "brandName" : "三星",
          "closeUserCode" : [
            "uc0022"
          ],
          "channelType" : "cloudPlatform",
          "shopCode" : "sc00001",
          "publicPrice" : "8288.88",
          "groupPrice" : null,
          "boxPrice" : [
            {
              "boxType" : "box1",
              "boxUserCode" : [
                "uc0022"
              ],
              "boxPriceDetail" : 4288.88
            }
          ],
          "boostValue" : 1.2
        }
      },
      {
        "_index" : "my_goods_new",
        "_type" : "_doc",
        "_id" : "9",
        "_score" : 1.0,
        "_source" : {
          "goodsName" : "三星UA55RU7520JXXZ 52英寸 4K超高清",
          "skuCode" : "skuCode2",
          "brandName" : "三星",
          "closeUserCode" : [
            "uc0022"
          ],
          "channelType" : "cloudPlatform",
          "shopCode" : "sc00001",
          "publicPrice" : "8288.88",
          "groupPrice" : null,
          "boxPrice" : [
            {
              "boxType" : "box1",
              "boxUserCode" : [
                "uc0022"
              ],
              "boxPriceDetail" : 4288.88
            }
          ],
          "boostValue" : 1.2
        }
      },
      {
        "_index" : "my_goods_new",
        "_type" : "_doc",
        "_id" : "10",
        "_score" : 1.0,
        "_source" : {
          "goodsName" : "三星UA55RU7520JXXZ 52英寸 4K超高清",
          "skuCode" : "skuCode2",
          "brandName" : "三星",
          "closeUserCode" : [
            "uc0022"
          ],
          "channelType" : "cloudPlatform",
          "shopCode" : "sc00001",
          "publicPrice" : "8288.88",
          "groupPrice" : null,
          "boxPrice" : [
            {
              "boxType" : "box1",
              "boxUserCode" : [
                "uc0022"
              ],
              "boxPriceDetail" : 4288.88
            }
          ],
          "boostValue" : 1.8
        }
      }
    ]
  }
}
           

Get

The following request forms are supported for retrieving a document:

GET <index>/_doc/<_id>
HEAD <index>/_doc/<_id>
GET <index>/_source/<_id>
HEAD <index>/_source/<_id>           

Retrieve the document with ID 1:

GET /my_goods/_doc/1           

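Check whether the document with ID 1 exists.

When you only need to know whether a document exists, HEAD returns less information and is cheaper, which suits such special cases: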

HEAD /my_goods/_doc/1           

Response:

200 - OK           

Returning only the document content

Return only the _source when querying:

GET /my_goods/_source/1           
{
  "goodsName" : "蘋果 51英寸 4K超高清",
  "skuCode" : "skuCode1",
  "brandName" : "蘋果",
  "closeUserCode" : [
    "0"
  ],
  "channelType" : "cloudPlatform",
  "shopCode" : "sc00001",
  "publicPrice" : "8188.88",
  "groupPrice" : null,
  "boxPrice" : null,
  "boostValue" : 1.8
}
           

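Customizing the returned fields

Fetch only some of the _source fields, similar to naming specific columns in a database query instead of using select * to return every field.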

# GET request form
GET my_goods/_source/1/?_source_includes=brandName,goodsName

# Response
{
  "brandName" : "蘋果",
  "goodsName" : "蘋果 51英寸 4K超高清"
}


# POST body request form
POST my_goods/_search
{
  "query": {
    "match_all": {
      
    }
  },
  "fields": ["brandName", "goodsName"],
  "_source": false
}
# Response
"hits" : [
      {
        "_index" : "my_goods",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.0,
        "fields" : {
          "brandName" : [
            "蘋果"
          ],
          "goodsName" : [
            "蘋果 55英寸 3K超高清"
          ]
        }
      },
      {
        "_index" : "my_goods",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 1.0,
        "fields" : {
          "brandName" : [
            "美國蘋果"
          ],
          "goodsName" : [
            "蘋果UA55RU7520JXXZ 53英寸 4K高清"
          ]
        }
      },
   ...
}           

Multi get

Elasticsearch also supports batch retrieval through the _mget API. The following request retrieves the documents with IDs 1 and 2:

GET /my_goods/_mget
{
  "docs": [
    {
      "_id": "1"
    },
    {
      "_id": "2"
    }
  ]
}           
{
  "docs" : [
    {
      "_index" : "my_goods",
      "_type" : "_doc",
      "_id" : "1",
      "_version" : 7,
      "_seq_no" : 8,
      "_primary_term" : 1,
      "found" : true,
      "_source" : {
        "goodsName" : "蘋果 51英寸 4K超高清",
        "skuCode" : "skuCode1",
        "brandName" : "蘋果",
        "closeUserCode" : [
          "0"
        ],
        "channelType" : "cloudPlatform",
        "shopCode" : "sc00001",
        "publicPrice" : "8188.88",
        "groupPrice" : null,
        "boxPrice" : null,
        "boostValue" : 1.8,
        "shopName" : "張三店鋪"
      }
    },
    {
      "_index" : "my_goods",
      "_type" : "_doc",
      "_id" : "2",
      "found" : false
    }
  ]
}           

Query DSL

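Index queries include full-text queries, compound queries, structured queries, and so on.

Differences between Query and Filter

The two kinds of query behave differently:

Query

Answers whether a document matches and also how well it matches: a _score relevance score is returned with each hit.

Filter

Only answers whether a document matches the query; it does not say how well, so no score is computed. Filters are often especially useful in aggregation queries. Skipping scoring means less work for Elasticsearch, which improves overall response time. In addition, the results of filter clauses can be cached, whereas scoring queries are not cached in the same way.

When to use which

Use Query for full-text search and anything that depends on scoring; for other cases Filter is recommended.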

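Compound queries

Boolean query

A Boolean query is built from must, filter, should, and must_not clauses.

must: the clause must match, and it contributes to the score. filter: the clause must match, but scoring is ignored. should: comparable to OR in a database query; as a special case, when a query contains only should clauses, a document must match at least one of them to be returned. must_not: the clause must not match, comparable to "not equal".

Query: shop code = sc00001, channel channelType = cloudPlatform, publicPrice outside the range 8288 to 8888, plus a should condition on the brand containing "果". First, all of the following conditions must be met:

  • shop code = sc00001
  • channel channelType = cloudPlatform
  • publicPrice is not in the range 8288 to 8888

The query also carries a should clause: documents whose brand contains “果” match it, and a successful match raises their score so that they rank higher:

  • brand contains "果"

Because the request below sets minimum_should_match to 1, the should clause has to match in addition to the three conditions above; if that setting were omitted, the should clause would only influence the score.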

POST /my_goods/_search
{
  "query": {
    "bool": {
      "must": {
        "term":{
          "shopCode":"sc00001"
        }
      },
      "filter": {
        "term": {
          "channelType": "cloudPlatform"
        }
      },
      "must_not": [
        {
         "range": {
           "publicPrice": {
             "gte": 8288,
             "lte": 8888
           }
         }
        }
      ],
      "should": [
        {
          "term": {
            "brandName": {
              "value": "果"
            }
          }
        }
      ],
      "minimum_should_match" : 1
    }
  }
}           

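minimum_should_match is the minimum number of should clauses that must match. If the bool query contains at least one should clause and no must or filter clauses, the default is 1; otherwise the default is 0. For example: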

POST /my_goods/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "term": {
            "brandName": {
              "value": "東"
            }
          }
        },
        {
          "term": {
            "brandName": {
              "value": "果"
            }
          }
        }
      ],
      "minimum_should_match" : 1
    }
  }
}           

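The query above requires brandName to match at least one of “東” and “果”. A term query looks up the exact indexed term, so these single-character matches assume brandName is indexed as analyzed text whose terms are single characters; against the keyword mapping defined earlier, a term query would only match a brand that is exactly “東” or “果”. The results are as follows: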

"hits" : [
      {
        "_index" : "my_goods",
        "_type" : "_doc",
        "_id" : "4",
        "_score" : 1.5678144,
        "_source" : {
          "shopCode" : "sc00001",
          "brandName" : "山東蘋果",
          "closeUserCode" : [
            "uc001",
            "uc002",
            "uc003"
          ],
          "skuCode_brandName" : "skuCode4山東蘋果",
          "channelType" : "cloudPlatform",
          "publicPrice" : 16977.76,
          "goodsName_length" : 31,
          "groupPrice" : [
            {
              "level" : "level1",
              "boxLevelPrice" : "2488.88"
            },
            {
              "level" : "level2",
              "boxLevelPrice" : "3488.88"
            }
          ],
          "boxPrice" : [
            {
              "boxType" : "box1",
              "boxUserCode" : [
                "uc004",
                "uc005",
                "uc006",
                "uc001"
              ],
              "boxPriceDetail" : 4488.88
            },
            {
              "boxType" : "box2",
              "boxUserCode" : [
                "htd007",
                "htd008",
                "htd009",
                "uc0010"
              ],
              "boxPriceDetail" : 5488.88
            }
          ],
          "boostValue" : 1.2,
          "goodsName" : "山東蘋果UA55RU7520JXXZ 蘋果54英寸 5K超高清",
          "skuCode" : "skuCode4"
        }
      },
      {
        "_index" : "my_goods",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 0.2792403,
        "_source" : {
          "shopCode" : "sc00002",
          "brandName" : "蘋果",
          "closeUserCode" : [
            "0"
          ],
          "skuCode_brandName" : "skuCode2蘋果",
          "channelType" : "cloudPlatform",
          "publicPrice" : 12377.76,
          "goodsName_length" : 13,
          "groupPrice" : null,
          "boxPrice" : null,
          "boostValue" : 1.0,
          "goodsName" : "蘋果 55英寸 3K超高清",
          "skuCode" : "skuCode2"
        }
      },
      {
        "_index" : "my_goods",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.2792403,
        "_source" : {
          "shopCode" : "sc00001",
          "brandName" : "蘋果",
          "closeUserCode" : [
            "0"
          ],
          "skuCode_brandName" : "skuCode1蘋果",
          "channelType" : "cloudPlatform",
          "publicPrice" : 32755.52,
          "goodsName_length" : 13,
          "groupPrice" : null,
          "boxPrice" : null,
          "boostValue" : 1.8,
          "goodsName" : "蘋果 51英寸 4K超高清",
          "skuCode" : "skuCode1"
        }
      },
      {
        "_index" : "my_goods",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 0.21222264,
        "_source" : {
          "shopCode" : "sc00001",
          "brandName" : "美國蘋果",
          "closeUserCode" : [
            "0"
          ],
          "skuCode_brandName" : "skuCode3美國蘋果",
          "channelType" : "cloudPlatform",
          "publicPrice" : 16777.76,
          "goodsName_length" : 26,
          "groupPrice" : null,
          "boxPrice" : [
            {
              "boxType" : "box1",
              "boxUserCode" : [
                "htd003",
                "uc004"
              ],
              "boxPriceDetail" : 4388.88
            },
            {
              "boxType" : "box2",
              "boxUserCode" : [
                "uc005",
                "uc0010"
              ],
              "boxPriceDetail" : 5388.88
            }
          ],
          "boostValue" : 1.2,
          "goodsName" : "蘋果UA55RU7520JXXZ 53英寸 4K高清",
          "skuCode" : "skuCode3"
        }
      },
  ...
]           

Now change minimum_should_match to 2 and look at the result:

POST /my_goods/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "term": {
            "brandName": {
              "value": "東"
            }
          }
        },
        {
          "term": {
            "brandName": {
              "value": "果"
            }
          }
        }
      ],
      "minimum_should_match" : 2
    }
  }
}

# Response:
"hits" : [
      {
        "_index" : "my_goods",
        "_type" : "_doc",
        "_id" : "4",
        "_score" : 1.5678144,
        "_source" : {
          "shopCode" : "sc00001",
          "brandName" : "山東蘋果",
          "closeUserCode" : [
            "uc001",
            "uc002",
            "uc003"
          ],
          "skuCode_brandName" : "skuCode4山東蘋果",
          "channelType" : "cloudPlatform",
          "publicPrice" : 16977.76,
          "goodsName_length" : 31,
          "groupPrice" : [
            {
              "level" : "level1",
              "boxLevelPrice" : "2488.88"
            },
            {
              "level" : "level2",
              "boxLevelPrice" : "3488.88"
            }
          ],
          "boxPrice" : [
            {
              "boxType" : "box1",
              "boxUserCode" : [
                "uc004",
                "uc005",
                "uc006",
                "uc001"
              ],
              "boxPriceDetail" : 4488.88
            },
            {
              "boxType" : "box2",
              "boxUserCode" : [
                "htd007",
                "htd008",
                "htd009",
                "uc0010"
              ],
              "boxPriceDetail" : 5488.88
            }
          ],
          "boostValue" : 1.2,
          "goodsName" : "山東蘋果UA55RU7520JXXZ 蘋果54英寸 5K超高清",
          "skuCode" : "skuCode4"
        }
      }
    ]           

As you can see, only documents whose brandName matches both “東” and “果”, that is, at least two of the should clauses, are returned.
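
The clause semantics and the minimum_should_match default described in this section can be modelled in a few lines of Python. This is only a sketch under stated assumptions: bool_match is a hypothetical helper, each clause is a plain Python predicate instead of a real query, and the substring test stands in for a term match on analyzed brand text. Scoring is ignored.

```python
def bool_match(doc, must=(), filter_=(), must_not=(), should=(),
               minimum_should_match=None):
    """Decide whether doc satisfies a simplified bool query made of predicates."""
    if minimum_should_match is None:
        # Default from the text: 1 if there are should clauses and no
        # must or filter clauses, otherwise 0.
        minimum_should_match = 1 if should and not (must or filter_) else 0
    if not all(clause(doc) for clause in must):
        return False
    if not all(clause(doc) for clause in filter_):
        return False
    if any(clause(doc) for clause in must_not):
        return False
    matched_should = sum(1 for clause in should if clause(doc))
    return matched_should >= minimum_should_match


# Brand names taken from the sample data (other fields left out).
docs = [
    {"_id": 1, "brandName": "蘋果"},
    {"_id": 3, "brandName": "美國蘋果"},
    {"_id": 4, "brandName": "山東蘋果"},
    {"_id": 6, "brandName": "三星"},
]
should = [lambda d: "東" in d["brandName"], lambda d: "果" in d["brandName"]]

ids_default = [d["_id"] for d in docs if bool_match(d, should=should)]
ids_two = [d["_id"] for d in docs
           if bool_match(d, should=should, minimum_should_match=2)]
# With a must clause present the default drops to 0, so should no longer filters.
ids_with_must = [d["_id"] for d in docs
                 if bool_match(d, must=[lambda d: True], should=should)]
print(ids_default, ids_two, ids_with_must)
```

The third call shows why the earlier combined example sets minimum_should_match explicitly: once must or filter clauses are present, should clauses stop being required unless the setting says otherwise.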

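Boosting query

The boosting query adjusts relevance scores: documents that match the positive query are returned, and those that also match the negative query have their score scaled by negative_boost.

Without the demotion, the two matching documents have the same score: "_score" : 1.3862942.

With negative_boost set to 0.2, the response is as follows (unrelated fields omitted):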

POST /my_goods/_search
{
  "query": {
    "boosting": {
      "positive": {
        "term": {
          "skuCode": {
            "value": "skuCode1"
          }
        }
      },
      "negative": {
        "term": {
          "goodsName": {
            "value": "三星"
          }
        }
      }, 
      "negative_boost": 0.2
    }
  }
}

# Response
"hits" : [
      {
        "_index" : "my_goods",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.3862942,
        "_source" : {
          "goodsName" : "蘋果 51英寸 4K超高清",
          "skuCode" : "skuCode1",
          "brandName" : "蘋果",
          "closeUserCode" : [
            "0"
          ],
          "channelType" : "cloudPlatform",
          "shopCode" : "sc00001",
          "publicPrice" : "8188.88",
          "groupPrice" : null,
          "boxPrice" : null,
          "boostValue" : 1.8,
          "shopName" : "張三店鋪"
        }
      },
      {
        "_index" : "my_goods",
        "_type" : "_doc",
        "_id" : "6",
        "_score" : 0.27725884,
        "_source" : {
          "goodsName" : "三星UA55RU7520JXXZ 51英寸 4K超高清",
          "skuCode" : "skuCode1",
          "brandName" : "三星",
          "closeUserCode" : [
            "0"
          ],
          "channelType" : "cmccPlatform",
          "shopCode" : "sc00001",
          "publicPrice" : "8188.88",
          "groupPrice" : null,
          "boxPrice" : null,
          "boostValue" : 1.2
        }
      }
    ]           

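The score of document ID=6 has dropped to _score: 0.27725884 because it matches the negative query, so its original score is multiplied by negative_boost (1.3862942 × 0.2). A negative_boost between 0 and 1 lowers the score; the official documentation defines the parameter as a float between 0 and 1.0, so it is meant for demoting documents rather than promoting them.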
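
The arithmetic behind the demotion can be checked with a small sketch. The function name boosting_score is hypothetical, and the model covers only the rule stated above (multiply the positive score by negative_boost when the negative query matches):

```python
def boosting_score(positive_score, matches_negative, negative_boost):
    """Score of a hit in a boosting query under the rule described above."""
    if matches_negative:
        # The hit also matches the negative query, so it is demoted.
        return positive_score * negative_boost
    return positive_score


# Scores taken from the response above: both hits start at 1.3862942.
score_doc_1 = boosting_score(1.3862942, matches_negative=False, negative_boost=0.2)
score_doc_6 = boosting_score(1.3862942, matches_negative=True, negative_boost=0.2)
print(score_doc_1, round(score_doc_6, 8))
```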

Constant score query

When a query does not care about TF (term frequency), you can use a constant score query.

POST /my_goods/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "term": {
          "goodsName": "蘋果"
        }
      },
      "boost": 1.2
    }
  }
}           

Response (unrelated fields omitted):

{
        "_index" : "my_goods",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 1.2,
        "_source" : {
          "goodsName" : "蘋果UA55RU7520JXXZ 53英寸 4K高清"
        }
      },
      {
        "_index" : "my_goods",
        "_type" : "_doc",
        "_id" : "4",
        "_score" : 1.2,
        "_source" : {
          "goodsName" : "山東蘋果UA55RU7520JXXZ 蘋果54英寸 5K超高清"
        }
      }
}           

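As you can see, documents ID=3 and ID=4 get the same score even though ID=4 is the closer match (“蘋果” appears twice in its name); this is because the effect of term frequency on the score is ignored.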

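Disjunction max query

The disjunction max (dis_max) query returns every document that matches any of its sub-queries, but uses only the score of the best-matching sub-query as the document's score.

For example, search for “蘋果” in both the product name and the brand name: when the brand clause scores higher than the product-name clause, the brand score is returned as the overall score (ignoring the tie_breaker adjustment).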

GET /my_goods/_search
{
  "query": {
    "dis_max": {
      "tie_breaker": 0.7,
      "boost": 1.2,
      "queries": [
        {
          "term": {
            "goodsName": {
              "value": "蘋果"
            }
          }
        },
        {
          "term": {
            "brandName": {
              "value": "蘋果"
            }
          }
        }
        ]
    }
  }
}           

Response (unrelated fields omitted):

"max_score" : 3.0150018,
    "hits" : [
      {
        "_index" : "my_goods",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 3.0150018,
        "_source" : {
          "goodsName" : "蘋果 51英寸 4K超高清",
          "brandName" : "蘋果"
        }
      },
      {
        "_index" : "my_goods",
        "_type" : "_doc",
        "_id" : "5",
        "_score" : 1.3465583,
        "_source" : {
          "goodsName" : "蘋果UA55R蘋果U7蘋果520JXXZ 55英寸 5K超高清",
          "brandName" : "三星蘋果"
        }
      },
      {
        "_index" : "my_goods",
        "_type" : "_doc",
        "_id" : "4",
        "_score" : 1.2337791,
        "_source" : {
          "goodsName" : "山東蘋果UA55RU7520JXXZ 蘋果54英寸 5K超高清",
          "brandName" : "山東蘋果"
        }
      },           

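Analysis:

  1. For the record with ID=1, the brand consists only of the two characters “蘋果”. Elasticsearch treats this as a closer match, so this record ranks first.
  2. The brands of the ID=5 and ID=4 records both contain “蘋果” and have the same length, so the ranking comes down to how often “蘋果” occurs in goodsName. It occurs 3 times in the goodsName of ID=5 and 2 times in that of ID=4, so ID=4 scores lower than ID=5, as expected.
  3. What is tie_breaker for? It acts as a buffer (a value between 0 and 1). By default a disjunction max query returns the score of the best-matching field as the document's score, so the other fields have no influence, which is somewhat extreme. tie_breaker softens this by letting the other fields contribute: the final score is the best clause's score plus tie_breaker times the scores of the other matching clauses, for example brandName score + goodsName score * tie_breaker when brandName is the best match.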
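
The tie_breaker rule in point 3 can be written out as a short sketch. The helper dis_max_score and the per-clause scores below are hypothetical illustration values, not numbers from the response above; None marks a clause that did not match.

```python
def dis_max_score(clause_scores, tie_breaker=0.0, boost=1.0):
    """Combine per-clause scores the way the dis_max rule above describes."""
    matched = [s for s in clause_scores if s is not None]
    if not matched:
        return None  # no clause matched, so the document is not a hit
    best = max(matched)
    others = sum(matched) - best
    # Best clause score plus tie_breaker times the remaining matching clauses.
    return boost * (best + tie_breaker * others)


# Hypothetical scores: brandName clause 2.0, goodsName clause 1.0.
pure_max = dis_max_score([1.0, 2.0])
softened = dis_max_score([1.0, 2.0], tie_breaker=0.7)
boosted = dis_max_score([1.0, 2.0], tie_breaker=0.7, boost=1.2)
print(pure_max, softened, boosted)
```

With tie_breaker at 0 only the best field counts; at 1 the result becomes the plain sum of all matching clauses.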

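Function score query

Function score lets you control how a query is scored; it is the ultimate tool for controlling the scoring process. The most efficient usage is to apply different functions to subsets of the results selected by filters, which benefits from filter caching while still controlling the score.

Suppose we want Shandong apples to appear before American apples: search for product names containing “蘋果”, giving a weight of 2 when the brand contains “美國” and a weight of 40 when it contains “山東”.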

GET /my_goods/_search
{
  "query": {
    "function_score": {
      "query": {
        "term": {
          "goodsName": {
            "value": "蘋果"
          }
        }
      },
      "boost": 2, 
      "functions": [
        {
          "filter": {
            "match":{
              "brandName":"美國"
            }
          },
          "random_score": {

          },
          "weight": 2
        },
        {
          "filter": {
            "match":{
              "brandName":"山東"
            }
          },
          "weight": 40
        }
      ],
      "max_boost": 60,
      "score_mode": "max",
      "boost_mode": "multiply",
      "min_score": 2
    }
  }
}           

Main part of the response:

"max_score" : 2.2442641,
    "hits" : [
     {
        "_index" : "my_goods",
        "_type" : "_doc",
        "_id" : "4",
        "_score" : 2.0562985,
        "_source" : {
          "goodsName" : "山東蘋果UA55RU7520JXXZ 蘋果54英寸 5K超高清",
          "brandName" : "山東蘋果"
        }
      },
      {
        "_index" : "my_goods",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 1.7582327,
        "_source" : {
          "goodsName" : "蘋果UA55RU7520JXXZ 53英寸 4K高清",
          "brandName" : "美國蘋果"
        }
      }
    ]           

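Some of the parameters explained:

  • score_mode: how the scores of the functions are combined
    • multiply: default; the function scores are multiplied
    • avg: the function scores are averaged
    • first: the score of the first function whose filter matches is used
    • max: the highest function score is used
    • min: the lowest function score is used

Example for avg: if two functions return scores of 1 and 2 and their weights are 3 and 4, the combined score is (1*3+2*4)/(3+4).

For more details, see the official score-functions documentation: https://www.elastic.co/guide/en/elasticsearch/reference/7.10/query-dsl-function-score-query.html#score-functions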
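
The avg example above is a weighted average. A quick way to check the arithmetic, using a hypothetical helper named weighted_avg:

```python
def weighted_avg(scores, weights):
    """Weighted average of function scores, as in the avg example above."""
    total = sum(score * weight for score, weight in zip(scores, weights))
    return total / sum(weights)


# Scores 1 and 2 with weights 3 and 4: (1*3 + 2*4) / (3 + 4) = 11/7.
combined = weighted_avg([1, 2], [3, 4])
print(combined)
```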

Full-text queries

Match query

Match is the standard full-text query. For example:

# Use highlight to highlight the matches in the results
GET /my_goods/_search
{
  "query": {
    "match": {
      "goodsName": "蘋果 高清 英寸"
    }
  },
  "highlight": {
    "fields": {
      "goodsName": {
        "pre_tags": [
          "<strong>"
        ],
        "post_tags": [
          "</strong>"
        ]
      }
    }
  }
}
           

A match query is a boolean-type query; the operator parameter controls the boolean clauses and can be and or or (the default is or).

GET /my_goods/_search
{
  "query": {
    "match": {
      "goodsName": {
        "query": "蘋果 高清 英寸",
        "operator": "and"
      }
    }
  }
}
# Response:
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  }
}           

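There are 0 hits because no product name contains all of the terms of “蘋果 高清 英寸”. With and, the query text is analyzed into terms, and a document is returned only if it contains every one of them. Here is the same query with or: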

GET /my_goods/_search
{
  "query": {
    "match": {
      "goodsName": {
        "query": "蘋果 高清 英寸",
        "operator": "or"
      }
    }
  }
}

# Response

{
        "_index" : "my_goods",
        "_type" : "_doc",
        "_id" : "4",
        "_score" : 1.836855,
        "_source" : {
          "shopCode" : "sc00001",
          "brandName" : "山東蘋果",
          "closeUserCode" : [
            "uc001",
            "uc002",
            "uc003"
          ],
          "skuCode_brandName" : "skuCode4山東蘋果",
          "channelType" : "cloudPlatform",
          "publicPrice" : 16977.76,
          "goodsName_length" : 31,
          "groupPrice" : [
            {
              "level" : "level1",
              "boxLevelPrice" : "2488.88"
            },
            {
              "level" : "level2",
              "boxLevelPrice" : "3488.88"
            }
          ],
          "boxPrice" : [
            {
              "boxType" : "box1",
              "boxUserCode" : [
                "uc004",
                "uc005",
                "uc006",
                "uc001"
              ],
              "boxPriceDetail" : 4488.88
            },
            {
              "boxType" : "box2",
              "boxUserCode" : [
                "htd007",
                "htd008",
                "htd009",
                "uc0010"
              ],
              "boxPriceDetail" : 5488.88
            }
          ],
          "boostValue" : 1.2,
          "goodsName" : "山東蘋果UA55RU7520JXXZ 蘋果54英寸 5K超高清",
          "skuCode" : "skuCode4"
        }
      },
      {
        "_index" : "my_goods",
        "_type" : "_doc",
        "_id" : "10",
        "_score" : 0.9227071,
        "_source" : {
          "goodsName" : "三星UA55RU7520JXXZ 52英寸 4K超高清",
          "skuCode" : "skuCode10",
          "brandName" : "三星",
          "closeUserCode" : [
            "uc0022"
          ],
          "channelType" : "cloudPlatform",
          "shopCode" : "sc00001",
          "publicPrice" : "8288.88",
          "groupPrice" : null,
          "boxPrice" : [
            {
              "boxType" : "box1",
              "boxUserCode" : [
                "uc0022"
              ],
              "boxPriceDetail" : 4288.88
            }
          ],
          "boostValue" : 1.8,
          "city" : "cloudPlatform1"
        }
      }           

As you can see, "三星UA55RU7520JXXZ 52英寸 4K超高清" is returned because it contains "高清".

Match phrase query

Matches documents in the index that contain the query text as a phrase.

GET /my_goods/_search
{
  "query": {
    "match_phrase": {
      "goodsName": "apple"
    }
  }
}           

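Comparing match_phrase and match

match_phrase

Treats the query text as a whole phrase: in the "goods t" example below, goods must come first and be immediately followed by t.

match

Analyzes the query text into terms first, then searches for those terms.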

# Returns nothing, because no document contains the phrase 'goods t'
GET /my_goods/_search
{
  "query": {
    "match_phrase": {
      "goodsName": "goods t"
    }
  }
}
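# Returns data: match splits the query into goods and t and, with the default OR operator, matches documents containing either term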
GET /my_goods/_search
{
  "query": {
    "match": {
      "goodsName": "goods t"
    }
  }
}           

In match_phrase, the slop parameter controls how many positions apart the query terms may be; it defaults to 0. For example:

GET /my_goods/_search
{
  "query": {
    "match_phrase": {
      "goodsName": {
        "query": "apple test",
        "slop": 1
      }
    }
  }
}

# Response
{
  "took" : 10,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 3.08089,
    "hits" : [
      {
        "_index" : "my_goods",
        "_type" : "_doc",
        "_id" : "21",
        "_score" : 3.08089,
        "_source" : {
          "goodsName" : "apple goods test",
          "skuCode" : "skuCode3",
          "brandName" : "美國蘋果",
          "closeUserCode" : [
            "0"
          ],
          "channelType" : "cloudPlatform",
          "shopCode" : "sc00001",
          "publicPrice" : "8388.88",
          "groupPrice" : null,
          "boxPrice" : [
            {
              "boxType" : "box1",
              "boxUserCode" : [
                "htd003",
                "uc004"
              ],
              "boxPriceDetail" : 4388.88
            },
            {
              "boxType" : "box2",
              "boxUserCode" : [
                "uc005",
                "uc0010"
              ],
              "boxPriceDetail" : 5388.88
            }
          ],
          "boostValue" : 1.2
        }
      }
    ]
  }
}

           

With slop set to 1, the document matches because exactly one term (goods) sits between apple and test.
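To make the effect of slop concrete, here is a minimal Python sketch of sloppy phrase matching. It is a simplification and not Lucene's actual implementation: it assumes plain whitespace tokenization, and the helper name `phrase_matches` is hypothetical. For each query term it computes the document position minus the query position, and the phrase matches when the spread of those offsets is at most `slop`.

```python
from itertools import product


def phrase_matches(doc_text: str, query_text: str, slop: int = 0) -> bool:
    """Simplified sloppy-phrase check on whitespace tokens (illustrative only)."""
    doc_tokens = doc_text.lower().split()
    query_tokens = query_text.lower().split()

    # All positions at which each query term occurs in the document.
    positions = [
        [i for i, tok in enumerate(doc_tokens) if tok == term]
        for term in query_tokens
    ]
    if any(not p for p in positions):
        return False  # a query term is missing entirely

    # Try every combination of occurrences; offset = doc position - query position.
    for combo in product(*positions):
        offsets = [doc_pos - q_pos for q_pos, doc_pos in enumerate(combo)]
        if max(offsets) - min(offsets) <= slop:
            return True
    return False


print(phrase_matches("apple goods test", "apple test", slop=0))  # False
print(phrase_matches("apple goods test", "apple test", slop=1))  # True
```

With slop 0 the two terms would have to be adjacent; with slop 1 the one-position gap left by goods is tolerated.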

Match phrase prefix query

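Returns documents that contain the terms of the query in the given order, with the last term treated as a prefix. For example, the query "apple goods t" returns products whose name contains apple goods test.

Add test data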

POST my_goods/_bulk
{"index":{"_id":13}}
{"goodsName":"apple and goods product ","skuCode":"skuCode3","brandName":"美國蘋果","closeUserCode":["0"],"channelType":"cloudPlatform","shopCode":"sc00001","publicPrice":"8388.88","groupPrice":null,"boxPrice":[{"boxType":"box1","boxUserCode":["htd003","uc004"],"boxPriceDetail":4388.88},{"boxType":"box2","boxUserCode":["uc005","uc0010"],"boxPriceDetail":5388.88}],"boostValue":1.2}
{"index":{"_id":21}}
{"goodsName":"apple goods test","skuCode":"skuCode3","brandName":"美國蘋果","closeUserCode":["0"],"channelType":"cloudPlatform","shopCode":"sc00001","publicPrice":"8388.88","groupPrice":null,"boxPrice":[{"boxType":"box1","boxUserCode":["htd003","uc004"],"boxPriceDetail":4388.88},{"boxType":"box2","boxUserCode":["uc005","uc0010"],"boxPriceDetail":5388.88}],"boostValue":1.2}           
# Returns only the document whose goodsName is apple goods test
GET /my_goods/_search
{
  "query": {
    "match_phrase_prefix": {
      "goodsName": "apple goods t"
    }
  }
}           

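Summary and comparison of these four match queries

| Query | Description |
| --- | --- |
| Match | Returns documents that match the query text; the text is analyzed before matching. |
| Match boolean prefix | Builds a bool query from the analyzed text: every term except the last is used in a term query, and the last term is used in a prefix query. |
| Match phrase | Treats the query text as a phrase: after analysis, all terms must appear in the same order; slop controls how far apart they may be. |
| Match phrase prefix | Returns documents that contain the query terms in the given order. Like match phrase, except that the last token is matched as a prefix; slop controls the allowed position gap between tokens. |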

Multi-match

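Multi-field matching: searches for the query text in several fields at once. The type parameter controls how the fields are matched and scored.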

# Find documents whose goodsName or brandName contains 蘋果
POST /my_goods/_search
{
  "query": {
    "multi_match": {
      "query": "蘋果",
      "type": "best_fields", 
      "fields": ["goodsName","brandName"],
      "tie_breaker": 0.3
    }
  }
}           
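The tie_breaker in the request above changes how per-field scores are combined for best_fields. The arithmetic can be sketched in Python as follows. The per-field scores are made-up numbers for illustration, and `best_fields_score` is a hypothetical helper, not an Elasticsearch API: the final score is the best field's score plus tie_breaker times the scores of the other matching fields.

```python
def best_fields_score(field_scores: dict, tie_breaker: float = 0.0) -> float:
    """Combine per-field scores the way a dis_max query with tie_breaker does."""
    scores = list(field_scores.values())
    if not scores:
        return 0.0
    best = max(scores)
    # Every other matching field contributes only a fraction of its score.
    return best + tie_breaker * (sum(scores) - best)


# Hypothetical scores for one document that matched in both fields.
scores = {"goodsName": 1.5, "brandName": 0.5}
print(best_fields_score(scores))                   # 1.5
print(best_fields_score(scores, tie_breaker=0.3))  # 1.65
```

With tie_breaker at 0 only the best field counts; at 1.0 all matching fields count equally.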

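The type parameter in detail:

  • best_fields: the default. Finds documents that match any field and uses the score of the best-matching field as the score of the whole query;
  • most_fields: finds documents that match any field and combines the scores of all matching fields;
  • cross_fields: cross-field matching. Treats fields with the same analyzer as one large field and looks for each query term in any of them;
  • phrase: runs a match_phrase query on each field and uses the score of the best-matching field as the overall score;
  • phrase_prefix: runs a match_phrase_prefix query on each field and uses the score of the best-matching field as the overall score;
  • bool_prefix: runs a match_bool_prefix query on each field and combines the scores of the fields; see the bool_prefix documentation for details.

The following example walks through cross_fields.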
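# Create the index and insert test data
# (mapping field names aligned with the documents below: first_name, last_name)
PUT my_shop
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1
  },
  "mappings": {
    "properties": {
      "first_name":{
        "type":"text"
      },
      "last_name":{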
        "type":"text"
      }
    }
  }
}
POST my_shop/_bulk
{"index":{"_id":1}}
{"first_name":"Will","last_name":"Smith","age":25}
{"index":{"_id":2}}
{"first_name":"Smith","last_name":"hello","age":21}
{"index":{"_id":3}}
{"first_name":"Will","last_name":"hello","age":20}
           
# Find the person named Will Smith
GET /my_shop/_search
{
  "query": {
    "multi_match" : {
      "query":      "Will Smith",
      "type":       "cross_fields",
      "fields":     [ "first_name^2", "last_name" ],
      "operator":   "and"
    }
  }
}
# Response
    "max_score" : 1.9208363,
    "hits" : [
      {
        "_index" : "my_shop",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.9208363,
        "_source" : {
          "first_name" : "Will",
          "last_name" : "Smith",
          "age" : 25
        }
      }
    ]           

Note also that first_name^2 raises the weight (boost) of first_name to 2; the default boost is 1.

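Term-level queries

Term-level queries search structured data such as date ranges, IP addresses, and prices. The following sections show how each one is used in the business scenario.

Exists query

Returns documents that contain an indexed value for a field.

# Return documents that contain the goodsName field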
GET /my_goods/_search
{
  "query": {
    "exists": {
      "field": "goodsName"
    }
  }
}           

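Fuzzy query

Returns documents that contain terms similar to the search term; this is useful for spelling correction.

Edit distance is the minimum number of single-character edits needed to turn one string into another. It is also called the Levenshtein distance.

Reference: https://en.wikipedia.org/wiki/Levenshtein_distance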
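As an illustration of edit distance, here is a small, self-contained Python implementation of the classic Levenshtein algorithm. It is a sketch for intuition only. By default, Elasticsearch fuzzy matching also counts a transposition of two adjacent characters as a single edit, which this plain version does not.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    previous = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


print(levenshtein("surprize", "surprise"))   # 1
print(levenshtein("surprize", "surprised"))  # 2
print(levenshtein("kitten", "sitting"))      # 3
```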

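Some queries and APIs accept a fuzziness parameter for inexact (fuzzy) matching:

  • 0, 1, 2: the maximum allowed edit distance
  • AUTO: derives the edit distance from the length of the term. It optionally takes two arguments, AUTO:[low],[high], which set the length thresholds; when they are omitted the defaults are 3 and 6, which means:
    • 0..2: terms of 0 to 2 characters must match exactly
    • 3..5: terms of 3 to 5 characters allow a maximum edit distance of 1
    • >5: terms longer than 5 characters allow a maximum edit distance of 2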
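The AUTO rule can be written down as a small Python function. This is a sketch of the rule as described above, with the hypothetical name `auto_fuzziness`; it is not code taken from Elasticsearch.

```python
def auto_fuzziness(term: str, low: int = 3, high: int = 6) -> int:
    """Maximum edit distance that AUTO:[low],[high] allows for a term."""
    length = len(term)
    if length < low:
        return 0  # very short terms must match exactly
    if length < high:
        return 1
    return 2


for term in ["at", "box", "apple", "surprize"]:
    print(term, auto_fuzziness(term))
# at 0
# box 1
# apple 1
# surprize 2
```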
# Example taken from the official documentation
POST /my_index/_bulk
{ "index": { "_id": 1 }}
{ "text": "Surprise me!"}
{ "index": { "_id": 2 }}
{ "text": "That was surprising."}
{ "index": { "_id": 3 }}
{ "text": "I wasn't surprised."}

GET /my_index/_search
{
  "query": {
    "fuzzy": {
      "text": {
        "value": "surprize",
        "prefix_length": 1
      }
    }
  }
}
# Response
"hits" : [
      {
        "_index" : "my_index",
        "_type" : "my_type",
        "_id" : "1",
        "_score" : 0.9559981,
        "_source" : {
          "text" : "Surprise me!"
        }
      },
      {
        "_index" : "my_index",
        "_type" : "my_type",
        "_id" : "3",
        "_score" : 0.69983494,
        "_source" : {
          "text" : "I wasn't surprised."
        }
      }           

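If prefix_length is not set, it defaults to 0.

  1. surprising is not returned. surprize is longer than 5 characters, so the default AUTO fuzziness allows at most two edits, and turning surprize into surprising takes four (z→s, e→i, insert n, insert g), which is out of range.
  2. surprise is returned. surprize differs from it by a single substitution (s→z), well within the two-edit limit.
  3. To improve performance, set max_expansions, which limits the number of term variations the fuzzy query expands to.
  4. Do not set fuzziness too high: it slows the query down, and allowing too many errors returns results the user does not expect. prefix_length works the other way: a larger value leaves fewer candidate terms to examine.

When fuzziness is not specified, AUTO is used, which here allows an edit distance of two. The following query sets fuzziness to 1 instead and returns only one result: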

GET /my_index/_search
{
  "query": {
    "fuzzy": {
      "text": {
        "value": "surprize",
        "fuzziness": "1",
        "prefix_length": 1
      }
    }
  }
}

# Response:
{
  "took" : 19,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 0.9559981,
    "hits" : [
      {
        "_index" : "my_index",
        "_type" : "my_type",
        "_id" : "1",
        "_score" : 0.9559981,
        "_source" : {
          "text" : "Surprise me!"
        }
      }
    ]
  }
}

           

Ids query

Returns the documents whose IDs are in the given list.

GET /my_goods/_search
{
  "query": {
    "ids" : {
      "values" : ["1", "4", "5"]
    }
  }
}           

Prefix query

Returns documents that contain a specific prefix in the given field.

PUT my_shop_test
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1
  },
  "mappings": {
    "properties": {
      "shopName":{
        "type":"text"
      },
      "shopCode":{
        "type":"text"
      }
    }
  }
}

# Add test data
POST my_shop_test/_bulk
{"index":{"_id":1}}
{"shopName":"box","shopCode":"Smith"}
{"index":{"_id":2}}
{"shopName":"black","shopCode":"jack"}
{"index":{"_id":3}}
{"shopName":"fox","shopCode":"act"}
{"index":{"_id":4}}
{"shopName":"booex","shopCode":"act"}

# Find shops whose shopName starts with bo
GET /my_shop_test/_search
{
  "query": {
    "prefix": {
      "shopName": {
        "value": "bo"
      }
    }
  }
}
# Response
"hits" : [
      {
        "_index" : "my_shop_test",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "shopName" : "box",
          "shopCode" : "Smith"
        }
      },
      {
        "_index" : "my_shop_test",
        "_type" : "_doc",
        "_id" : "4",
        "_score" : 1.0,
        "_source" : {
          "shopName" : "booex",
          "shopCode" : "act"
        }
      }
    ]           

Range query

A range query works like the greater-than and less-than range conditions of a database.

GET my_goods/_search
{
  "query": {
    "range": {
      "publicPrice": {
        "gte": 2000,
        "lte": 8488
      }
    }
  }
}           
  • gt: greater than
  • gte: greater than or equal to
  • lt: less than
  • lte: less than or equal to

Regexp query

Regular expression query. The example finds shop codes that start with 's', end with '1', and have any characters, of any length, in between.

GET my_goods/_search
{
  "query": {
    "regexp": {
      "shopCode": {
        "value": "s.*1",
        "flags": "ALL",
        "case_insensitive": true,
        "max_determinized_states": 10000,
        "rewrite": "constant_score"
      }
    }
  }
}           
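A regexp query must match the whole term, not just part of it. A rough Python equivalent uses `re.fullmatch`. This is only an approximation: Lucene's regular expression syntax differs from Python's `re` module in places, although this pattern uses only `.` and `*`, which behave the same in both. The shop codes below are hypothetical sample values.

```python
import re

# IGNORECASE mirrors "case_insensitive": true in the request above.
pattern = re.compile(r"s.*1", re.IGNORECASE)

shop_codes = ["sc00001", "SC00021", "sc00002", "xsc0001"]  # hypothetical values
matches = [code for code in shop_codes if pattern.fullmatch(code)]
print(matches)  # ['sc00001', 'SC00021']
```

sc00002 is rejected because it does not end in 1, and xsc0001 because the pattern is anchored at the start of the term.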

Term query

# Returns documents that contain the exact term; avoid using term on text fields
GET my_goods/_search
{
  "query": {
    "term": {
      "brandName": {
        "value": "三星",
        "boost": 1.0
      }
    }
  }
}           

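Terms query

Returns documents that contain one or more of the exact terms given. Because brandName is a keyword field, each term must equal the entire stored value.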

GET /my_goods/_search
{
  "query": {
    "terms": {
      "brandName": [ "美國", "三星" ],
      "boost": 1.0
    }
  }
}           

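Terms_set query

Returns documents that contain at least a minimum number of the exact terms given. terms_set is like the terms query, except that it also defines how many terms must match.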

# Define a new product index (the field is called name, matching the documents below)
PUT /my_goods_info
{
  "mappings": {
    "properties": {
      "name": {
        "type": "keyword"
      },
      "sale_property": {
        "type": "keyword"
      },
      "required_matches": {
        "type": "long"
      }
    }
  }
}
# Add three test products
# Sale properties: white, 64 GB, standard
PUT /my_goods_info/_doc/1?refresh
{
  "name": "apple",
  "sale_property": [ "white", "64","standard" ],
  "required_matches": 2
}
# Black, 32 GB, non-standard
PUT /my_goods_info/_doc/2?refresh
{
  "name": "apple",
  "sale_property": [ "black", "32","no standard" ],
  "required_matches": 2
}
# Black, 64 GB, non-standard
PUT /my_goods_info/_doc/3?refresh
{
  "name": "apple",
  "sale_property": [ "black", "64","no standard" ],
  "required_matches": 2
}
# Query
GET /my_goods_info/_search
{
  "query": {
    "terms_set": {
      "sale_property": {
        "terms": [ "white", "64"],
        "minimum_should_match_field": "required_matches"
      }
    }
  }
}
# Response
"hits" : [
      {
        "_index" : "my_goods_info",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.1149836,
        "_source" : {
          "name" : "apple",
          "sale_property" : [
            "white",
            "64",
            "standard"
          ],
          "required_matches" : 2
        }
      }
    ]           
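The matching rule behind this result can be sketched in a few lines of Python. This is an in-memory illustration with a hypothetical `terms_set` helper, not how Elasticsearch executes the query: a document is a hit when the number of query terms it contains is at least the value stored in its own required_matches field.

```python
docs = {
    "1": {"sale_property": ["white", "64", "standard"], "required_matches": 2},
    "2": {"sale_property": ["black", "32", "no standard"], "required_matches": 2},
    "3": {"sale_property": ["black", "64", "no standard"], "required_matches": 2},
}


def terms_set(docs: dict, field: str, terms: list, minimum_field: str) -> list:
    """Return IDs of documents with at least `minimum_field` matching terms."""
    hits = []
    for doc_id, source in docs.items():
        matched = len(set(terms) & set(source[field]))
        if matched >= source[minimum_field]:
            hits.append(doc_id)
    return hits


print(terms_set(docs, "sale_property", ["white", "64"], "required_matches"))  # ['1']
print(terms_set(docs, "sale_property", ["black", "64"], "required_matches"))  # ['3']
```

Document 3 contains 64 but not white, so it matches only one of the two required terms in the first query and is left out.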

Wildcard query

Returns documents containing terms that match a wildcard pattern.

GET /my_goods/_search
{
  "query": {
    "wildcard": {
      "shopCode": {
        "value": "sc*1",
        "boost": 1.0,
        "rewrite": "constant_score"
      }
    }
  }
}           

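Geo queries

Elasticsearch supports two kinds of geo data: geo_point, which stores latitude/longitude pairs, and geo_shape, which stores complex shapes such as points, lines, circles, and polygons.

Geo_point

Used to find all points within a given distance of another coordinate, or to compute the distance between two points for sorting, scoring, aggregations, and similar operations.

Geo-shapes

Mostly used for filtering, for example to determine whether two shapes overlap or whether one shape contains another.

There are four types of geo query:

  • geo_bounding_box: finds documents whose geo point falls inside the specified rectangle
  • geo_distance: finds documents whose geo point lies within a specified distance of a central point
  • geo_polygon: finds documents whose geo point lies inside the specified polygon
  • geo_shape: finds documents that have:
    • geo-shapes that intersect, are contained by, or do not intersect the specified shape
    • geo-points that intersect the specified shape

A geo filter loads the coordinates of the candidate documents into memory and then computes, for each one, whether the point falls inside the specified area. This makes geo filters comparatively expensive.

The best approach is to exclude as many documents as possible with cheaper filters first, and only then hand the remaining documents to the geo filter.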

Geo-bounding box query

Define the index for shop information

PUT /my_shop_info
{
  "mappings": {
    "properties": {
      "pin": {
        "properties": {
          "location": {
            "type": "geo_point"
          }
        }
      }
    }
  }
}

# Add two test documents

PUT /my_shop_info/_doc/1
{
  "pin": {
    "location": {
      "lat": 40.12,
      "lon": -71.34
    }
  }
}

PUT /my_shop_info/_doc/2
{
  "pin": {
    "location": {
      "lat": 50.12,
      "lon": -61.34
    }
  }
}

# Find documents inside the specified bounding box

GET my_shop_info/_search
{
  "query": {
    "bool": {
      "must": {
        "match_all": {}
      },
      "filter": {
        "geo_bounding_box": {
          "pin.location": {
            "top_left": {
              "lat": 40.73,
              "lon": -74.1
            },
            "bottom_right": {
              "lat": 40.01,
              "lon": -71.12
            }
          }
        }
      }
    }
  }
}

# Response
"hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "my_shop_info",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "pin" : {
            "location" : {
              "lat" : 40.12,
              "lon" : -71.34
            }
          }
        }
      }
    ]
  }
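Why only document 1 is returned can be checked with a small Python sketch of the bounding-box test. It is a simplified illustration with a hypothetical helper name, and it assumes the box does not cross the antimeridian.

```python
def in_bounding_box(point: dict, top_left: dict, bottom_right: dict) -> bool:
    """True if the point lies inside the box (box must not cross the antimeridian)."""
    return (
        bottom_right["lat"] <= point["lat"] <= top_left["lat"]
        and top_left["lon"] <= point["lon"] <= bottom_right["lon"]
    )


top_left = {"lat": 40.73, "lon": -74.1}
bottom_right = {"lat": 40.01, "lon": -71.12}
shops = {
    "1": {"lat": 40.12, "lon": -71.34},
    "2": {"lat": 50.12, "lon": -61.34},
}
inside = [sid for sid, loc in shops.items() if in_bounding_box(loc, top_left, bottom_right)]
print(inside)  # ['1']
```

Shop 2 lies north and east of the box on both axes, so it is filtered out.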
           

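Geo-distance query

Returns only documents whose geo point lies within a given distance of a specified geo point. The example below finds points within 200 km of latitude 40, longitude -70:

# Again using my_shop_info as the example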
GET /my_shop_info/_search
{
  "query": {
    "bool": {
      "must": {
        "match_all": {}
      },
      "filter": {
        "geo_distance": {
          "distance": "200km",
          "pin.location": {
            "lat": 40,
            "lon": -70
          }
        }
      }
    }
  }
}           
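To see which of the two test shops this query should return, the great-circle distance can be approximated with the haversine formula. The Python sketch below assumes a spherical Earth with a mean radius of 6371 km, so the distances Elasticsearch computes may differ slightly; `haversine_km` is a hypothetical helper.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points in kilometres."""
    phi1, phi2 = radians(lat1), radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = radians(lon2 - lon1)
    a = sin(d_phi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


center = (40, -70)
shops = {"1": (40.12, -71.34), "2": (50.12, -61.34)}
within = [
    sid for sid, (lat, lon) in shops.items()
    if haversine_km(center[0], center[1], lat, lon) <= 200
]
print(within)  # ['1']
```

Shop 1 is a little over 100 km from the centre point, while shop 2 is more than 1000 km away.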

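About the author:

Li Zengsheng (李增勝) is an Elasticsearch certified engineer and PMP-certified project manager. He currently works at Huitongda Network Co., Ltd. (彙通達網絡股份有限公司) as technical manager for the trading domain of its industrial trading platform, where he develops and manages microservice and search architecture. Technical interests: e-commerce, industrial internet, and related fields.

Blog:

https://www.jianshu.com/u/59dceda66b57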