Painless scripting: Elastic Stack 實戰手冊 (Elastic Stack in Action handbook)

Author: 李增勝

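Painless is a simple, secure scripting language designed specifically for Elasticsearch. It can be used anywhere Elasticsearch supports scripts. Painless offers the following advantages:

  • Performance: Painless scripts run several times faster than the alternative scripting languages.
  • Security: a fine-grained allowlist, down to individual methods and fields, avoids unnecessary security risks.
  • Optional typing: variables and parameters can use explicit types or dynamic typing.
  • Syntax: extends a subset of Java's syntax with additional scripting-language features.
  • Optimization: the language is designed specifically for Elasticsearch scripting.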

Common keywords:

if, else, while, do, for, in, continue, break, return, new, try, catch, throw, this, instanceof.

Common examples

First, create some test data containing product information:

# Add test data
POST my_goods/_bulk
{"index":{"_id":1}}
{"goodsName":"蘋果 51英寸 4K超高清","skuCode":"skuCode1","brandName":"蘋果","closeUserCode":["0"],"channelType":"cloudPlatform","shopCode":"sc00001","publicPrice":8188.88,"groupPrice":null,"boxPrice":null,"boostValue":1.8}
{"index":{"_id":2}}
{"goodsName":"蘋果 55英寸 3K超高清","skuCode":"skuCode2","brandName":"蘋果","closeUserCode":["0"],"channelType":"cloudPlatform","shopCode":"sc00002","publicPrice":6188.88,"groupPrice":null,"boxPrice":null,"boostValue":1.0}
{"index":{"_id":3}}
{"goodsName":"蘋果UA55RU7520JXXZ 53英寸 4K高清","skuCode":"skuCode3","brandName":"美國蘋果","closeUserCode":["0"],"channelType":"cloudPlatform","shopCode":"sc00001","publicPrice":8388.88,"groupPrice":null,"boxPrice":[{"boxType":"box1","boxUserCode":["htd003","uc004"],"boxPriceDetail":4388.88},{"boxType":"box2","boxUserCode":["uc005","uc0010"],"boxPriceDetail":5388.88}],"boostValue":1.2}
{"index":{"_id":4}}
{"goodsName":"山東蘋果UA55RU7520JXXZ 蘋果54英寸 5K超高清","skuCode":"skuCode4","brandName":"山東蘋果","closeUserCode":["uc001","uc002","uc003"],"channelType":"cloudPlatform","shopCode":"sc00001","publicPrice":8488.88,"groupPrice":[{"level":"level1","boxLevelPrice":"2488.88"},{"level":"level2","boxLevelPrice":"3488.88"}],"boxPrice":[{"boxType":"box1","boxUserCode":["uc004","uc005","uc006","uc001"],"boxPriceDetail":4488.88},{"boxType":"box2","boxUserCode":["htd007","htd008","htd009","uc0010"],"boxPriceDetail":5488.88}],"boostValue":1.2}           
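The _bulk body above is newline-delimited JSON: each document is preceded by an action line, and the body must end with a newline. As a minimal sketch, assuming you generate such a body from Python, the helper below (build_bulk_body is a hypothetical name, and the documents are trimmed to a few fields) shows the layout:

```python
import json

def build_bulk_body(docs):
    """Return an NDJSON _bulk body: one action line plus one source line per document."""
    lines = []
    for doc_id, source in docs:
        lines.append(json.dumps({"index": {"_id": doc_id}}))
        # ensure_ascii=False keeps the Chinese field values readable in the body.
        lines.append(json.dumps(source, ensure_ascii=False))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"

# Trimmed-down versions of the first two test documents.
docs = [
    (1, {"skuCode": "skuCode1", "brandName": "蘋果", "publicPrice": 8188.88}),
    (2, {"skuCode": "skuCode2", "brandName": "蘋果", "publicPrice": 6188.88}),
]
body = build_bulk_body(docs)
print(body, end="")
```

The resulting string can be sent as the request body of POST my_goods/_bulk by whatever HTTP client you use.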

Inline script

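An inline script is a small piece of code that is embedded in a DSL request and executed together with it. The examples below show concrete cases.

Adding a field

Suppose we want to add a new field whose value depends on an existing field. In the request below we add a new brand name, formed by appending "新品" ("new product") to the existing brand name. A script can implement this: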

POST my_goods/_update_by_query
{
  "script": {
    "source": "ctx._source.new_brandName = ctx._source.brandName + '新品'"
  }
}

# Query the results
GET my_goods/_search

# Response (some irrelevant fields omitted)
"hits" : [
      {
        "_index" : "my_goods",
        "_source" : {
          "shopCode" : "sc00001",
          "new_brandName" : "蘋果新品",
          "brandName" : "蘋果",
          "closeUserCode" : [
            "0"
          ]
        }
      },
      {
        "_index" : "my_goods",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "shopCode" : "sc00002",
          "new_brandName" : "蘋果新品",
          "brandName" : "蘋果",
          "closeUserCode" : [
            "0"
          ],
          "groupPrice" : null,
          "boxPrice" : null,
          "channelType" : "cloudPlatform",
          "boostValue" : 1.0,
          "publicPrice" : "6188.88",
          "goodsName" : "蘋果 55英寸 3K超高清",
          "skuCode" : "skuCode2"
        }
      },
     ....
    ]

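# The new_brandName field added by the script is now present

The source field above holds the Painless code. A small amount of Painless code embedded in a DSL request like this is called an inline script.

Removing a field

An existing field can also be removed with a script: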

POST my_goods/_update_by_query
{
  "script": {
    "source": "ctx._source.remove('new_brandName')"
  }
}           
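To make the effect of the two inline scripts concrete, here is a small Python model of what they do to each document's _source. It is only an illustration: the function names are hypothetical, and a plain dict stands in for ctx._source.

```python
def add_new_brand(source):
    # Mirrors: ctx._source.new_brandName = ctx._source.brandName + '新品'
    source["new_brandName"] = source["brandName"] + "新品"

def remove_new_brand(source):
    # Mirrors: ctx._source.remove('new_brandName')
    source.pop("new_brandName", None)

# Two simplified documents; _update_by_query applies the script to every match.
docs = [{"brandName": "蘋果"}, {"brandName": "美國蘋果"}]

for doc in docs:
    add_new_brand(doc)
added = [doc["new_brandName"] for doc in docs]

for doc in docs:
    remove_new_brand(doc)

print(added)
print(docs)
```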

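Changing a field value

When changing a field value, we pass the variable part through params. This has a performance benefit: Elasticsearch compiles a script the first time it sees it and caches the result. As long as the source string is identical, requests are treated as using the same script, so later calls reuse the cached version instead of compiling again.

# Slower: the multiplier is hard-coded, doubling the price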
POST my_goods/_update/1
{
  "script": {
    "source": "ctx._source.publicPrice = ctx._source.publicPrice * 2",
    "lang": "painless"
  }
}

# Faster: use params to double the price of the product with ID 1
POST my_goods/_update/1
{
  "script": {
    "source": "ctx._source.publicPrice = ctx._source.publicPrice * params.promote_percent",
    "lang": "painless",
    "params": {
      "promote_percent": 2
    }
  }
}

# Query
GET my_goods/_doc/1

# Response
{
  "_index" : "my_goods",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 2,
  "_seq_no" : 4,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "goodsName" : "蘋果 51英寸 4K超高清",
    "skuCode" : "skuCode1",
    "brandName" : "蘋果",
    "closeUserCode" : [
      "0"
    ],
    "channelType" : "cloudPlatform",
    "shopCode" : "sc00001",
    "publicPrice" : 16377.76,
    "groupPrice" : null,
    "boxPrice" : null,
    "boostValue" : 1.8
  }
}
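# Before the update the price was 8188.88; after the script runs it is 16377.76

In Elasticsearch, the following counts as a single script no matter which parameter values are passed:

"source": "ctx._source.publicPrice = ctx._source.publicPrice * params.promote_percent"

The following two are treated as different scripts. Each distinct source string has to be compiled separately, so this performs somewhat worse than the params version above: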

"source": "ctx._source.publicPrice = ctx._source.publicPrice * 2"
"source": "ctx._source.publicPrice = ctx._source.publicPrice * 3"           

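The caching behavior can be pictured with a toy model. The sketch below is not Elasticsearch's implementation; it only assumes, as described above, that compiled scripts are cached by their source string. ScriptCache is a hypothetical class used for illustration.

```python
class ScriptCache:
    """Toy model of a compilation cache keyed by the script source string."""

    def __init__(self):
        self._compiled = {}
        self.compilations = 0

    def get(self, source):
        if source not in self._compiled:
            # Stands in for the expensive compile step.
            self.compilations += 1
            self._compiled[source] = ("compiled", source)
        return self._compiled[source]

# Hard-coded multipliers: every new value produces a new source string.
hard_coded = ScriptCache()
for factor in (2, 3, 4):
    hard_coded.get(f"ctx._source.publicPrice = ctx._source.publicPrice * {factor}")

# Parameterized: the factor travels in params, so the source never changes.
parameterized = ScriptCache()
for factor in (2, 3, 4):
    parameterized.get(
        "ctx._source.publicPrice = ctx._source.publicPrice * params.promote_percent"
    )

print(hard_coded.compilations, parameterized.compilations)
```

In this model the hard-coded variant compiles once per distinct multiplier, while the parameterized variant compiles once in total.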
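Sorting

# Enable fielddata on goodsName so that scripts can access it through doc
PUT my_goods/_mapping
{
  "properties": {
    "goodsName":{
      "type":"text",
      "fielddata": true
    }
  }
}
# Search and sort by the length of the goodsName value multiplied by a factor of 1.1
# Note: on a text field with fielddata, doc['goodsName'].value is an analyzed term, not the full original string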
POST my_goods/_search
{
  "query": {
    "match": {
      "brandName": "蘋果"
    }
  },
  "sort": {
    "_script": {
      "type": "number",
      "script": {
        "lang": "painless",
        "source": "doc['goodsName'].value.length() * params.factor",
        "params": {
          "factor": 1.1
        }
      },
      "order": "asc"
    }
  }
}
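The sort script computes one number per document and Elasticsearch orders the hits by it. A rough Python equivalent of that computation is sketched below. The sample values are made up for illustration, and the model ignores analysis, so it treats each value as a plain string.

```python
def sort_key(value, factor):
    # Mirrors: doc['goodsName'].value.length() * params.factor
    return len(value) * factor

# Hypothetical field values standing in for doc['goodsName'].value.
values = ["bbbb", "a", "cc"]

# "order": "asc" corresponds to an ascending sort on the computed key.
ordered = sorted(values, key=lambda v: sort_key(v, 1.1))
print(ordered)
```

Because the factor is a constant multiplier it does not change the relative order here; it is passed through params so that the source string stays the same when the factor changes.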
           

Stored script

A script that is stored in the cluster first and then referenced by its ID from DSL requests is called a stored script.

# Define a stored script named promote_price
PUT _scripts/promote_price
{
  "script": {
    "source": "ctx._source.publicPrice = ctx._source.publicPrice * params.value",
    "lang": "painless"
  }
}           

As shown above, we define a script named promote_price. It multiplies the sale price (publicPrice) by a factor that is passed in when the script is called.

POST my_goods/_update_by_query
{
  "script": {
    "id": "promote_price",
    "params": {
      "value": 2
    }
  }
}           

After the stored script runs, the prices have been multiplied by 2.
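Because the stored script takes its multiplier from params, the same script ID can be reused with different values. As a small sketch, assuming the request body is assembled in Python, the hypothetical helper stored_script_request builds the body used above:

```python
import json

def stored_script_request(script_id, **params):
    """Build the body of a request that calls a stored script by ID."""
    return {"script": {"id": script_id, "params": params}}

# Same stored script, two different multipliers.
double_price = stored_script_request("promote_price", value=2)
triple_price = stored_script_request("promote_price", value=3)

print(json.dumps(double_price, indent=2))
```

Only the params object differs between the two bodies; the script itself is defined once under _scripts/promote_price.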

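Accessing fields in _source

When Painless accesses field values in _source, the syntax depends on the context in which the script runs. Common Painless contexts include update, update_by_query, sort, and ingest pipeline.

Context        Field access
update         ctx._source.field_name
ingest node    ctx.field_name

The examples below operate on field values through ctx._source and through ctx respectively.

# The earlier example already used ctx._source.field_name to update data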
POST my_goods/_update/1
{
  "script": {
    "source": "ctx._source.publicPrice = ctx._source.publicPrice * params.promote_percent",
    "lang": "painless",
    "params": {
      "promote_percent": 2
    }
  }
}           

Updating a field value in an ingest pipeline

# Define the pipeline
PUT _ingest/pipeline/add_my_goods_newField
{
  "processors": [
    {
      "script": {
        "lang": "painless",
        "source": "ctx.skuCode_brandName = ctx.skuCode + ctx.brandName"
      }
    }
  ]
}

# Run the pipeline
POST my_goods/_update_by_query?pipeline=add_my_goods_newField
{
  
}

# Query the results
GET my_goods/_search

# Response
"hits" : [
      {
        "_index" : "my_goods",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "shopCode" : "sc00002",
          "brandName" : "蘋果",
          "closeUserCode" : [
            "0"
          ],
          "skuCode_brandName" : "skuCode2蘋果",
          "channelType" : "cloudPlatform",
          "publicPrice" : 12377.76,
          "goodsName_length" : 13,
          "groupPrice" : null,
          "boxPrice" : null,
          "boostValue" : 1.0,
          "goodsName" : "蘋果 55英寸 3K超高清",
          "skuCode" : "skuCode2"
        }
      },
      {
        "_index" : "my_goods",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 1.0,
        "_source" : {
          "shopCode" : "sc00001",
          "brandName" : "美國蘋果",
          "closeUserCode" : [
            "0"
          ],
          "skuCode_brandName" : "skuCode3美國蘋果",
          "channelType" : "cloudPlatform",
          "publicPrice" : 16777.76,
          "goodsName_length" : 26,
          "groupPrice" : null,
          "boxPrice" : [
            {
              "boxType" : "box1",
              "boxUserCode" : [
                "htd003",
                "uc004"
              ],
              "boxPriceDetail" : 4388.88
            },
            {
              "boxType" : "box2",
              "boxUserCode" : [
                "uc005",
                "uc0010"
              ],
              "boxPriceDetail" : 5388.88
            }
          ],
          "boostValue" : 1.2,
          "goodsName" : "蘋果UA55RU7520JXXZ 53英寸 4K高清",
          "skuCode" : "skuCode3"
        }
      },
   ....
]           

As the response shows, skuCode_brandName was built by concatenating skuCode and brandName, so accessing fields through ctx.field_name works in the ingest context.
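The difference between the two contexts can be modeled in a few lines of Python. This is a simplified sketch, not how Elasticsearch builds ctx: real contexts also carry metadata, and the runner functions here are hypothetical. It only shows where the document fields live in each case.

```python
def run_update_script(source, script):
    """Update-style contexts: the document is reached through ctx['_source']."""
    ctx = {"_source": dict(source)}
    script(ctx)
    return ctx["_source"]

def run_ingest_script(source, script):
    """Ingest context: the document fields sit directly on ctx."""
    ctx = dict(source)
    script(ctx)
    return ctx

def promote(ctx):
    # Mirrors: ctx._source.publicPrice = ctx._source.publicPrice * params.promote_percent
    ctx["_source"]["publicPrice"] = ctx["_source"]["publicPrice"] * 2

def concat_fields(ctx):
    # Mirrors: ctx.skuCode_brandName = ctx.skuCode + ctx.brandName
    ctx["skuCode_brandName"] = ctx["skuCode"] + ctx["brandName"]

doc = {"skuCode": "skuCode2", "brandName": "蘋果", "publicPrice": 6188.88}
updated = run_update_script(doc, promote)
ingested = run_ingest_script(doc, concat_fields)

print(updated["publicPrice"])
print(ingested["skuCode_brandName"])
```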

Painless Debug

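Elasticsearch provides a way to debug scripts, which makes it easier to find out what a script is working with.

# Define user information; shop_id holds the IDs of the shops the user has opened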
PUT /user_info/_doc/1?refresh
{
  "first": "Michael",
  "last": "Jordan",
  "shop_id": [
    100,
    102,
    103
  ],
  "time": "2021-05-09"
}

PUT /user_info/_doc/2?refresh
{
  "first": "Michael2",
  "last": "Jordan2",
  "shop_id": [
    110,
    112,
    113,
    114,
    115
  ],
  "time": "2021-05-08"
}


# View the mapping
GET user_info/_mapping

# Response
{
  "user_info" : {
    "mappings" : {
      "properties" : {
        "first" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "last" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "shop_id" : {
          "type" : "long"
        },
        "time" : {
          "type" : "date"
        }
      }
    }
  }
}           

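The mapping contains several field types: long, date, keyword, and text. Which methods does each type offer? One option is to consult the official documentation. Another is to obtain the information by debugging. Let's see what _explain returns: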

POST /user_info/_explain/1
{
  "query": {
    "script": {
      "script": "Debug.explain(doc.shop_id)"
    }
  }
}

# Response:
{
  "error": {
    "root_cause": [
      {
        "type": "script_exception",
        "reason": "runtime error",
        "painless_class": "org.elasticsearch.index.fielddata.ScriptDocValues.Longs",
        "to_string": "[100, 102, 103]",
        "java_class": "org.elasticsearch.index.fielddata.ScriptDocValues$Longs",
        "script_stack": [
          "Debug.explain(doc.shop_id)",
          "                 ^---- HERE"
        ],
        "script": "Debug.explain(doc.shop_id)",
        "lang": "painless",
        "position": {
          "offset": 17,
          "start": 0,
          "end": 26
        }
      }
    ],
    "type": "script_exception",
    "reason": "runtime error",
    "painless_class": "org.elasticsearch.index.fielddata.ScriptDocValues.Longs",
    "to_string": "[100, 102, 103]",
    "java_class": "org.elasticsearch.index.fielddata.ScriptDocValues$Longs",
    "script_stack": [
      "Debug.explain(doc.shop_id)",
      "                 ^---- HERE"
    ],
    "script": "Debug.explain(doc.shop_id)",
    "lang": "painless",
    "position": {
      "offset": 17,
      "start": 0,
      "end": 26
    },
    "caused_by": {
      "type": "painless_explain_error",
      "reason": null
    }
  },
  "status": 400
}
           

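The response is a runtime error. This is expected: Debug.explain deliberately throws an exception that carries type information about its argument. So how do we use it?

Looking closely, doc.shop_id is backed by the following class: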

"painless_class": "org.elasticsearch.index.fielddata.ScriptDocValues.Longs"
"java_class": "org.elasticsearch.index.fielddata.ScriptDocValues$Longs"           

Using the Painless API reference:

https://www.elastic.co/guide/en/elasticsearch/painless/7.10/painless-api-reference.html

we can find the API documentation for the Longs type:

https://www.elastic.co/guide/en/elasticsearch/painless/7.10/painless-api-reference-shared-org-elasticsearch-index-fielddata.html#painless-api-reference-shared-ScriptDocValues-Longs

ScriptDocValues.Longs

  • List asList()
  • int getLength()
  • Collection asCollection()
  • Long get(int)
  • .......
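When debugging many fields, it can help to pull just the type information out of the Debug.explain error response. The following is a minimal sketch, assuming the response has already been parsed into a Python dict. The helper name explain_summary is hypothetical, and the sample response is an abbreviated copy of the one shown earlier.

```python
def explain_summary(response):
    """Pull the type information out of a Debug.explain error response."""
    error = response["error"]
    return {
        "painless_class": error["painless_class"],
        "java_class": error["java_class"],
        "to_string": error["to_string"],
    }

# Abbreviated version of the error response returned for doc.shop_id.
response = {
    "error": {
        "type": "script_exception",
        "reason": "runtime error",
        "painless_class": "org.elasticsearch.index.fielddata.ScriptDocValues.Longs",
        "to_string": "[100, 102, 103]",
        "java_class": "org.elasticsearch.index.fielddata.ScriptDocValues$Longs",
    },
    "status": 400,
}

summary = explain_summary(response)
print(summary["painless_class"])
print(summary["to_string"])
```

The painless_class value is the name to look up in the Painless API reference.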

From the data we know that shop_id stores a list. Suppose we want the number of values it holds. We adjust the script to call getLength():

GET user_info/_search
{
  "query": {
    "function_score": {
      "script_score": {
        "script": {
          "lang": "painless",
          "source": """
               return doc['shop_id'].getLength();
          """
        }
      }
    }
  }
}

# Response:
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 5.0,
    "hits" : [
      {
        "_index" : "user_info",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 5.0,
        "_source" : {
          "first" : "Michael2",
          "last" : "Jordan2",
          "shop_id" : [
            110,
            112,
            113,
            114,
            115
          ],
          "time" : "2021-05-08"
        }
      },
      {
        "_index" : "user_info",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 3.0,
        "_source" : {
          "first" : "Michael",
          "last" : "Jordan",
          "shop_id" : [
            100,
            102,
            103
          ],
          "time" : "2021-05-09"
        }
      }
    ]
  }
}
           

The highest score is "max_score" : 5.0. This is because script_score sets the score to the number of shop IDs: document 2 has 5 IDs, so its score is 5.
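The scoring can be reproduced outside Elasticsearch with a few lines of Python. This sketch assumes the base query score is 1.0 (no query is given, so every document matches equally) and that function_score multiplies it by the script result, which is consistent with the scores in the response above.

```python
# shop_id values of the two user_info documents, keyed by document ID.
users = {
    "1": [100, 102, 103],
    "2": [110, 112, 113, 114, 115],
}

def script_score(shop_ids, query_score=1.0):
    # Mirrors: return doc['shop_id'].getLength();
    # The multiplication by query_score is the assumed combination step.
    return query_score * len(shop_ids)

# Hits come back ordered by descending score.
ranked = sorted(
    ((doc_id, script_score(ids)) for doc_id, ids in users.items()),
    key=lambda pair: pair[1],
    reverse=True,
)
print(ranked)
```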

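This example walked through Painless Debug in a realistic scenario: trigger Debug.explain, read the error information, and look up the reported class in the official documentation to find the methods available for the field.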