Elasticsearch Runtime Fields: An In-Depth Guide

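At this point, several solutions come to mind:

Option 1: Recreate the index with the new field added, clear the existing data, and import the data again.

Option 2: Recreate the index with the new field added, then write the original index into the new one with reindex.

Option 3: Define the preprocessing in advance with an ingest pipeline, then either re-import the data through it or apply it in bulk with update_by_query.

Option 4: Leave the original index untouched and compute the value with a script.

Options 1 and 2 are similar: add the new field and load the data.

Let's work through Options 3 and 4.

2. Implementing Options 3 and 4

2.1 Option 3: Ingest preprocessing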

DELETE news_00001

PUT news_00001
{
  "mappings": {
    "properties": {
      "emotion": {
        "type": "integer"
      }
    }
  }
}

POST news_00001/_bulk
{"index":{"_id":1}}
{"emotion":558}
{"index":{"_id":2}}
{"emotion":125}
{"index":{"_id":3}}
{"emotion":900}
{"index":{"_id":4}}
{"emotion":600}

PUT _ingest/pipeline/my-pipeline
{
  "processors": [
    {
      "script": {
        "description": "Set emotion flag param",
        "lang": "painless",
        "source": """
          if (ctx['emotion'] < 300 && ctx['emotion'] > 0)
            ctx['emotion_flag'] = -1;
          if (ctx['emotion'] >= 300 && ctx['emotion'] <= 700)
            ctx['emotion_flag'] = 0;
          if (ctx['emotion'] > 700 && ctx['emotion'] < 1000)
            ctx['emotion_flag'] = 1;
        """
      }
    }
  ]
}

POST news_00001/_update_by_query?pipeline=my-pipeline
{
  "query": {
    "match_all": {}
  }
}

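The core of Option 3 is the preprocessing pipeline my-pipeline. Its script sets emotion_flag to a different value for each range of emotion.

The pipeline must be created before it is used. It can then be applied at write time by setting it as the index's default pipeline (index.default_pipeline), or applied to existing documents with a bulk update.

That gives two concrete approaches:

Approach 1: Bulk update with _update_by_query. Updating an index, especially all of its documents, is expensive.

Approach 2: Specify the pipeline at write time, so that every document is preprocessed once as it is written.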
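To make the pipeline's effect concrete, here is a minimal Python sketch that mirrors the Painless script on the four sample documents. It is only an illustration: the function name apply_emotion_flag is made up for this example, and Elasticsearch runs the Painless script, not this code.

```python
# Python mirror of the Painless logic in my-pipeline (illustration only;
# the helper name is hypothetical and not part of any Elasticsearch API).

def apply_emotion_flag(ctx: dict) -> dict:
    """Set ctx['emotion_flag'] from ctx['emotion'], like the script processor."""
    emotion = ctx["emotion"]
    if 0 < emotion < 300:
        ctx["emotion_flag"] = -1
    if 300 <= emotion <= 700:
        ctx["emotion_flag"] = 0
    if 700 < emotion < 1000:
        ctx["emotion_flag"] = 1
    return ctx

# The four documents written by the _bulk request.
docs = [{"emotion": 558}, {"emotion": 125}, {"emotion": 900}, {"emotion": 600}]
flags = [apply_emotion_flag(dict(d))["emotion_flag"] for d in docs]
print(flags)  # [0, -1, 1, 0]
```

A document whose emotion falls outside all three ranges (for example 0 or 1000) gets no emotion_flag at all, because none of the branches fire.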

2.2 Option 4: Script implementation

POST news_00001/_search
{
  "script_fields": {
    "emotion_flag": {
      "script": {
        "source": "if (doc['emotion'].value < 300 && doc['emotion'].value > 0) return -1; if (doc['emotion'].value >= 300 && doc['emotion'].value <= 700) return 0; if (doc['emotion'].value > 700 && doc['emotion'].value <= 1000) return 1;"
      }
    }
  }
}

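The core of Option 4 is script_fields, which computes the value with a script at search time.

This option only returns the computed value with the search hits. The value cannot be used for anything else, such as aggregations.

Also note that computing fields with script_fields can cause performance problems.

Both options have trade-offs, which raises a further question:

Can we get the data we want without changing the mapping and without re-importing the data?

Earlier versions could not. Starting with version 7.11, there is a new solution: runtime fields.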

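3. Background of runtime fields

Runtime fields are an enhanced version of the older script fields. They introduce an interesting concept called schema on read.

The counterpart of schema on read is schema on write. Traditional, non-runtime field types all follow schema on write: the structure is fixed when a document is indexed. Schema on read takes the other path and applies the structure when the data is read.

As a result, a runtime field can be defined in the mapping before indexing, or defined dynamically at query time, and it has almost all the benefits of a regular field.

Once a runtime field is defined in the index mapping or in a query, it can be used immediately in search requests, aggregations, filtering, and sorting.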
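The difference between schema on write and schema on read can be sketched in a few lines of Python. This is a toy in-memory model, not Elasticsearch code: the functions label, new_rule, and read, and the field name emotion_label, are all invented for illustration.

```python
def label(emotion: int) -> str:
    """Hypothetical rule that derives a label from the raw emotion score."""
    return "negative" if emotion < 300 else "non-negative"

raw_docs = [{"emotion": 125}, {"emotion": 558}]

# Schema on write: the derived field is computed once, at index time, and stored.
stored = [{**d, "emotion_label": label(d["emotion"])} for d in raw_docs]

# Schema on read: only the raw document is stored; the derived field is
# computed every time the document is read.
def read(doc: dict, rule=label) -> dict:
    return {**doc, "emotion_label": rule(doc["emotion"])}

# A later change to the rule takes effect immediately for schema on read...
def new_rule(emotion: int) -> str:
    return "negative" if emotion < 600 else "non-negative"

on_read = [read(d, new_rule) for d in raw_docs]

# ...while the stored values stay as they were until the data is reprocessed.
print(stored[1]["emotion_label"], on_read[1]["emotion_label"])
# non-negative negative
```

The cost of the second approach is visible here as well: read() does its work on every access, which is the same trade-off runtime fields make at search time.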

4. Solving the opening problem with runtime fields

4.1 Runtime fields in practice

PUT news_00001/_mapping
{
  "runtime": {
    "emotion_flag_new": {
      "type": "keyword",
      "script": {
        "source": "if (doc['emotion'].value > 0 && doc['emotion'].value < 300) emit('-1'); if (doc['emotion'].value >= 300 && doc['emotion'].value <= 700) emit('0'); if (doc['emotion'].value > 700 && doc['emotion'].value <= 1000) emit('1');"
      }
    }
  }
}

GET news_00001/_search
{
  "fields": ["*"]
}

4.2 Runtime fields core syntax explained

First: PUT news_00001/_mapping updates the existing mapping.

That is how to add a runtime field to an existing mapping. Defining the runtime field when the index is created works the same way:

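PUT news_00002
{
  "mappings": {
    "runtime": {
      "emotion_flag_new": {
        "type": "keyword",
        "script": {
          "source": "if (doc['emotion'].value > 0 && doc['emotion'].value < 300) emit('-1'); if (doc['emotion'].value >= 300 && doc['emotion'].value <= 700) emit('0'); if (doc['emotion'].value > 700 && doc['emotion'].value <= 1000) emit('1');"
        }
      }
    }
  }
}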

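Second: what was updated?

A field was added. More precisely, a runtime field named emotion_flag_new of type keyword, whose value is computed by a script.

What does the script do?

When emotion is greater than 0 and less than 300, emotion_flag_new is set to -1.

When emotion is between 300 and 700 (inclusive), emotion_flag_new is set to 0.

When emotion is greater than 700 and at most 1000, emotion_flag_new is set to 1.

Third: how do we search on it?

Let's try a conventional search and look at the result.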

First, look at the mapping:

{
  "news_00001" : {
    "mappings" : {
      "runtime" : {
        "emotion_flag_new" : {
          "type" : "keyword",
          "script" : {
            "source" : "if (doc['emotion'].value > 0 && doc['emotion'].value < 300) emit('-1'); if (doc['emotion'].value >= 300 && doc['emotion'].value<=700) emit('0'); if (doc['emotion'].value > 700 && doc['emotion'].value<=1000) emit('1');",
            "lang" : "painless"
          }
        }
      },
      "properties" : {
        "emotion" : {
          "type" : "integer"
        }
      }
    }
  }
}

The mapping now has one more field, the runtime field emotion_flag_new.

Run a conventional search:

GET news_00001/_search
{
  "query": {
    "match": {
      "emotion_flag_new": "-1"
    }
  }
}

Then run the same search with the fields parameter:

GET news_00001/_search
{
  "fields" : ["*"],
  "query": {
    "match": {
      "emotion_flag_new": "-1"
    }
  }
}

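4.3 Runtime fields core syntax explained

Why does the runtime field value appear in the results only after adding "fields": ["*"]?

Because runtime fields do not appear in _source, whereas the fields API works on all fields, including runtime fields.

To return specific fields, list their names; otherwise, use * to return all fields.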
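On the client side, this means a runtime field has to be read from the fields section of each hit rather than from _source. The Python sketch below shows one way to do that. The sample hit is hand-written to match the general shape of a search hit (it is not captured output), and field_value is a hypothetical helper name.

```python
# Hand-written sample shaped like a single search hit (not captured output).
hit = {
    "_index": "news_00001",
    "_id": "2",
    "_source": {"emotion": 125},
    "fields": {"emotion": [125], "emotion_flag_new": ["-1"]},
}

def field_value(hit: dict, name: str):
    """Return the first value of `name` from the hit's 'fields' section, or None.

    The fields section holds each field's values as a list, so a
    single-valued field still has to be unwrapped.
    """
    values = hit.get("fields", {}).get(name, [])
    return values[0] if values else None

print("emotion_flag_new" in hit["_source"])   # False: not part of _source
print(field_value(hit, "emotion_flag_new"))   # -1
```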

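4.4 Can this be done on the original field, without defining a new one?

The example above already solves the problem. To push a little further: can the value of the existing emotion field be replaced at query time?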

Let's try it:

GET news_00001/_search
{
  "runtime_mappings": {
    "emotion": {
      "type": "keyword",
      "script": {
        "source": """
          if (params._source['emotion'] > 0 && params._source['emotion'] < 300) { emit('-1'); }
          if (params._source['emotion'] >= 300 && params._source['emotion'] <= 700) { emit('0'); }
          if (params._source['emotion'] > 700 && params._source['emotion'] <= 1000) { emit('1'); }
        """
      }
    }
  },
  "fields": [
    "emotion"
  ]
}

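To explain:

First: in the original mapping, emotion is of type integer.

Second: what we defined is a query-time type. The mapping itself does not change, but at search time the emotion field, with its name unchanged, is treated as type keyword.

This is a very powerful feature.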
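When such a request is built in application code, the query-time definition is just a nested structure in the search body. The Python sketch below assembles a body of that shape and prints it as JSON. The helper name runtime_keyword_search_body is invented for this example, and sending the body to Elasticsearch (with whichever client you use) is deliberately left out.

```python
import json

def runtime_keyword_search_body(field, script_source, return_fields):
    """Build a _search body that defines `field` as a query-time keyword
    runtime field (hypothetical helper, for illustration only)."""
    return {
        "runtime_mappings": {
            field: {"type": "keyword", "script": {"source": script_source}}
        },
        "fields": list(return_fields),
    }

# Painless source that reads the raw value from _source and emits a label.
source = (
    "if (params._source['emotion'] > 0 && params._source['emotion'] < 300) { emit('-1'); } "
    "if (params._source['emotion'] >= 300 && params._source['emotion'] <= 700) { emit('0'); } "
    "if (params._source['emotion'] > 700 && params._source['emotion'] <= 1000) { emit('1'); }"
)

body = runtime_keyword_search_body("emotion", source, ["emotion"])
print(json.dumps(body, indent=2))
```

Because the definition travels with the request, nothing in the index mapping changes; dropping runtime_mappings from the body restores the original integer field.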

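In earlier versions such as 5.x and 6.x, which did not have this feature, our approach in real projects was:

Step 1: Stop real-time writes.

Step 2: Create a new index with a new mapping that adds the emotion_flag field.

Step 3: Resume writes, so new data picks up the new field; reindex the old data into the new index, applying the ingest script during the reindex.

With runtime fields, those tedious days are gone for good.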

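5. Use cases for runtime fields

One example is logging. Runtime fields are useful when working with log data, especially when the structure of the data is not known in advance.

With runtime fields, the index is much smaller, and logs can be processed faster because the fields do not need to be indexed.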

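6. Pros and cons of runtime fields

Pro 1: Flexibility

Runtime fields are very flexible, mainly in two ways:

When needed, a runtime field can be added to the mapping.

When no longer needed, it can be removed easily.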

A runtime field is removed like this:

PUT news_00001/_mapping
{
  "runtime": {
    "emotion_flag_new": null
  }
}

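In other words, setting the field to null removes it from the mapping.

Pro 2: No need to define before use

Runtime fields can be defined at index time or at query time.

Because runtime fields are not indexed, adding one does not increase the index size, so runtime fields can reduce storage costs.

Pro 3: Helps prevent mapping explosion

Runtime fields are neither indexed nor stored, which helps prevent mapping explosion.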

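Runtime fields defined in a search request with runtime_mappings are not part of the mapping, so they do not count toward index.mapping.total_fields.limit. Note that the Elasticsearch reference states that runtime fields added to the index mapping do count toward this limit.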

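Con 1: Queries on runtime fields slow down searches

Queries on runtime fields can be expensive, because the values are computed at search time. In other words, runtime fields reduce search speed.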

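7. Recommendations for using runtime fields

Weigh the trade-offs: runtime fields reduce indexing time and so save CPU at write time, but they make queries slower because retrieving the data requires extra processing.

Combine both: use runtime fields together with indexed fields to strike the right balance between write speed, flexibility, and search performance.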

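8. Summary

This article started from the practical problem of adding a field and walked through several solutions. The traditional solutions mostly require changing the mapping, rebuilding the index, or reindexing the data, which is relatively complex.

That led to a simpler and faster option available from version 7.11 onward: runtime fields.

The key points about runtime fields are:

Defining them in the mapping when the index is created;

Adding them to an existing mapping;

Using runtime fields at search time to add fields dynamically;

Overriding the type of an existing mapped field while keeping its name, for a specific purpose;

Pros and cons, use cases, and recommendations.

Have you used runtime fields in practice? How did it go?

Comments and feedback are welcome.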

