Index Lifecycle Management - Elastic Stack Handbook (《Elastic Stack實戰手冊》)

Author: 趙凱

Reviewer: 朱永生

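Index lifecycle management (ILM) officially became part of Elasticsearch in version 6.7. Its purpose is to manage the indices in Elasticsearch.

When we use Elasticsearch, indices are usually named in a format such as xxx-YYYY.MM.dd, with one index per day. We either create these indices ourselves or rely on automatic index creation.

Creating an index every day when the daily data volume is very small is bad for the cluster.
With automatic creation, a cluster that already holds many indices and shards has to create a large number of indices at the same moment, at 00:00 every day. Write throughput drops at that point, and indexing requests may be rejected.
The cluster needs to separate hot and cold data: high-performance machines hold recent, frequently queried data, and as time passes and the data is queried less often, it has to be moved to lower-performance machines.

The index lifecycle management feature provided by Elasticsearch handles all of these problems well. The rest of this article looks at how it works.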

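The four phases of the index lifecycle

Hot:

The index is actively being queried and updated. High-performance machines are usually configured as hot nodes to serve these reads and writes.

Warm:

The index is no longer updated but is still queried. Machines with average performance can be configured as warm nodes.

Cold:

The index is no longer updated and is rarely queried. The data is still searchable, but slower queries are acceptable. These nodes have lower performance but plenty of disk space.

Delete:

The data is no longer needed and can be deleted.

The node type can be set in one of the following two ways. The second is recommended; the first may be deprecated later.

Option 1: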

# elasticsearch.yml
# node.attr.xxx: xxx
node.attr.data: warm

Option 2 (recommended):

# elasticsearch.yml
# data_content, data_hot, data_warm, data_cold
# this node belongs to both the content tier and the hot tier
node.roles: ["data_hot", "data_content"]

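The four phases run in the order Hot, Warm, Cold, Delete. A phase does not start until the previous one has completed, and a phase that is not defined in the policy is skipped in favor of the next one.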
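
The ordering rule can be illustrated with a small Python sketch. It only models the rule described above and is not how ILM is implemented; `PHASE_ORDER` and `phase_sequence` are hypothetical names used for illustration. Given the `phases` object of a policy, it returns the phases in the order they would run and leaves out the ones the policy does not define.

```python
# Illustrative only: models the ordering rule, not Elasticsearch internals.
PHASE_ORDER = ["hot", "warm", "cold", "delete"]


def phase_sequence(phases):
    """Return the defined phases in execution order, skipping absent ones."""
    unknown = set(phases) - set(PHASE_ORDER)
    if unknown:
        raise ValueError(f"unknown phases: {sorted(unknown)}")
    return [name for name in PHASE_ORDER if name in phases]


# A policy that defines only warm and delete, written out of order.
policy_phases = {
    "delete": {"min_age": "30d", "actions": {"delete": {}}},
    "warm": {"min_age": "10d", "actions": {"forcemerge": {"max_num_segments": 1}}},
}
print(phase_sequence(policy_phases))  # ['warm', 'delete']
```

The order in which the phases are written in the policy body does not matter; only the fixed sequence does.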

By default ILM checks indices every 10 minutes. The interval can be changed dynamically through the cluster settings, as follows:

PUT _cluster/settings
{
  "transient": {
    "indices.lifecycle.poll_interval": "10m" 
  }
}

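Lifecycle management API

The actions supported in each phase are covered in the next section. This section only introduces the API.

Create a lifecycle policy

The min_age parameter specifies how long after index creation the index enters the phase. For an index that has been rolled over, the age is counted from the rollover time instead.

In the following example, the index enters the warm phase once it is more than 10 days old, and its segments are merged down to 1. After the warm phase has completed, the index enters the delete phase once it is more than 30 days old, and it is deleted.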

PUT _ilm/policy/my_policy
{
  "policy": {
    "phases": {
      "warm": {
        "min_age": "10d",
        "actions": {
          "forcemerge": {
            "max_num_segments": 1
          }
        }
      },
      "delete": {
        "min_age": "30d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}
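
To make the timing concrete, the following Python sketch works out which phase of a policy like my_policy an index of a given age has reached. It is a simplified model: it only compares the age against each `min_age` and ignores whether the actions of the previous phase have actually finished. `parse_age` and `eligible_phase` are hypothetical helpers, and only the d, h, m, and s time units are handled.

```python
import re
from datetime import timedelta

PHASE_ORDER = ["hot", "warm", "cold", "delete"]
UNITS = {"d": "days", "h": "hours", "m": "minutes", "s": "seconds"}


def parse_age(value):
    """Convert a value such as '10d' into a timedelta (d, h, m, s only)."""
    match = re.fullmatch(r"(\d+)([dhms])", value)
    if match is None:
        raise ValueError(f"unsupported time value: {value!r}")
    amount, unit = match.groups()
    return timedelta(**{UNITS[unit]: int(amount)})


def eligible_phase(phases, index_age):
    """Return the latest defined phase whose min_age has been reached."""
    reached = None
    for name in PHASE_ORDER:
        if name not in phases:
            continue  # undefined phases are skipped
        min_age = parse_age(phases[name].get("min_age", "0s"))
        if index_age < min_age:
            break  # later phases cannot start before this one
        reached = name
    return reached


phases = {
    "warm": {"min_age": "10d"},
    "delete": {"min_age": "30d"},
}
print(eligible_phase(phases, timedelta(days=3)))   # None
print(eligible_phase(phases, timedelta(days=12)))  # warm
print(eligible_phase(phases, timedelta(days=45)))  # delete
```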

View lifecycle policies

# list all lifecycle policies
GET _ilm/policy

# get a specific lifecycle policy
# GET _ilm/policy/<policy_id>
GET _ilm/policy/my_policy

Delete a lifecycle policy

DELETE _ilm/policy/<policy_id>

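Move an index to a specific step of its lifecycle policy

current_step

- phase: name of the current phase
- action: name of the current action
- name: name of the current step

next_step

- phase: name of the phase to execute
- action: name of the action to execute
- name: name of the step to execute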

POST _ilm/move/my-index-000001
{
  "current_step": { 
    "phase": "new",
    "action": "complete",
    "name": "complete"
  },
  "next_step": { 
    "phase": "warm",
    "action": "forcemerge",
    "name": "forcemerge"
  }
}

Remove the lifecycle policy from an index

# POST <target>/_ilm/remove
POST my-index-000001/_ilm/remove

Retry the lifecycle policy for an index that is in the ERROR step

# POST <index>/_ilm/retry
POST my-index-000001/_ilm/retry

Check the current status of ILM

GET /_ilm/status

View the current lifecycle state of one or more indices

# GET <target>/_ilm/explain
GET my-index-000001/_ilm/explain

Start ILM

POST _ilm/start

Stop ILM

POST /_ilm/stop

Actions supported in each phase

Each phase of the index lifecycle supports the following actions:

Hot
- Set Priority
- Unfollow
- Rollover
- Force Merge
Warm
- Set Priority
- Unfollow
- Read only
- Allocate
- Migrate
- Shrink
- Force Merge
Cold
- Set Priority
- Unfollow
- Allocate
- Migrate
- Freeze
- Searchable Snapshot
Delete
- Wait For Snapshot
- Delete

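Actions

Set Priority

Sets the priority of the index as soon as the index enters the phase. After a node restart, indices with a higher priority are recovered first.

Parameters:

priority: an integer greater than or equal to 0.

Example: set the priority of the index to 50 in the warm phase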

PUT _ilm/policy/my_policy
{
  "policy": {
    "phases": {
      "warm": {
        "actions": {
          "set_priority" : {
            "priority": 50
          }
        }
      }
    }
  }
}

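Rollover

When the index meets any one of the three conditions below, a new index is created and the alias is pointed at it.

Parameters:

max_age

The maximum time elapsed since the index was created

max_docs

The maximum number of documents in the index

max_size

The maximum size of the index, measured as the size of the primary shards; replicas are not counted.

At least one of the three parameters must be set.

Example: generate a new index when the primary shards of the current index reach 100GB, when the document count exceeds 100000000, or when the index was created more than 7 days ago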

PUT _ilm/policy/my_policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover" : {
            "max_size": "100GB",
            "max_docs": 100000000,
            "max_age": "7d"
          }
        }
      }
    }
  }
}
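
The "any one of the conditions" rule can be sketched in Python as follows. This is a simplified model rather than the Elasticsearch implementation: `should_roll_over` is a hypothetical function, sizes are passed in bytes and ages in seconds, and the thresholds mirror the example above.

```python
def should_roll_over(conditions, age_seconds, doc_count, primary_size_bytes):
    """Return True when at least one configured rollover condition is met."""
    checks = []
    if "max_age_seconds" in conditions:
        checks.append(age_seconds >= conditions["max_age_seconds"])
    if "max_docs" in conditions:
        checks.append(doc_count >= conditions["max_docs"])
    if "max_size_bytes" in conditions:
        checks.append(primary_size_bytes >= conditions["max_size_bytes"])
    if not checks:
        raise ValueError("at least one rollover condition is required")
    return any(checks)


GB = 1024 ** 3
DAY = 24 * 60 * 60
conditions = {
    "max_size_bytes": 100 * GB,  # "max_size": "100GB"
    "max_docs": 100_000_000,     # "max_docs": 100000000
    "max_age_seconds": 7 * DAY,  # "max_age": "7d"
}

# Small and young, but the document count has been reached.
print(should_roll_over(conditions, 2 * DAY, 100_000_000, 40 * GB))  # True
# None of the three thresholds has been reached.
print(should_roll_over(conditions, 2 * DAY, 5_000_000, 40 * GB))    # False
```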

Unfollow

Converts a follower index into a regular index.

Example:

PUT _ilm/policy/my_policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "unfollow" : {}
        }
      }
    }
  }
}

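Allocate

Sets the number of replicas of the index and moves the index to particular nodes. Moving data between hot and cold nodes relies on this action.

Parameters:

number_of_replicas

The number of replicas of the index

include

Move the index to nodes that have at least one of the specified attributes

exclude

Move the index to nodes that have none of the specified attributes

require

Move the index to nodes that have all of the specified attributes

Note: include is satisfied by a single match, whereas require needs every one to match.

Example: on reaching the warm phase, set the number of replicas of the index to 2 and move the index to nodes whose box_type attribute includes hot and warm and does not include cold.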

PUT _ilm/policy/my_policy
{
  "policy": {
    "phases": {
      "warm": {
        "actions": {
          "allocate" : {
            "number_of_replicas" : 2,
            "include" : {
                "box_type": "hot"
            },
            "exclude" : {
                "box_type": "cold"
            },
            "require" : {
                "box_type": "hot,warm"
            }
          }
        }
      }
    }
  }
}

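Force Merge

Sets the number of segments the index is merged down to. When it is used in the hot phase, the policy must also contain rollover. The index is set to read-only for the merge.

Parameters:

max_num_segments

The maximum number of segments

index_codec

The codec used to compress stored data. Default: LZ4. The only accepted value is best_compression; omit the parameter to keep the default.

Example: merge the segments of the index down to 1 in the warm phase.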

PUT _ilm/policy/my_policy
{
  "policy": {
    "phases": {
      "warm": {
        "actions": {
          "forcemerge" : {
            "max_num_segments": 1
          }
        }
      }
    }
  }
}

Read only

Sets the index to read-only.

PUT _ilm/policy/my_policy
{
  "policy": {
    "phases": {
      "warm": {
        "actions": {
          "readonly" : { }
        }
      }
    }
  }
}

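Shrink

Sets the index to read-only and then shrinks it into a new index with fewer shards. The name of the shrunken index starts with the prefix shrink-

Parameters:

number_of_shards

The number of primary shards after shrinking

Example: shrink the index to 1 shard in the warm phase.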

PUT _ilm/policy/my_policy
{
  "policy": {
    "phases": {
      "warm": {
        "actions": {
          "shrink" : {
            "number_of_shards": 1
          }
        }
      }
    }
  }
}

Freeze

Minimizes the memory footprint of the index.

Example: freeze the index in the cold phase to release memory.

PUT _ilm/policy/my_policy
{
  "policy": {
    "phases": {
      "cold": {
        "actions": {
          "freeze" : { }
        }
      }
    }
  }
}

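Migrate

Moves the index to the data tier that corresponds to the current phase by updating the index.routing.allocation.include._tier_preference setting. If allocate specifies a replica count, the number of replicas is reduced before the migration. If the warm and cold phases do not specify any allocation options in allocate, ILM injects the migrate action automatically. To disable it, set enabled to false.

Parameters:

enabled

Default: true. Controls whether ILM automatically migrates the index during this phase.

Example: in the warm phase, disable migration, explicitly set the number of replicas to 1, and move the index to nodes whose rack_id attribute is one or two.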

PUT _ilm/policy/my_policy
{
  "policy": {
    "phases": {
      "warm": {
        "actions": {
          "migrate" : {
           "enabled": false
          },
          "allocate": {
            "number_of_replicas": 1,
            "include" : {
              "rack_id": "one,two"
            }
          }
        }
      }
    }
  }
}
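
As a rough illustration of how a tier preference resolves to a tier, here is a Python sketch. The preference lists are the ones the Elasticsearch documentation gives for the migrate action in the warm and cold phases (data_warm,data_hot and data_cold,data_warm,data_hot). `TIER_PREFERENCE` and `resolve_tier` are hypothetical names, and the fallback logic is a simplified model of picking the first preferred tier that has nodes.

```python
# Values of index.routing.allocation.include._tier_preference per phase.
TIER_PREFERENCE = {
    "warm": "data_warm,data_hot",
    "cold": "data_cold,data_warm,data_hot",
}


def resolve_tier(phase, tiers_with_nodes):
    """Return the first tier in the phase's preference list that has nodes."""
    for tier in TIER_PREFERENCE[phase].split(","):
        if tier in tiers_with_nodes:
            return tier
    return None  # no suitable tier is available


print(resolve_tier("cold", {"data_hot", "data_warm", "data_cold"}))  # data_cold
print(resolve_tier("cold", {"data_hot", "data_warm"}))               # data_warm
print(resolve_tier("warm", {"data_hot"}))                            # data_hot
```

With migrate disabled, as in the example above, no tier preference is applied for the phase and placement is left to the allocate filters.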

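Searchable Snapshot

Creates a searchable snapshot of the index. This action is still in beta in version 7.10 and may change in later releases.

By default the delete action removes the snapshot. To keep it, set delete_searchable_snapshot to false in the delete action.

Parameters:

snapshot_repository

Required. The repository in which the snapshot is stored

force_merge_index

Boolean, default: true. If the index was already force merged by an earlier action, the searchable snapshot action does not force merge it again.

Example: create the snapshot in the cold phase.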

PUT _ilm/policy/my_policy
{
  "policy": {
    "phases": {
      "cold": {
        "actions": {
          "searchable_snapshot" : {
            "snapshot_repository" : "backing_repo"
          }
        }
      }
    }
  }
}

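Wait For Snapshot

Waits for the specified SLM policy to run before the index is deleted, which ensures that a snapshot of the deleted index is available.

Parameters:

policy

Required. The name of the SLM policy

Example: in the delete phase, wait for the SLM policy to run and then delete the index.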

PUT _ilm/policy/my_policy
{
  "policy": {
    "phases": {
      "delete": {
        "actions": {
          "wait_for_snapshot" : {
            "policy": "slm-policy-name"
          }
        }
      }
    }
  }
}

Delete

Deletes the index.

Parameters:

delete_searchable_snapshot

Boolean, default: true. Whether to delete the searchable snapshot created in the cold phase.

Example: delete the index 90 days after it was created

PUT _ilm/policy/my_policy
{
  "policy": {
    "phases": {
      "delete": {
        "min_age" : "90d",
        "actions": {
          "delete" : { }
        }
      }
    }
  }
}

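Using ILM with an alias

Create the lifecycle policy

In the warm phase the index is allocated to nodes whose data attribute is warm, and in the cold phase to nodes whose data attribute is cold.

Node attributes can be configured in elasticsearch.yml or passed as a setting on the command line at startup.

# startup command
bin/elasticsearch -Enode.attr.data=warm

# elasticsearch.yml
# node.attr.xxx: xxx
# Configuring nodes with node.roles is recommended; see the section "Using ILM with data tiers"
# node.attr may no longer be used in later versions

node.attr.data: warm

The lifecycle policy does the following:

The index enters the hot phase 1 day after it is created and its priority is set to 100. A new index is generated when the primary shards exceed 50gb, when the document count exceeds 500000000, or when the index was created more than 2 days ago.

In the warm phase the index is moved to nodes whose data attribute is warm.

In the cold phase the number of replicas is set to 1 and the index is moved to nodes whose data attribute is cold.

Once the actions of the hot, warm, and cold phases have all completed and the index is 7 days old, the index is deleted.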

PUT _ilm/policy/logx_policy
{
  "policy": {
    "phases": {
      "hot": {
        "min_age": "1d",
        "actions": {
          "set_priority": {
            "priority": 100
          },
          "rollover": {
            "max_age": "2d",
            "max_docs": 500000000,
            "max_size": "50gb"
          }
        }
      },
      "warm": {
        "min_age": "1d",
        "actions": {
          "set_priority": {
            "priority": 50
          },
          "allocate": {
            "include": {
              "data" : "warm"
            }
          }
        }
      },
      "cold": {
        "min_age": "1d",
        "actions": {
          "set_priority": {
            "priority": 0
          },
          "allocate": {
            "number_of_replicas": 1,
            "include" : {
              "data": "cold"
            }
          }
        }
      }, 
      "delete": {
        "min_age": "7d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}

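Create an index template that applies the lifecycle policy to indices

Set the number of shards to 2, the number of replicas to 1, the lifecycle policy to logx_policy, and the rollover alias to logx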

PUT _index_template/logx-template
{
  "index_patterns" : ["logx-*"],
  "priority" : 1,
  "template": {
    "settings" : {
      "index" : {
        "number_of_shards" : "2",
        "number_of_replicas" : "1",
        "lifecycle.name": "logx_policy",
        "lifecycle.rollover_alias": "logx"
      }
    }
  }
}

Create the first index, using either of the two forms below. The index name must match the regular expression ^.*-\d+$, for example logs-000001

PUT logx-000001
{
  "aliases": {
    "logx": {
      "is_write_index": true
    }
  }
}
# OR an index with the creation date in its name
# PUT /<logx-{now/d}-1> with URI encoding:
PUT /%3Clogx-%7Bnow%2Fd%7D-1%3E
{
  "aliases": {
    "logx": {
      "is_write_index": true
    }
  }
}
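
The naming rule can be checked up front. The Python sketch below tests candidate names against the pattern and also shows how the next index name is derived on rollover: the trailing number is incremented and zero-padded to six digits. `is_rollover_name` and `next_index_name` are hypothetical helpers, and date-math names such as <logx-{now/d}-1> are assumed to be already resolved to a concrete name.

```python
import re

# Pattern a rollover-managed index name has to match.
ROLLOVER_NAME = re.compile(r"^.*-\d+$")


def is_rollover_name(name):
    """Return True when the name ends with a dash followed by digits."""
    return ROLLOVER_NAME.match(name) is not None


def next_index_name(name):
    """Increment the trailing number and zero-pad it to six digits."""
    if not is_rollover_name(name):
        raise ValueError(f"{name!r} does not match ^.*-\\d+$")
    prefix, number = name.rsplit("-", 1)
    return f"{prefix}-{int(number) + 1:06d}"


print(is_rollover_name("logx-000001"))  # True
print(is_rollover_name("logx"))         # False
print(next_index_name("logx-000001"))   # logx-000002
print(next_index_name("my-index-3"))    # my-index-000004
```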

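All subsequent reads and writes go through the fixed alias logx

Using ILM with a data stream

Create the lifecycle policy. The index enters the hot phase 1 day after it is created and its priority is set to 100. A new index is generated when the primary shards exceed 50gb, when the document count exceeds 500000000, or when the index was created more than 2 days ago. In the warm phase the index is moved to nodes whose data attribute is warm. In the cold phase the number of replicas is set to 1 and the index is moved to nodes whose data attribute is cold. Once the actions of the hot, warm, and cold phases have all completed and the index is 7 days old, the index is deleted.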

PUT _ilm/policy/logx_policy
{
  "policy": {
    "phases": {
      "hot": {
        "min_age": "1d",
        "actions": {
          "set_priority": {
            "priority": 100
          },
          "rollover": {
            "max_age": "2d",
            "max_docs": 500000000,
            "max_size": "50gb"
          }
        }
      },
      "warm": {
        "min_age": "1d",
        "actions": {
          "set_priority": {
            "priority": 50
          },
          "allocate": {
            "include": {
              "data" : "warm"
            }
          }
        }
      },
      "cold": {
        "min_age": "1d",
        "actions": {
          "set_priority": {
            "priority": 0
          },
          "allocate": {
            "number_of_replicas": 1,
            "include" : {
              "data": "cold"
            }
          }
        }
      }, 
      "delete": {
        "min_age": "7d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}

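Differences from the alias approach: the template does not need to specify index.lifecycle.rollover_alias, and the first index does not have to be created manually. Simply write data to a name that matches the template. How many indices that name maps to inside Elasticsearch is not something we need to care about.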

PUT _index_template/logx-template
{
  "index_patterns" : ["logx-*"],
  "priority" : 1,
  "data_stream": { },
  "template": {
    "settings" : {
      "index" : {
        "number_of_shards" : "2",
        "number_of_replicas" : "1",
        "lifecycle.name": "logx_policy"
      }
    }
  }
}

Create the data stream, either by indexing the first document or explicitly

POST /logx-business/_doc/
{
    "@timestamp":"2021-04-13T11:04:05.000Z",
    "message":"Login attempt failed"
}
# OR
PUT /_data_stream/logx-business

All subsequent reads and writes use the fixed name logx-business
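
When many documents are written to the data stream at once, the bulk API is the usual route. The Python sketch below only builds the newline-delimited request body and does not send it. It relies on two properties of data streams: every document needs an @timestamp field, and bulk requests have to use the create action. `build_bulk_body` is a hypothetical helper, and the second sample event is made up for illustration.

```python
import json


def build_bulk_body(documents):
    """Build an NDJSON body for POST /<data-stream>/_bulk."""
    lines = []
    for doc in documents:
        if "@timestamp" not in doc:
            raise ValueError("data stream documents need an @timestamp field")
        lines.append(json.dumps({"create": {}}))  # action line
        lines.append(json.dumps(doc))             # document source line
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


events = [
    {"@timestamp": "2021-04-13T11:04:05.000Z", "message": "Login attempt failed"},
    {"@timestamp": "2021-04-13T11:04:06.000Z", "message": "Login successful"},
]
body = build_bulk_body(events)
print(body, end="")
```

The resulting string would be sent as the body of POST /logx-business/_bulk.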

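Using ILM with data tiers

A data tier is a collection of nodes that have the same data role.

Content tier nodes handle the indexing and query load for content such as a product catalog.
Hot tier nodes handle the indexing load for time series data such as logs or metrics, and hold your most recent, most frequently accessed data.
Warm tier nodes hold time series data that is accessed less frequently and rarely needs to be updated.
Cold tier nodes hold time series data that is accessed only occasionally and is not normally updated.

Data tiers are the recommended way to separate hot and cold data. Nodes can be configured as follows:

# elasticsearch.yml
# data_content, data_hot, data_warm, data_cold
# this node belongs to both the content tier and the hot tier
node.roles: ["data_hot", "data_content"]

In the warm phase the index is migrated to the warm nodes. In the cold phase migrate is disabled and the index is allocated to nodes whose data attribute is cold.

Create the lifecycle policy. The index enters the hot phase 1 day after it is created and its priority is set to 100. A new index is generated when the primary shards exceed 50gb, when the document count exceeds 500000000, or when the index was created more than 2 days ago. In the warm phase the index is migrated to the warm nodes. In the cold phase the number of replicas is set to 1, migrate is disabled, and the index is moved to nodes whose data attribute is cold. Once the actions of the hot, warm, and cold phases have all completed and the index is 7 days old, the index is deleted.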

PUT _ilm/policy/logx_policy
{
  "policy": {
    "phases": {
      "hot": {
        "min_age": "1d",
        "actions": {
          "set_priority": {
            "priority": 100
          },
          "rollover": {
            "max_age": "2d",
            "max_docs": 500000000,
            "max_size": "50gb"
          }
        }
      },
      "warm": {
        "min_age": "1d",
        "actions": {
          "set_priority": {
            "priority": 50
          },
          "migrate" : {
          }
        }
      },
      "cold": {
        "min_age": "1d",
        "actions": {
          "set_priority": {
            "priority": 0
          },
          "allocate": {
            "number_of_replicas": 1,
            "include" : {
              "data": "cold"
            }
          }, 
          "migrate" : {
            "enabled": false
          }
        }
      }, 
      "delete": {
        "min_age": "7d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}

Set the number of shards to 2, the number of replicas to 1, and the lifecycle policy to logx_policy

PUT _index_template/logx-template
{
  "index_patterns" : ["logx-*"],
  "priority" : 1,
  "data_stream": { },
  "template": {
    "settings" : {
      "index" : {
        "number_of_shards" : "2",
        "number_of_replicas" : "1",
        "lifecycle.name": "logx_policy"
      }
    }
  }
}

POST /logx-business/_doc/
{
    "@timestamp":"2021-04-13T11:04:05.000Z",
    "message":"Login attempt failed"
}
# OR 
PUT /_data_stream/logx-business
