Elasticsearch Getting Started Tutorial

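This article follows the official documentation [1] guide, uses a docker container to set up an Elasticsearch environment quickly, and draws on Ruan Yifeng's blog post [2], 全文搜尋引擎 Elasticsearch 入門教程 (Full-Text Search Engine Elasticsearch: Getting Started Tutorial), to summarize a quick start with Elasticsearch.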

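It is strongly recommended to study Ruan Yifeng's blog post before reading this article: 全文搜尋引擎 Elasticsearch 入門教程 (Full-Text Search Engine Elasticsearch: Getting Started Tutorial).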

Installation

Official installation guide: https://www.elastic.co/guide/en/elasticsearch/reference/current/getting-started.html

Basic Concepts

1. Node and Cluster

Elastic is essentially a distributed database that lets multiple servers work together, and each server can run multiple Elastic instances. A single Elastic instance is called a node, and a group of nodes forms a cluster.

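2. Index

Elastic indexes every field and, after processing, writes the result into an inverted index. Searches look up that index directly. For this reason, the top-level unit of data management in Elastic is called an Index. It is the equivalent of a single database. The name of every Index (that is, every database) must be lowercase.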

The following command lists all Indexes on the current node.

$ curl -X GET 'http://localhost:9200/_cat/indices?v'      
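
To make the idea of an inverted index from this section concrete, here is a minimal Python sketch. It is only an illustration: the whitespace tokenizer, the sample documents, and the function names build_inverted_index and search are assumptions made for this example, and the analysis and storage that Elasticsearch actually performs are far more elaborate.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase token to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index


def search(index, term):
    """Return the sorted IDs of documents containing the term."""
    return sorted(index.get(term.lower(), set()))


# Hypothetical sample documents, keyed by document ID.
docs = {
    1: "GET /images/bg.jpg",
    2: "GET /favicon.ico",
    3: "POST /login",
}
index = build_inverted_index(docs)
print(search(index, "get"))     # documents 1 and 2 contain "GET"
print(search(index, "/login"))  # only document 3
```

A lookup touches only the entry for the search term instead of scanning every document, which is why searches go through this structure.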

3. Add a single document

POST logs-my_app-default/_doc
{
  "@timestamp": "2099-05-06T16:21:15.000Z",
  "event": {
    "original": "192.0.2.42 - - [06/May/2099:16:21:15 +0000] \"GET /images/bg.jpg HTTP/1.0\" 200 24736"
  }
}      

Result:

{
  "_index": ".ds-logs-my_app-default-2099-05-06-000001",
  "_type": "_doc",
  "_id": "gl5MJXMBMk1dGnErnBW8",
  "_version": 1,
  "result": "created",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "_seq_no": 0,
  "_primary_term": 1
}      

4. Add multiple documents

PUT logs-my_app-default/_bulk
{ "create": { } }
{ "@timestamp": "2099-05-07T16:24:32.000Z", "event": { "original": "192.0.2.242 - - [07/May/2020:16:24:32 -0500] \"GET /images/hm_nbg.jpg HTTP/1.0\" 304 0" } }
{ "create": { } }
{ "@timestamp": "2099-05-08T16:25:42.000Z", "event": { "original": "192.0.2.255 - - [08/May/2099:16:25:42 +0000] \"GET /favicon.ico HTTP/1.0\" 200 3638" } }      
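
The _bulk body is newline-delimited JSON: each action line is followed by its document line, and the body has to end with a newline. A small Python helper can assemble such a body from a list of documents. The name build_bulk_body is an assumption for this sketch, and actually sending the body to Elasticsearch is deliberately left out.

```python
import json


def build_bulk_body(docs):
    """Build an NDJSON _bulk body with one create action per document."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"create": {}}))  # action line
        lines.append(json.dumps(doc))             # document source line
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


# The two documents from the request above.
docs = [
    {"@timestamp": "2099-05-07T16:24:32.000Z",
     "event": {"original": "192.0.2.242 - - [07/May/2020:16:24:32 -0500] \"GET /images/hm_nbg.jpg HTTP/1.0\" 304 0"}},
    {"@timestamp": "2099-05-08T16:25:42.000Z",
     "event": {"original": "192.0.2.255 - - [08/May/2099:16:25:42 +0000] \"GET /favicon.ico HTTP/1.0\" 200 3638"}},
]
body = build_bulk_body(docs)
print(body)
```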

5. Search data

Query all documents in logs-my_app-default, sorted by @timestamp in descending order:

GET logs-my_app-default/_search
{
  "query": {
    "match_all": { }
  },
  "sort": [
    {
      "@timestamp": "desc"
    }
  ]
}      

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 3,
      "relation": "eq"
    },
    "max_score": null,
    "hits": [
      {
        "_index": ".ds-logs-my_app-default-2099-05-06-000001",
        "_type": "_doc",
        "_id": "PdjWongB9KPnaVm2IyaL",
        "_score": null,
        "_source": {
          "@timestamp": "2099-05-08T16:25:42.000Z",
          "event": {
            "original": "192.0.2.255 - - [08/May/2099:16:25:42 +0000] \"GET /favicon.ico HTTP/1.0\" 200 3638"
          }
        },
        "sort": [
          4081940742000
        ]
      },
      ...
    ]
  }
}      
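
The sort value in each hit is the @timestamp expressed as milliseconds since the Unix epoch. The short Python check below converts the value from the response above back into a date. The helper name millis_to_utc is an assumption for this sketch.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def millis_to_utc(millis):
    """Convert epoch milliseconds to a timezone-aware UTC datetime."""
    return EPOCH + timedelta(milliseconds=millis)


sort_value = 4081940742000  # taken from the "sort" array in the response
print(millis_to_utc(sort_value).isoformat())  # 2099-05-08T16:25:42+00:00
```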

6. Return selected fields and omit _source:

GET logs-my_app-default/_search
{
  "query": {
    "match_all": { }
  },
  "fields": [
    "@timestamp"
  ],
  "_source": false,
  "sort": [
    {
      "@timestamp": "desc"
    }
  ]
}      

{
  ...
  "hits": {
    ...
    "hits": [
      {
        "_index": ".ds-logs-my_app-default-2099-05-06-000001",
        "_type": "_doc",
        "_id": "PdjWongB9KPnaVm2IyaL",
        "_score": null,
        "fields": {
          "@timestamp": [
            "2099-05-08T16:25:42.000Z"
          ]
        },
        "sort": [
          4081940742000
        ]
      },
      ...
    ]
  }
}

The "fields" parameter selects which fields to return, and "_source": false omits the _source field from each hit.

7. Range search (range)

GET logs-my_app-default/_search
{
  "query": {
    "range": {
      "@timestamp": {
        "gte": "2099-05-05",
        "lt": "2099-05-08"
      }
    }
  },
  "fields": [
    "@timestamp"
  ],
  "_source": false,
  "sort": [
    {
      "@timestamp": "desc"
    }
  ]
}      

Query data from the past day:

GET logs-my_app-default/_search
{
  "query": {
    "range": {
      "@timestamp": {
        "gte": "now-1d/d",
        "lt": "now/d"
      }
    }
  },
  "fields": [
    "@timestamp"
  ],
  "_source": false,
  "sort": [
    {
      "@timestamp": "desc"
    }
  ]
}      
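
In date math, /d rounds to the day. The gte bound now-1d/d resolves to the start of yesterday, and the lt bound now/d excludes everything from the start of today onward, so the query covers the previous calendar day (in UTC by default) rather than the last 24 hours. The Python sketch below computes the same bounds for a given "now". The function name yesterday_range is an assumption, and it handles only this one pair of expressions, not date math in general.

```python
from datetime import datetime, timedelta, timezone


def yesterday_range(now):
    """Return (gte, lt) bounds equivalent to now-1d/d and now/d in UTC."""
    now = now.astimezone(timezone.utc)
    start_of_today = now.replace(hour=0, minute=0, second=0, microsecond=0)
    start_of_yesterday = start_of_today - timedelta(days=1)
    return start_of_yesterday, start_of_today


# Hypothetical "now", chosen to sit right after the sample data's dates.
now = datetime(2099, 5, 9, 10, 30, tzinfo=timezone.utc)
gte, lt = yesterday_range(now)
print(gte.isoformat(), lt.isoformat())
```

With this hypothetical "now", the document stamped 2099-05-08T16:25:42.000Z falls inside the range, while a document stamped at any time on 2099-05-09 would not.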

8. Create an Index

PUT my_index
{
  "mappings": 
  {
    "properties": 
    {
      "address":
      {
        "type": "ip"
      },
      "port":
      {
        "type": "long"
      }
    }
  }
}      

{
  "acknowledged" : true,
  "shards_acknowledged" : true,
  "index" : "my_index"
}      

9. Load some documents into it:

POST my_index/_bulk
{"index":{"_id":"1"}}
{"address":"1.2.3.4","port":"80"}
{"index":{"_id":"2"}}
{"address":"1.2.3.4","port":"8080"}
{"index":{"_id":"3"}}
{"address":"2.4.8.16","port":"80"}      

Response:

{
  "took" : 8,
  "errors" : false,
  "items" : [
    {
      "index" : {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "1",
        "_version" : 1,
        "result" : "created",
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 0,
        "_primary_term" : 1,
        "status" : 201
      }
    },
    {
      "index" : {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "2",
        "_version" : 1,
        "result" : "created",
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 1,
        "_primary_term" : 1,
        "status" : 201
      }
    },
    {
      "index" : {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "3",
        "_version" : 1,
        "result" : "created",
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 2,
        "_primary_term" : 1,
        "status" : 201
      }
    }
  ]
}      
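
A bulk request can fail for some items and succeed for others, so it is worth checking the top-level errors flag and each item's status instead of assuming success. The Python sketch below does this for a response shaped like the one above. failed_items is a hypothetical helper name, and the sample response is abbreviated.

```python
def failed_items(bulk_response):
    """Return (action, _id, status) for every item whose status is not 2xx."""
    failures = []
    if not bulk_response.get("errors"):
        return failures  # fast path: no item failed
    for item in bulk_response["items"]:
        # Each item has a single key naming the action, e.g. "index".
        for action, result in item.items():
            if not 200 <= result["status"] < 300:
                failures.append((action, result["_id"], result["status"]))
    return failures


# Abbreviated version of the response shown above.
response = {
    "took": 8,
    "errors": False,
    "items": [
        {"index": {"_index": "my_index", "_id": "1", "result": "created", "status": 201}},
        {"index": {"_index": "my_index", "_id": "2", "result": "created", "status": 201}},
        {"index": {"_index": "my_index", "_id": "3", "result": "created", "status": 201}},
    ],
}
print(failed_items(response))  # []
```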

10. Concatenate two fields with a static string

GET my_index/_search
 {
   "runtime_mappings": {
     "socket": {
       "type": "keyword",
       "script": {
         "source": "emit(doc['address'].value + ':' + doc['port'].value)"
       }
     }
   },
   "fields": [
     "socket"
   ],
   "query": {
     "match": {
       "socket": "1.2.3.4:8080"
     }
   }
}      

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "address" : "1.2.3.4",
          "port" : "8080"
        },
        "fields" : {
          "socket" : [
            "1.2.3.4:8080"
          ]
        }
      }
    ]
  }
}      

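In the search response above, the took field is the time the operation took (in milliseconds), the timed_out field indicates whether it timed out, and the hits field contains the matching records. Its subfields mean the following:

  • total: the number of records returned, 1 in this example.
  • max_score: the highest relevance score, 1.0 in this example.
  • hits: an array of the returned records.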

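In each returned hit, the _source field holds the original record. The found field, which indicates whether a lookup succeeded, appears only in the response to a single-document GET request and is not part of this search response.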

We defined the socket field in the runtime_mappings section. We used a short Painless script that defines how the value of socket is computed for each document (using + to concatenate the value of the address field, the static string ":", and the value of the port field). We then used the socket field in the query. socket is an ephemeral runtime field: it exists only for this query and is computed when the query runs. When defining a Painless script for use with runtime fields, you must include emit to return the computed value.

socket: the field added at runtime.

source, id: the official documentation says: "The script itself, which you specify as source for an inline script or id for a stored script. Use the stored script APIs to create and manage stored scripts."

Here, "source": "emit(doc['address'].value + ':' + doc['port'].value)" is an inline script.

11. If socket turns out to be a field we want to use in several queries without defining it in each one, we can simply add it to the mapping with the following call:

PUT my_index/_mapping
 {
   "runtime": {
     "socket": {
       "type": "keyword",
       "script": {
         "source": "emit(doc['address'].value + ':' + doc['port'].value)"
       }
     } 
   } 
}      

Result:

{
  "acknowledged" : true
}      

The socket field now exists in the Index mapping, so a query no longer has to define it at runtime. For example:

GET my_index/_search
{
  "fields": [
    "socket"
  ],
  "query": {
    "match": {
      "socket": "1.2.3.4:8080"
    }
  }
}      

Result (identical to the result in step 10):

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "address" : "1.2.3.4",
          "port" : "8080"
        },
        "fields" : {
          "socket" : [
            "1.2.3.4:8080"
          ]
        }
      }
    ]
  }
}      
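
As a rough mental model of what the socket runtime field does at query time, the Python sketch below recomputes the value for each document from step 9 and keeps the ones that match the query. It is an emulation written for this article, not how Elasticsearch executes Painless. emit_socket and match_socket are hypothetical names, and port is written as an integer on the assumption that the long mapping yields a numeric value.

```python
# Documents from step 9; port is an int because the mapping type is long.
docs = {
    "1": {"address": "1.2.3.4", "port": 80},
    "2": {"address": "1.2.3.4", "port": 8080},
    "3": {"address": "2.4.8.16", "port": 80},
}


def emit_socket(doc):
    """Mirror of: emit(doc['address'].value + ':' + doc['port'].value)."""
    return f"{doc['address']}:{doc['port']}"


def match_socket(docs, wanted):
    """Compute socket per document on the fly and return matching IDs."""
    return [doc_id for doc_id, doc in docs.items() if emit_socket(doc) == wanted]


print(match_socket(docs, "1.2.3.4:8080"))  # ['2']
```

Note that the sketch never writes socket back into docs: the value is derived each time it is needed, which mirrors the point that a runtime field is not stored in the index.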

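The "fields": ["socket"] clause is needed only if you want the value of socket displayed. The socket field is now available to any query, yet it is not stored in the index and does not increase the index size. socket is computed only when a query needs it, and only for the documents that need it.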

12. Difference between runtime and runtime_mappings:

A field defined under runtime is stored in the Index mapping, whereas a field defined under runtime_mappings exists only for the search request that defines it.

Mapping a runtime field: https://www.elastic.co/guide/en/elasticsearch/reference/7.11/runtime-mapping-fields.html#runtime-mapping-fields

Runtime fields in a search request: https://www.elastic.co/guide/en/elasticsearch/reference/7.11/runtime-search-request.html#runtime-search-request

13. Override field values at query time

PUT my_raw_index
{
  "mappings": {
    "properties": {
      "raw_message": {
        "type": "keyword"
      },
      "address": {
        "type": "ip"
      }
    }
  }
}      

{
  "acknowledged" : true,
  "shards_acknowledged" : true,
  "index" : "my_raw_index"
}      

References

[1] Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/getting-started.html

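[2] Ruan Yifeng's blog: 全文搜尋引擎 Elasticsearch 入門教程 (Full-Text Search Engine Elasticsearch: Getting Started Tutorial)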
