ELK: Elasticsearch

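I. Overview

1. Introduction

Elasticsearch is an open-source, distributed, RESTful full-text search engine built on Lucene. It is also a distributed real-time document store in which every field of every document is indexed and searchable, and a distributed search engine with real-time analytics that can scale to hundreds of nodes and handle petabytes of data.

Use case: suppose we are building a website or application and want to add search, but building search from scratch is hard. We want a search solution that is fast, works with zero configuration, is completely free, lets us index data simply by sending JSON over HTTP, keeps the search server always available, starts on one machine and scales to hundreds, searches in real time, supports simple multi-tenancy, and can be deployed as a cloud solution. Elasticsearch is designed to address all of these needs, and others that may come up.

Designed for: distributed full-text search

Interface: indexing data with JSON over HTTP

Main goal: meeting the many requirements that search imposes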

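2. Basic components

Index: a container of documents; in other words, a collection of documents with similar characteristics, loosely comparable to a table. Index names must be lowercase.

Type: a logical partition inside an index whose meaning depends entirely on the user's needs. An index can define one or more types. Generally, a type is a predefined group of documents that share the same fields. (This applies to the 5.x series used here; later major versions restrict and then remove types.)

Document: the atomic unit that Lucene indexes and searches. A document is a container of one or more fields and is represented as JSON. Each field consists of a name and one or more values; a field with several values is usually called a multi-valued field.

Mapping: before raw content is stored as a document it must be analyzed, for example tokenized and stripped of certain words. A mapping defines how this analysis is carried out; mappings also control features such as whether the content of a field can be sorted.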

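4. Cluster components

Cluster: an Elasticsearch cluster is identified by its cluster name, which defaults to "elasticsearch". A node uses this name to decide which cluster to join, and a node can belong to only one cluster.

Node: a host running a single Elasticsearch instance. Nodes store data and take part in the cluster's indexing and search operations. A node is identified by its node name.

Shard: the physical storage unit an index is split into; each shard is itself a complete, independent index. When an index is created, Elasticsearch 5.x splits it into 5 primary shards by default. The number can be chosen at creation time but cannot be changed afterwards.

There are two kinds of shard: primary shards and replicas. Replicas provide data redundancy and load balancing for queries. The number of replicas per primary shard is configurable and can be changed dynamically.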
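Why the primary shard count is fixed becomes clearer with a small sketch. A document is placed on a shard by hashing its routing value (by default its id) modulo the number of primary shards, so changing that number would send existing ids to different shards. The function names below are hypothetical, and `zlib.crc32` is only a stand-in for the hash Elasticsearch uses internally.

```python
import zlib


def shard_for(routing: str, num_primary_shards: int) -> int:
    """Pick a primary shard for a routing value (by default the document id).

    zlib.crc32 is an illustrative stand-in, not the hash Elasticsearch uses.
    """
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


def total_shard_copies(num_primary_shards: int, num_replicas: int) -> int:
    """Every primary shard exists once plus num_replicas replica copies."""
    return num_primary_shards * (1 + num_replicas)


# Placement with 5 primaries versus 6 primaries for the same ids.
ids = [str(i) for i in range(10)]
print([shard_for(doc_id, 5) for doc_id in ids])
print([shard_for(doc_id, 6) for doc_id in ids])
print(total_shard_copies(5, 1))
```

Replicas do not take part in the modulo, which is consistent with the replica count being adjustable at any time.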

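5. How an Elasticsearch cluster works

On startup, a node looks for other nodes of the same cluster on 9300/tcp and establishes communication with them. Releases before 2.0 used multicast discovery by default; the 5.x series installed here supports only unicast discovery, configured with discovery.zen.ping.unicast.hosts.

All nodes in the cluster elect a master node, which manages the cluster state and decides how shards are distributed across the cluster. From the user's point of view, every node can receive and respond to any request.

A cluster has one of three health states: green (all primary and replica shards are allocated), yellow (all primaries are allocated but some replicas are not), and red (at least one primary shard is not allocated).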
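The three health states follow directly from shard allocation. The sketch below is a simplified model with a hypothetical `cluster_status` function and a made-up shard representation; it is not how the cluster computes health internally.

```python
def cluster_status(shards):
    """Derive a health color from shard allocation.

    shards: list of dicts with boolean keys 'primary' and 'assigned'
    (an illustrative structure, not an Elasticsearch API response).
    """
    if any(s["primary"] and not s["assigned"] for s in shards):
        return "red"       # some data is unavailable
    if any(not s["assigned"] for s in shards):
        return "yellow"    # all data available, redundancy incomplete
    return "green"         # every primary and replica is allocated


# A single-node cluster cannot place replicas next to their primaries,
# so an index with one replica stays yellow.
single_node = [
    {"primary": True, "assigned": True},
    {"primary": False, "assigned": False},
]
print(cluster_status(single_node))
```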

6. Other

Default ports:

Cluster (transport) traffic: 9300/tcp, set with transport.tcp.port

Client requests (HTTP): 9200/tcp, set with http.port

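II. Installing and configuring Elasticsearch

1. System environment

Operating system: one CentOS release 6.9 host (master node) and two CentOS release 6.7 hosts (other nodes)

Elasticsearch version: elasticsearch-5.5.2.rpm

JDK version: java-1.8.0-openjdk

Note: Elasticsearch runs on the JVM, so a JDK must be installed. Elasticsearch 5.x requires Java 8 or later.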

2. Installing and configuring Java

<code>yum install java-1.8.0-openjdk-1.8.0.102-4.b14.el7.x86_64.rpm</code>

<code>vim /etc/profile.d/java.sh</code>

Add:

<code>export JAVA_HOME=$(dirname $(dirname $(readlink -f $(which java))))</code>

Note: JAVA_HOME should point to the Java installation directory, not to the java binary itself. On my system, which reports java at /usr/bin/java, which is typically a symlink; the command above resolves it to the real installation directory.

3. Installing Elasticsearch

<code>yum install elasticsearch-5.5.2.rpm</code>

Note: the installer prints informational messages when it finishes.

Edit the Elasticsearch configuration file:

<code>vim /etc/elasticsearch/elasticsearch.yml</code>

On the master node, set the cluster name and the node name (the cluster.name and node.name settings).

On each of the other nodes, set the same cluster.name and a distinct node.name.

Start the elasticsearch service:

<code>service elasticsearch start</code>

Check the listening ports.

Use curl to check that Elasticsearch responds:

<code>curl -X GET 'http://127.0.0.1:9200/?pretty'</code>

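A curl request to Elasticsearch has the general form curl -X VERB 'PROTOCOL://HOST:PORT/PATH?QUERY_STRING' -d 'BODY', where:

<code>VERB: GET, PUT, DELETE, etc.</code>

<code>PROTOCOL: http or https</code>

<code>QUERY_STRING: query parameters, for example ?pretty prints the JSON response in a readable format</code>

<code>BODY: the request body</code>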
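The same request parts can be assembled in Python. This is a minimal sketch using only the standard library; `build_url` and `make_request` are hypothetical helper names, and the example only constructs the request object without sending it, so no running cluster is needed.

```python
import urllib.request


def build_url(protocol, host, port, path, query_string=""):
    """Join PROTOCOL, HOST, PORT, PATH and QUERY_STRING into one URL."""
    url = f"{protocol}://{host}:{port}/{path.lstrip('/')}"
    return f"{url}?{query_string}" if query_string else url


def make_request(verb, url, body=None):
    """Build (but do not send) an HTTP request with the given VERB and BODY."""
    data = body.encode("utf-8") if body is not None else None
    return urllib.request.Request(url, data=data, method=verb)


health = make_request(
    "GET", build_url("http", "127.0.0.1", 9200, "_cluster/health", "pretty")
)
print(health.get_method(), health.full_url)
```

Passing the request to `urllib.request.urlopen` would send it; that step is left out here on purpose.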

<code>[root@CentOS ~]# curl -X GET http://127.0.0.1:9200/_cat/nodes?h=name,ip,port,uptime,heap,current</code>

<code>Centos 127.0.0.1 9300 33.6m</code>

View cluster health:

<code>curl -XGET 'http://127.0.0.1:9200/_cluster/health?pretty'</code>

View cluster state:

<code>curl -XGET 'http://127.0.0.1:9200/_cluster/state/?pretty'</code>

View node statistics:

<code>curl -XGET 'http://127.0.0.1:9200/_nodes/stats?pretty'</code>

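Troubleshooting:

1. Elasticsearch uses too much memory at startup:

The installed 5.5 release sets the JVM heap to 2 GB by default.

Adjust the -Xms and -Xmx values in the jvm.options configuration file.

2. Port 9200 of Elasticsearch cannot be reached:

By default Elasticsearch listens only on the loopback address. Set network.host in elasticsearch.yml to an address of the host, for example:

<code>network.host: 192.168.137.100</code>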

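4. Elasticsearch plugins

Elasticsearch is highly extensible through plugins, which can add custom mapping types, custom analyzers, native scripts, and custom discovery mechanisms.

Installing plugins:

There are two ways to install a plugin:

1) Place the plugin directly in the /usr/share/elasticsearch/plugins directory;

2) Use the /usr/share/elasticsearch/bin/elasticsearch-plugin script (run it with "-h" for help).

Example: installing X-Pack

<code>/usr/share/elasticsearch/bin/elasticsearch-plugin install x-pack</code>

X-Pack is an Elastic Stack extension that bundles security, alerting, monitoring, reporting, and graph capabilities into one easy-to-install package.

Before Elasticsearch 5.0.0 you had to install the separate Shield, Watcher, and Marvel plugins to get the features that X-Pack provides.

Site plugins were accessed at the URL below. This applies only to releases before 5.0, because site plugins were removed in Elasticsearch 5.0:

<code>http://HOST:9200/_plugin/{plugin_name}</code>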

III. Using Elasticsearch

1. CRUD APIs

1) Create a document:

<code>curl -XPUT 'localhost:9200/students/class1/1?pretty' -d '</code>

<code>> {</code>

<code>> "first_name": "fan",</code>

<code>> "gender": "male",</code>

<code>> "courses": "it"</code>

<code>> }'</code>

or

<code>curl -XPUT 'localhost:9200/students/class1/1' -d '{ "first_name": "fan","age": 25,"sex":"male"}'</code>
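The create call is a PUT to /INDEX/TYPE/ID with the document as a JSON body. As a rough sketch, the pieces can be prepared in Python with `json.dumps`, which also guarantees the body is valid JSON (the hand-typed multi-line form is easy to get wrong). `index_request` is a hypothetical helper that returns the parts without sending anything.

```python
import json


def index_request(index, doc_type, doc_id, source):
    """Return the (verb, path, body) triple for creating one document."""
    path = f"/{index}/{doc_type}/{doc_id}"
    body = json.dumps(source)  # serializing avoids hand-written JSON mistakes
    return "PUT", path, body


verb, path, body = index_request(
    "students", "class1", 1, {"first_name": "fan", "age": 25, "sex": "male"}
)
print(verb, path)
print(body)
```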

2) Retrieve a document:

<code>curl -XGET 'localhost:9200/students/class1/1?pretty'</code>

<code> "_index" : "students",</code>

<code> "_type" : "class1",</code>

<code> "_id" : "1", # the id given at insert time: xxx/class1/1(id)?pretty</code>

<code> "_version" : 1,</code>

<code> "found" : true,</code>

<code> "_source" : {</code>

<code> "first_name" : "fan",</code>

3) Update a document (two ways)

1. PUT the document again (this replaces the whole existing document).

2. To change only some fields, use the _update API, whose request body wraps the changed fields in a "doc" object:

<code>curl -XPOST 'localhost:9200/students/class1/1/_update?pretty' -d '</code>

<code> "_version" : 2,</code>

<code> "result" : "updated",</code>

<code> "_shards" : {</code>

<code> "total" : 2,</code>

<code> "successful" : 1,</code>

<code> "failed" : 0</code>
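The difference between the two update styles can be modeled locally. The sketch below is a simplification with hypothetical function names: it merges only top-level fields and always bumps the version, whereas the real _update API also merges nested objects and can detect no-op updates.

```python
import copy


def replace_document(stored, new_source):
    """PUT semantics: the new body replaces the whole stored document."""
    return {"_version": stored["_version"] + 1, "_source": copy.deepcopy(new_source)}


def apply_partial_update(stored, partial):
    """_update semantics (simplified): merge the fields under 'doc'."""
    merged = copy.deepcopy(stored["_source"])
    merged.update(partial["doc"])  # top-level merge only, an assumption of this sketch
    return {"_version": stored["_version"] + 1, "_source": merged}


stored = {"_version": 1, "_source": {"first_name": "fan", "age": 25, "sex": "male"}}
print(apply_partial_update(stored, {"doc": {"age": 26}}))
print(replace_document(stored, {"age": 26}))
```

With PUT the fields not resent are lost; with the partial update they are kept.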

4) Delete a document

A document is deleted by its id (here, the document with id 3):

<code>curl -XDELETE 'localhost:9200/students/class1/3?pretty'</code>

<code> "_version" : 4,</code>

<code> "result" : "deleted",</code>

Looking up the document with id 3 again confirms that it has been deleted.

5) Delete an index

List the indices (using the _cat API):

<code>curl -XGET 'localhost:9200/_cat/indices?v'</code>

Delete an index by specifying its name:

<code>curl -XDELETE 'localhost:9200/students'</code>

<code>{"acknowledged":true}</code>

Listing the indices again shows that the index is gone.

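2. Querying data

Data is queried through Elasticsearch's query API, one of its most important parts, which implements search.

The query API is driven by the Query DSL (a JSON-based language for building complex queries). The Query DSL supports many kinds of query, such as simple term queries and phrase, range, boolean, and fuzzy queries.

Elasticsearch executes a search in two phases: 1) a scatter phase, in which the request is sent to the relevant shards and each shard searches locally; 2) a gather (merge) phase, in which the node that received the request merges the per-shard results into the final result.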
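The two phases can be pictured with a toy model. In this sketch each shard is just a list of dicts and the score is supplied by the caller; all names are hypothetical, and real scoring and fetching are considerably more involved.

```python
import heapq


def search_shard(shard_docs, score_fn, size):
    """Scatter phase: one shard scores its own documents and keeps its top hits."""
    scored = ((score_fn(doc), doc["_id"]) for doc in shard_docs)
    return heapq.nlargest(size, (hit for hit in scored if hit[0] > 0))


def scatter_gather(shards, score_fn, size):
    """Gather phase: merge the per-shard top hits into one global top list."""
    per_shard = [search_shard(docs, score_fn, size) for docs in shards]
    merged = heapq.nlargest(size, (hit for hits in per_shard for hit in hits))
    return [doc_id for _score, doc_id in merged]


shards = [
    [{"_id": "1", "text": "fan fan"}, {"_id": "2", "text": "jing"}],
    [{"_id": "3", "text": "fan"}],
]


def count_fan(doc):
    """Toy relevance score: how often the word 'fan' occurs."""
    return doc["text"].split().count("fan")


print(scatter_gather(shards, count_fan, size=2))
```

Each shard only needs to return `size` hits, because no document outside a shard's local top `size` can appear in the global top `size`.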

Ways to query:

There are two ways to send a search request to Elasticsearch:

1. Through the request URI, passing the search parameters in the query string;

2. By sending a REST request body (this allows more complex queries).

Method 1 example: retrieve all documents in the students index (suitable for small data sets)

<code>[root@Centos6 ~]# curl -XGET 'localhost:9200/students/_search?pretty'</code>

<code> "timed_out" : false,</code>

<code> "total" : 5,</code>

<code> "successful" : 5,</code>

<code> "max_score" : 1.0,</code>

<code> "_index" : "students",</code>

<code> "_type" : "class1",</code>

<code> "_score" : 1.0,</code>

<code> "_source" : {</code>

<code> "third_name" : "jing",</code>

<code> "sex" : "female"</code>

<code> "third_name" : "fan",</code>

Method 2 example:

<code>[root@Centos6 ~]# curl -XGET 'localhost:9200/students/_search?pretty' -d '</code>

<code>> {</code>

<code>> "query": {"match_all": {}}</code>

<code>> }'</code>

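Searching multiple indices and types:

/_search: all indices;

/INDEX_NAME/_search: a single index;

/INDEX1,INDEX2/_search: several indices;

/s*,t*/_search: all indices whose names start with s or t;

/students/class1/_search: a single type;

/students/class1,class2/_search: several types.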
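The path patterns above all follow one rule: an optional comma-separated list of indices, an optional comma-separated list of types, then _search. A small sketch with a hypothetical `search_path` helper:

```python
def search_path(indices=(), types=()):
    """Build a _search path for any combination of indices and types."""
    if types and not indices:
        raise ValueError("types can only be given together with indices")
    parts = []
    if indices:
        parts.append(",".join(indices))
    if types:
        parts.append(",".join(types))
    parts.append("_search")
    return "/" + "/".join(parts)


print(search_path())
print(search_path(["s*", "t*"]))
print(search_path(["students"], ["class1", "class2"]))
```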

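Mapping and analysis:

What is a mapping?

To treat date fields as dates, numeric fields as numbers, and string fields as full-text or exact-value strings, Elasticsearch needs to know what type of data each field contains. This information is stored in the mapping.

Every document in an index has a type. Each type has its own mapping, or schema definition. A mapping defines the fields of the type, the data type of each field, and how Elasticsearch handles each field. A mapping is also used to configure metadata associated with the type.

This is only an introduction.

For example, a mapping can define:

which string fields are treated as full-text fields

which fields contain numbers, dates, or geolocations

whether the values of all fields in a document should be indexed into the _all field

the format of date values

custom rules that control the mapping of dynamically added fields

Mapping types: an index has one or more mapping types, which divide its documents into logical groups. Each mapping type contains:

1. Metadata associated with the type, such as _index, _type, _id, and _source

2. Field (property) definitions: the data type of each field and how Elasticsearch handles it.

Data types fall roughly into:

Simple types, such as string, date, long, double, boolean, or ip (in 5.x, string is replaced by text for full-text fields and keyword for exact values)

Nested objects

Specialized types, such as geo_point, geo_shape, or completion

In short, a mapping acts as a schema constraint on the documents stored in Elasticsearch.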

For every document, Elasticsearch collects the values of all fields into a field named "_all". When a query-string search does not name a field, it runs against _all:

<code>GET .../_search?q='Xianglong'</code>

<code>GET .../_search?q='Xianglong%20Shiba%20Zhang'</code>

<code>GET .../_search?q=courses:'Xianglong%20Shiba%20Zhang'</code>

<code>GET .../_search?q=courses:'Xianglong'</code>

The first two search the _all field;

the last two search the named field (courses).
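The %20 in these URLs is just a percent-encoded space. A minimal sketch of building the q parameter with the standard library follows; `q_param` is a hypothetical helper, and unlike the examples above it does not wrap the value in quotes.

```python
from urllib.parse import quote


def q_param(text, field=None):
    """Build the q=... part of a URI search, optionally limited to one field."""
    value = f"{field}:{text}" if field else text
    # Keep ':' readable; spaces and other unsafe characters are percent-encoded.
    return "q=" + quote(value, safe=":")


print(q_param("Xianglong Shiba Zhang"))        # searches _all
print(q_param("Xianglong", field="courses"))   # searches the courses field
```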

Example:

View the mapping of a given type:

<code>curl 'localhost:9200/students/_mapping/class1?pretty'</code>

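What is analysis?

Analysis and analyzers

Analysis is the process of:

first, tokenizing a block of text into individual terms suitable for an inverted index;

then normalizing these terms into a standard form to improve their searchability, or recall.

This job is done by an analyzer. An analyzer is just a wrapper that combines three functions into one package:

Function 1: character filters

The string first passes through the character filters, whose job is to tidy it up before tokenization. A character filter can strip HTML markup or convert "&" to "and".

Function 2: tokenizer

Next, the tokenizer splits the string into individual terms. A simple tokenizer might split on whitespace or commas (this does not work for Chinese text).

Function 3: token filters

Finally, each term passes through the token filters, which can change terms (for example lowercasing "Quick"), remove terms (for example stopwords such as "a", "and", "the"), or add terms (for example synonyms such as "jump" and "leap").

Elasticsearch provides many character filters, tokenizers, and token filters out of the box. They can be combined into custom analyzers for different needs.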
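The three stages can be imitated in a few lines of Python. This is a toy pipeline under stated assumptions: the stopword list, the regular expressions, and all function names are made up for illustration and are not Elasticsearch's implementations.

```python
import re

STOPWORDS = {"a", "and", "the"}  # illustrative list only


def char_filter(text):
    """Stage 1: strip HTML tags and spell out '&' before tokenizing."""
    text = re.sub(r"<[^>]+>", " ", text)
    return text.replace("&", " and ")


def tokenizer(text):
    """Stage 2: split on whitespace or commas."""
    return [token for token in re.split(r"[\s,]+", text) if token]


def token_filters(tokens):
    """Stage 3: lowercase every term, then drop stopwords."""
    lowered = [token.lower() for token in tokens]
    return [token for token in lowered if token not in STOPWORDS]


def analyze(text):
    """Run the three stages in order, as an analyzer does."""
    return token_filters(tokenizer(char_filter(text)))


print(tokenizer(char_filter("Tom & Jerry")))
print(analyze("<b>The Quick</b> & brown fox"))
```

The order matters: the "and" produced by the character filter is later removed by the stopword filter, which is why the stages are kept separate.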

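Built-in analyzers

Elasticsearch also ships with prepackaged analyzers that can be used directly. The most important ones are listed below, together with the terms each produces from this string:

"Set the shape to semi-transparent by calling set_trans(5)"

Built-in analyzers: standard analyzer; simple analyzer; whitespace analyzer; language analyzers

Standard analyzer

The standard analyzer is the default analyzer. It is the best general choice for text in any language when there are no special requirements. It splits text on word boundaries as defined by the Unicode Consortium, removes most punctuation, and finally lowercases all terms. It produces:

set, the, shape, to, semi, transparent, by, calling, set_trans, 5

Simple analyzer

The simple analyzer splits text on anything that is not a letter and lowercases the terms. It produces:

set, the, shape, to, semi, transparent, by, calling, set, trans

Whitespace analyzer

The whitespace analyzer splits text on whitespace and does not lowercase. It produces:

Set, the, shape, to, semi-transparent, by, calling, set_trans(5)

Language analyzers

Language-specific analyzers are available for many languages and take the peculiarities of each language into account. For example, the english analyzer comes with a set of English stopwords (common words such as and or the, which contribute little to relevance) and removes them. Because it understands the rules of English grammar, it can also stem English words, that is, reduce them to their root form.

The english analyzer produces:

set, shape, semi, transpar, call, set_tran, 5

Note how "transparent", "calling", and "set_trans" have been reduced to their stems.

Analyzers are used not only when documents are indexed but also when queries are built, so that the terms of the query match the terms in the index.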
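The simple and whitespace analyzers are easy to approximate with regular expressions, which makes the differences above reproducible. The functions below are hypothetical stand-ins; in particular `standard_like_analyzer` is only a rough approximation that happens to agree with the standard analyzer on this sample, since the real one follows the Unicode word-boundary rules.

```python
import re

TEXT = "Set the shape to semi-transparent by calling set_trans(5)"


def simple_analyzer(text):
    """Split on anything that is not a letter, then lowercase."""
    return re.findall(r"[^\W\d_]+", text.lower())


def whitespace_analyzer(text):
    """Split on whitespace only; case and punctuation are kept."""
    return text.split()


def standard_like_analyzer(text):
    """Rough approximation: runs of word characters, lowercased."""
    return re.findall(r"\w+", text.lower())


for analyzer in (standard_like_analyzer, simple_analyzer, whitespace_analyzer):
    print(analyzer.__name__, analyzer(TEXT))
```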

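Elasticsearch query syntax (Query DSL)

Elasticsearch queries fall into two categories:

query: used for full-text search; results are ranked by relevance. Execution is comparatively expensive and the results are not cached;

filter: used for exact matching; each document either matches ("yes") or does not ("no"). Filters are fast and their results are cached.

Structure of a query clause:

General form:

<code>{</code>

<code> QUERY_NAME: {</code>

<code> ARGUMENT: VALUE,</code>

<code> ARGUMENT: VALUE,...</code>

<code> }</code>

<code>}</code>

Querying a specific field:

<code>{ QUERY_NAME: {</code>

<code> FIELD_NAME: {</code>

<code> ARGUMENT: VALUE,...</code>

<code>} } }</code>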

Filters:

term filter: matches documents that contain exactly the given term; { "term": {"name": "Guo"} }

<code>[root@CentOS ~]# curl -X GET 'localhost:9200/students/_search?pretty' -d '</code>

<code>> "query":{</code>

Result:

terms filter: exact match against any of several values; { "terms": { "name": ["Guo", "Rong"] }}

<code>[root@CentOS ~]# curl -X GET 'localhost:9200/students/_search?pretty' -d '</code>

<code> "query":{</code>

<code> "terms": {</code>

range filter: finds numbers or dates within a given range;

<code>{ "range": </code>

Range operators: gt (greater than), lt (less than), gte (greater than or equal to), lte (less than or equal to)
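To make the four operators concrete, the sketch below builds a range clause as a Python dict and evaluates it against in-memory documents. The field name age and both helper names are assumptions for illustration; the evaluation only mimics what the cluster does and is not a client API.

```python
import operator

RANGE_OPS = {
    "gt": operator.gt,
    "lt": operator.lt,
    "gte": operator.ge,
    "lte": operator.le,
}


def range_clause(field, **bounds):
    """Build {"range": {field: {op: value, ...}}} and reject unknown operators."""
    unknown = set(bounds) - set(RANGE_OPS)
    if unknown:
        raise ValueError(f"unknown range operators: {sorted(unknown)}")
    return {"range": {field: bounds}}


def matches_range(doc, clause):
    """Return True if the document's field satisfies every bound."""
    [(field, bounds)] = clause["range"].items()
    if field not in doc:
        return False
    return all(RANGE_OPS[op](doc[field], bound) for op, bound in bounds.items())


clause = range_clause("age", gte=20, lt=30)
print(clause)
print([age for age in (19, 20, 25, 30) if matches_range({"age": age}, clause)])
```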

exists and missing filters: test whether a field has a value or not (the missing query was removed in Elasticsearch 5.0; on 5.x, put exists inside a bool must_not clause instead);

<code>"exists": {</code>

bool filter:

combines several filter clauses with boolean logic;

must: all of its clauses must match (AND);

<code>"term": { "gender": "Female" }</code>

must_not: none of its clauses may match (NOT);

should: at least one of its clauses must match (OR).

<code>should: {</code>
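The AND / NOT / OR reading of must, must_not, and should can be checked with a small in-memory evaluator. This sketch follows the description above and supports only term clauses with exact equality; the function names are hypothetical, and in a real cluster whether should is required also depends on the minimum_should_match setting.

```python
def matches_term(doc, clause):
    """Exact-value comparison for a single {"term": {field: value}} clause."""
    [(field, value)] = clause["term"].items()
    return doc.get(field) == value


def matches_bool(doc, bool_filter):
    """Evaluate must (AND), must_not (NOT) and should (OR) over term clauses."""
    clauses = bool_filter["bool"]
    if not all(matches_term(doc, c) for c in clauses.get("must", [])):
        return False
    if any(matches_term(doc, c) for c in clauses.get("must_not", [])):
        return False
    should = clauses.get("should", [])
    if should and not any(matches_term(doc, c) for c in should):
        return False
    return True


bool_filter = {
    "bool": {
        "must": [{"term": {"gender": "Female"}}],
        "must_not": [{"term": {"age": 25}}],
        "should": [{"term": {"name": "Guo"}}, {"term": {"name": "Rong"}}],
    }
}
print(matches_bool({"name": "Rong", "gender": "Female", "age": 30}, bool_filter))
```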

Queries:

match_all query: matches all documents; it is the default when no query is specified.

<code>{ "match_all": {} }</code>

match query: runs a full-text or exact-value search on almost any field;

For a full-text search, the query string is analyzed first:

<code>{ "match": {"students": "Guo" }}</code>

For an exact-value search, the exact value is looked up; in this case a filter is recommended instead of a query:

<code>{ "match": {"name": "Guo"} }</code>

multi_match query: runs the same query on several fields. The search text goes in "query" and the field names in the "fields" array:

<code>{ "multi_match": { "query": "full-text search", "fields": ["field1", "field2"] } }</code>

<code>{ "multi_match": { "query": "Guo", "fields": ["name", "description"] } }</code>
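Because the shape of a multi_match body is easy to get wrong by hand, it can help to generate it. A minimal sketch, with `multi_match_body` as a hypothetical helper that returns a full request body ready for `json.dumps`:

```python
import json


def multi_match_body(query_text, fields):
    """Build a search body that runs one query text against several fields."""
    if isinstance(fields, str):
        raise TypeError("fields must be a list of field names, not a single string")
    return {"query": {"multi_match": {"query": query_text, "fields": list(fields)}}}


body = multi_match_body("Guo", ["name", "description"])
print(json.dumps(body))
```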

bool query:

combines several query clauses with boolean logic. Unlike the bool filter, a query clause does not return "yes" or "no" but a relevance score, so the bool query combines the scores of its clauses.

Keywords: must, must_not, should

Combining filters and queries:

Generally a filter is nested inside a query to narrow down the results; a query is rarely nested inside a filter.

<code>{ "filtered": { "query": { "match": {"gender": "Female"} }, "filter": { "term": {"age": 25} } } }</code>

Note: the filtered query was deprecated in Elasticsearch 2.0 and removed in 5.0. On the 5.5 release used here, use a bool query with the match clause under "must" and the term clause under "filter".
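Moving from the old filtered form to the bool form is a mechanical rewrite: the "query" part goes under "must" and the "filter" part stays under "filter". The helper below is a hypothetical sketch of that rewrite on plain Python dicts.

```python
def filtered_to_bool(query):
    """Rewrite {"filtered": {...}} as the equivalent {"bool": {...}} query."""
    inner = query["filtered"]
    bool_body = {}
    if "query" in inner:
        bool_body["must"] = inner["query"]     # scored part
    if "filter" in inner:
        bool_body["filter"] = inner["filter"]  # yes/no part, not scored
    return {"bool": bool_body}


old_style = {
    "filtered": {
        "query": {"match": {"gender": "Female"}},
        "filter": {"term": {"age": 25}},
    }
}
print(filtered_to_bool(old_style))
```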

Validating a query:

<code>GET /INDEX/_validate/query?pretty</code>

<code>curl -XGET 'localhost:9200/students/_validate/query?pretty' -d '{"query": {"term": {"name": "fan"}}}'</code>

For a detailed explanation, use the following form:

<code>GET /INDEX/_validate/query?explain&pretty</code>

<code>curl -XGET 'localhost:9200/students/_validate/query?explain&pretty' -d '{"query": {"term": {"name": "fan"}}}'</code>

This article is reposted from SoulMio's 51CTO blog. Original link: http://blog.51cto.com/bovin/1963980. To repost it, please contact the original author.
