
1. Elasticsearch aggregations at a glance

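Unlike full-text search, which relies on key-value lookups in the inverted index, aggregations compute summaries over the matching documents. Two examples: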

Figure: aggregated statistics for a specific category.

Figure: aggregated statistics by month.

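2. Definition of Elasticsearch aggregations

Aggregations provide aggregated data based on a search query. They are built from simple building blocks, themselves called aggregations, which can be composed to produce complex summaries of the data.

The basic syntax is as follows: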

"aggregations" : {
    "<aggregation_name>" : {
        "<aggregation_type>" : {
            <aggregation_body>
        }
        [,"meta" : { [<meta_data_body>] } ]?
        [,"aggregations" : { [<sub_aggregation>]+ } ]?
    }
    [,"<aggregation_name_2>" : { ... } ]*
}
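
The grammar above can be sketched as a small Python helper that assembles a request body as a plain dictionary. This is only an illustration: the helper name `aggregation`, the aggregation names `by_color` and `avg_price`, and the `color` and `price` fields are assumptions, not part of the original article.

```python
import json


def aggregation(agg_type, body, meta=None, sub_aggs=None):
    """Build one aggregation definition: type and body, optional meta and sub-aggregations."""
    definition = {agg_type: body}
    if meta is not None:
        definition["meta"] = meta
    if sub_aggs:
        definition["aggregations"] = sub_aggs
    return definition


# Hypothetical example: a terms aggregation with one avg sub-aggregation.
request_body = {
    "aggregations": {
        "by_color": aggregation(
            "terms",
            {"field": "color"},
            meta={"note": "hypothetical metadata"},
            sub_aggs={"avg_price": aggregation("avg", {"field": "price"})},
        )
    }
}

print(json.dumps(request_body, indent=2))
```

The optional parts of the grammar (`meta` and nested `aggregations`) are emitted only when they are supplied, which mirrors the `[...]?` markers in the syntax.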

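3. Types of Elasticsearch aggregations

3.1 Type 1: Metric aggregations

Metric aggregations compute values over a set of documents: either all documents in the search result set, or the documents in each logical group.

They are analogous to the MIN(), MAX(), STDDEV(), and SUM() operations in MySQL.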

Single-value metric:

SELECT AVG(price) FROM products

Multi-value metric:

         |           |
         v           v
SELECT MIN(price), MAX(price) FROM products

The equivalent metric aggregation in the Elasticsearch DSL:

{
  "aggs": {
    "avg_price": {
      "avg": {
        "field": "price"
      }
    }
  }
}
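
To make the single-value versus multi-value distinction concrete, here is a pure-Python sketch of what an `avg` metric and a `stats`-style metric compute. It runs locally and does not call Elasticsearch; the function names and the price list are hypothetical.

```python
def avg_metric(values):
    """Single-value metric: one number for the whole document set, like AVG(price)."""
    return sum(values) / len(values)


def stats_metric(values):
    """Multi-value metric: several numbers from one pass over the same documents."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "avg": sum(values) / len(values),
        "sum": sum(values),
    }


prices = [10.0, 20.0, 60.0]  # hypothetical product prices

print(avg_metric(prices))
print(stats_metric(prices))
```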

Comparison of metric aggregations:

Aggregation | Elasticsearch | MySQL
Avg | Yes | Yes
Cardinality (distinct value count) | Yes (sample based) | Yes (exact), similar to DISTINCT
Extended Stats | Yes | StdDev bounds missing
Geo Bounds | Yes | for future blog post
Geo Centroid | Yes | for future blog post
Max | Yes | Yes
Percentiles | Yes | Complex SQL or UDF
Percentile Ranks | Yes | Complex SQL or UDF
Scripted | Yes | No
Stats | Yes | Yes
Top Hits (important, often overlooked) | Yes | Complex
Value Count | Yes | Yes

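Of these, the top_hits sub-aggregation returns the top X matching documents within each group, and it supports source filtering so that only selected fields are returned.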
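
As a hedged illustration of top_hits, the request body below asks for the three most expensive documents per bucket and limits the returned fields through source filtering. The body is built as a Python dictionary and printed as JSON; the index fields `size`, `title`, and `price` and the aggregation names are assumptions chosen for the example.

```python
import json

# Hypothetical request: group by "size", then keep the top 3 documents per group.
top_hits_request = {
    "size": 0,  # skip the ordinary search hits, return aggregations only
    "aggs": {
        "by_size": {
            "terms": {"field": "size"},
            "aggs": {
                "top_products": {
                    "top_hits": {
                        "size": 3,  # the "X" in "top X"
                        "sort": [{"price": {"order": "desc"}}],
                        "_source": {"includes": ["title", "price"]},
                    }
                }
            },
        }
    },
}

print(json.dumps(top_hits_request, indent=2))
```

The printed JSON could be pasted after a `POST <index>/_search` line in Kibana Dev Tools, provided the index actually has these fields.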

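Type 2: Bucketing aggregations

Bucketing aggregations build logical groups of documents from the search results: documents that satisfy a given criterion are placed in a bucket, and each bucket is associated with a key.

This is analogous to GROUP BY in MySQL.

MySQL example:

Bucketing by size: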

SELECT size, COUNT(*) FROM products GROUP BY size

+------+----------+
| size | COUNT(*) |
+------+----------+
| S    |      123 |  <--- set of rows with size = S
| M    |      456 |
| ...  |      ... |
+------+----------+

The equivalent bucket aggregation in the Elasticsearch DSL:

{
  "query": {
    "match": {
      "title": "Beach"
    }
  },
  "aggs": {
    "by_size": {
      "terms": {
        "field": "size"
      }
    },
    "by_material": {
      "terms": {
        "field": "material"
      }
    }
  }
}
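
The following pure-Python sketch mimics what this request computes: filter the documents by the query first, then count the matches per bucket key. It is a local emulation with a hypothetical `products` list, and the word match on the title is a crude stand-in for a real `match` query.

```python
from collections import Counter

# Hypothetical documents; only the field names come from the DSL above.
products = [
    {"title": "Beach towel", "size": "M", "material": "cotton"},
    {"title": "Beach bag", "size": "S", "material": "canvas"},
    {"title": "Beach chair", "size": "M", "material": "canvas"},
    {"title": "Rain coat", "size": "M", "material": "nylon"},
]


def terms_buckets(docs, field):
    """Count documents per distinct value of `field`, largest bucket first."""
    counts = Counter(doc[field] for doc in docs)
    return [{"key": key, "doc_count": n} for key, n in counts.most_common()]


# Rough stand-in for the match query on "title".
matched = [p for p in products if "beach" in p["title"].lower().split()]

by_size = terms_buckets(matched, "size")
by_material = terms_buckets(matched, "material")
print(by_size)
print(by_material)
```

Both aggregations run over the same filtered set, which is why `by_size` and `by_material` sit side by side under `aggs` rather than one inside the other.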

Comparison of bucketing aggregations:

Aggregation | Elasticsearch | MySQL
Children (parent-child documents) | Yes | for future blog post
Date Histogram (time-based bucketing) | Yes | Complex
Date Range | Yes | Complex
Filter | Yes | n/a (yes)
Filters | Yes | n/a (yes)
Geo Distance | Yes | for future blog post
GeoHash grid | Yes | for future blog post
Global | Yes | n/a (yes)
Histogram | Yes | Complex
IPv4 Range | Yes | Complex
Missing | Yes | Yes
Nested | Yes | for future blog post
Range | Yes | Complex
Reverse Nested | Yes | for future blog post
Sampler | Yes | Complex
Significant Terms | Yes | No
Terms (most commonly used) | Yes | Yes

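Type 3: Pipeline aggregations

Pipeline aggregations operate on the results of other aggregations rather than on the original document set.

Imagine you run a daily-deals online shop and want to know the average price of all products, grouped by the date they came into stock.

In SQL you could write:

SELECT in_stock_since, AVG(price) FROM products GROUP BY in_stock_since

Elasticsearch example:

The demo is more involved: it totals sales per month and then reports only the months whose total exceeds 200000.

The full DSL is given in the next section and is not repeated here.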

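Type 4: Matrix aggregations

According to the official Elasticsearch 6.4 documentation, this feature is experimental and may be changed or removed completely in a future release.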

3. A complete Elasticsearch aggregation example

3.1 Step 1: Import the full data set using dynamic mapping

POST _bulk
{"index":{"_index":"cars","_type":"doc","_id":"1"}}
{"name":"bmw","date":"2017-06-01", "color":"red", "price":30000}
{"index":{"_index":"cars","_type":"doc","_id":"2"}}
{"name":"bmw","date":"2017-06-30", "color":"blue", "price":50000}
{"index":{"_index":"cars","_type":"doc","_id":"3"}}
{"name":"bmw","date":"2017-08-11", "color":"red", "price":90000}
{"index":{"_index":"cars","_type":"doc","_id":"4"}}
{"name":"ford","date":"2017-07-15", "color":"red", "price":20000}
{"index":{"_index":"cars","_type":"doc","_id":"5"}}
{"name":"ford","date":"2017-07-01", "color":"blue", "price":40000}
{"index":{"_index":"cars","_type":"doc","_id":"6"}}
{"name":"bmw","date":"2017-08-01", "color":"green", "price":10000}
{"index":{"_index":"cars","_type":"doc","_id":"7"}}
{"name":"jeep","date":"2017-07-08", "color":"red", "price":110000}
{"index":{"_index":"cars","_type":"doc","_id":"8"}}
{"name":"jeep","date":"2017-08-25", "color":"red", "price":230000}

3.2 Step 2: Check the mapping

GET cars/_mapping

3.3 Step 3: Metric aggregation

Compute the average price of the cars.

POST cars/_search
{
  "size": 0,
  "aggs": {
    "avg_grade": { "avg": { "field": "price" } }
  }
}
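
As a sanity check, the average can be recomputed locally from the eight prices imported in step 1. This sketch does not contact Elasticsearch; it only shows the value that should appear under `aggregations.avg_grade.value` in the response if the bulk import succeeded.

```python
# Prices of the eight cars from the bulk request in step 1.
prices = [30000, 50000, 90000, 20000, 40000, 10000, 110000, 230000]

average_price = sum(prices) / len(prices)

# Shape of the relevant part of the search response for an avg aggregation.
expected_fragment = {"aggregations": {"avg_grade": {"value": average_price}}}
print(expected_fragment)
```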

3.4 Step 4: Bucket aggregation with a sub-aggregation

Group the cars by brand, then group each brand again by color.

POST cars/_search
{
  "size": 0,
  "aggs": {
    "name_aggs": {
      "terms": { "field": "name.keyword" },
      "aggs": {
        "color_aggs": {
          "terms": { "field": "color.keyword" }
        }
      }
    }
  }
}
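
The two-level grouping can be emulated in pure Python to see which buckets to expect. This is a local sketch over the (name, color) pairs from step 1, not a call to Elasticsearch; the variable names are illustrative.

```python
from collections import Counter, defaultdict

# (name, color) of the eight cars from step 1, in document id order.
cars = [
    ("bmw", "red"), ("bmw", "blue"), ("bmw", "red"), ("ford", "red"),
    ("ford", "blue"), ("bmw", "green"), ("jeep", "red"), ("jeep", "red"),
]

# Outer bucket: brand. Inner bucket: color within that brand.
colors_by_name = defaultdict(Counter)
for name, color in cars:
    colors_by_name[name][color] += 1

# Order the outer buckets by document count, largest first, then by key.
name_buckets = sorted(
    colors_by_name.items(),
    key=lambda item: (-sum(item[1].values()), item[0]),
)

for name, colors in name_buckets:
    print(name, sum(colors.values()), dict(colors))
```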

3.5 Step 5: Pipeline aggregation

Total the sales for each month and return only the months whose total sales exceed 200000.

POST /cars/_search
{
  "size": 0,
  "aggs": {
    "sales_per_month": {
      "date_histogram": {
        "field": "date",
        "interval": "month"
      },
      "aggs": {
        "total_sales": {
          "sum": { "field": "price" }
        },
        "sales_bucket_filter": {
          "bucket_selector": {
            "buckets_path": { "totalSales": "total_sales" },
            "script": "params.totalSales > 200000"
          }
        }
      }
    }
  }
}
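
The pipeline can be traced by hand with a short pure-Python emulation: bucket by month, sum the price per bucket, then keep only the buckets that pass the threshold. The sketch works on the (date, price) pairs from step 1 and does not call Elasticsearch; the variable names are illustrative.

```python
from collections import defaultdict

# (date, price) of the eight cars from step 1.
sales = [
    ("2017-06-01", 30000), ("2017-06-30", 50000), ("2017-08-11", 90000),
    ("2017-07-15", 20000), ("2017-07-01", 40000), ("2017-08-01", 10000),
    ("2017-07-08", 110000), ("2017-08-25", 230000),
]

# date_histogram with a monthly interval, plus the sum sub-aggregation.
total_sales = defaultdict(int)
for date, price in sales:
    month = date[:7]  # "YYYY-MM"
    total_sales[month] += price

# bucket_selector: keep only the buckets whose total exceeds the threshold.
selected = {month: total for month, total in sorted(total_sales.items()) if total > 200000}

print(dict(sorted(total_sales.items())))
print(selected)
```

Only the filtering step works on aggregated values rather than on documents, which is what makes it a pipeline aggregation.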

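4. A guide to using Elasticsearch aggregations

Premise: Elasticsearch offers far more aggregation types than MySQL and can do more with them.

When you face an aggregation problem, decide which of the four types it falls under and look up the corresponding API in the official documentation.

Taking the most common scenarios as examples:

If the task is a GROUP BY, use the terms aggregation, which is a bucket aggregation.

If the task groups by time, use the date_histogram aggregation, which is also a bucket aggregation.

If the task groups within groups, nest another terms aggregation or a top_hits sub-aggregation inside a terms aggregation.

If the task computes a maximum, minimum, average, or similar value, use the corresponding metric aggregation (max, min, avg, and so on).

If the task filters on a condition over aggregated results, combine a pipeline aggregation with the other aggregations.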
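
The checklist above can be condensed into a small lookup table. The sketch below is only a memory aid: the scenario labels and the `suggest` helper are hypothetical, and `bucket_selector` is listed for the last case because it is the pipeline aggregation used in the step 5 demo, not the only option.

```python
# Scenario -> (aggregation family, aggregation type). Labels are illustrative.
AGGREGATION_GUIDE = {
    "group by a field": ("bucket", "terms"),
    "group by time": ("bucket", "date_histogram"),
    "group within groups": ("bucket", "terms with a nested terms or top_hits"),
    "max, min, or average": ("metric", "max, min, or avg"),
    "filter on aggregated results": ("pipeline", "bucket_selector"),
}


def suggest(scenario):
    """Return a one-line hint for a scenario listed in AGGREGATION_GUIDE."""
    family, agg_type = AGGREGATION_GUIDE[scenario]
    return f"{family} aggregation: {agg_type}"


print(suggest("group by time"))
```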

Experiment often, and verify your queries in the Dev Tools section of Kibana.

References:

1. https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations.html

2. http://blog.ulf-wendel.de/2016/aggregation-features-elasticsearch-vs-mysql-vs-mongodb/

3. https://elasticsearch.cn/article/629
