1 Preface
This article is part of my study notes for the Geek Time course "Elasticsearch Core Technology and Practice".
2 Bucket & Metric Aggregation
- Metric: a family of statistical calculations
- Bucket: a group of documents that satisfy a condition
![](https://img.laitimes.com/img/9ZDMuAjOiMmIsIjOiQnIsICM38FdsYkRGZkRG9lcvx2bjxiNx8VZ6l2cs0TP31ENVhUY2pkMMBjVtJWd0ckW65UbM5WOHJWa5kHT20ESjBjUIF2X0hXZ0xCMx81dvRWYoNHLrdEZwZ1Rh5WNXp1bwNjW1ZUba9VZwlHdssmch1mclRXY39CXldWYtlWPzNXZj9mcw1ycz9WL49zZuBnL4AzN5IjN0ATM3IjNwAjMwIzLc52YucWbp5GZzNmLn9Gbi1yZtl2Lc9CX6MHc0RHaiojIsJye.png)
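2.1 Aggregation syntax
An aggregation is part of a search request. In general, it is recommended to set size to 0, so that the response contains only the aggregation results and not the matching documents.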
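A minimal sketch of that structure, assuming the employees index used in the demos below. The query narrows the documents, the aggregation then runs only over the matching ones, and "size": 0 keeps the hits out of the response. The name avg_female_salary is an arbitrary label chosen for illustration.

```console
# Assumes the employees index from section 2.4.
# The aggregation only sees documents matched by the query; "size": 0 returns no hits.
POST employees/_search
{
  "size": 0,
  "query": {
    "term": { "gender": "female" }
  },
  "aggs": {
    "avg_female_salary": {
      "avg": { "field": "salary" }
    }
  }
}
```

Removing the query block makes the aggregation cover every document in the index.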
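2.2 An example: salary statistics
Elasticsearch Core Technology and Practice study notes 45 | Bucket & Metric aggregations and nested aggregations
Left side, the request: it computes the maximum, minimum, and average salary, each aggregation specifying its function and its field.
Right side, the response: hits.total reports 20 documents, but because size=0 no documents are listed in the hits. Under aggregations are the results of the 3 aggregations.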
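A sketch of such a request, assuming the employees index from section 2.4 with a numeric salary field. The names max_salary, min_salary and avg_salary are arbitrary; each one appears as a key under aggregations in the response.

```console
# Three single-value metric aggregations in one search request, no hits returned
POST employees/_search
{
  "size": 0,
  "aggs": {
    "max_salary": { "max": { "field": "salary" } },
    "min_salary": { "min": { "field": "salary" } },
    "avg_salary": { "avg": { "field": "salary" } }
  }
}
```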
2.3 Metric Aggregation
Single-value metrics: output only one result
- min, max, avg, sum
- cardinality (similar to SQL COUNT(DISTINCT ...))
Multi-value metrics: output several results
- stats, extended_stats
- percentiles, percentile_ranks
- top_hits (the top-ranked documents)
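The demos below only cover min, max and stats. Here is a sketch of two other metrics from the list, assuming job has a job.keyword subfield and salary is numeric. The aggregation names are arbitrary labels.

```console
# Assumes job.keyword exists (text field with a keyword subfield).
# cardinality: number of distinct job titles; percentiles: salary at the 50th/95th/99th percentile
POST employees/_search
{
  "size": 0,
  "aggs": {
    "distinct_jobs": {
      "cardinality": { "field": "job.keyword" }
    },
    "salary_percentiles": {
      "percentiles": {
        "field": "salary",
        "percents": [50, 95, 99]
      }
    }
  }
}
```

Note that cardinality is an approximate count. Its accuracy is tuned with precision_threshold, and for a handful of distinct values like these it is expected to be close to exact.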
2.4 Metric aggregation demos
Define the index
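The mapping below is an assumption that fits the rest of these notes. job is a text field with a job.keyword subfield. gender and name are keyword fields, so they can be aggregated directly. The request uses Elasticsearch 7.x syntax, without mapping types. Run it before the bulk request. If the index already exists, this request fails and the index has to be deleted first. Without an explicit mapping, dynamic mapping would turn gender into text with a gender.keyword subfield.

```console
# Assumed mapping: job is analyzed text plus an exact-value keyword subfield;
# gender and name are keyword so terms aggregations work on them directly
PUT /employees
{
  "mappings": {
    "properties": {
      "age":    { "type": "integer" },
      "gender": { "type": "keyword" },
      "job": {
        "type": "text",
        "fields": {
          "keyword": { "type": "keyword", "ignore_above": 50 }
        }
      },
      "name":   { "type": "keyword" },
      "salary": { "type": "integer" }
    }
  }
}
```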
Data preparation:
PUT /employees/_bulk
{ "index" : { "_id" : "1" } }
{ "name" : "Emma","age":32,"job":"Product Manager","gender":"female","salary":35000 }
{ "index" : { "_id" : "2" } }
{ "name" : "Underwood","age":41,"job":"Dev Manager","gender":"male","salary": 50000}
{ "index" : { "_id" : "3" } }
{ "name" : "Tran","age":25,"job":"Web Designer","gender":"male","salary":18000 }
{ "index" : { "_id" : "4" } }
{ "name" : "Rivera","age":26,"job":"Web Designer","gender":"female","salary": 22000}
{ "index" : { "_id" : "5" } }
{ "name" : "Rose","age":25,"job":"QA","gender":"female","salary":18000 }
{ "index" : { "_id" : "6" } }
{ "name" : "Lucy","age":31,"job":"QA","gender":"female","salary": 25000}
{ "index" : { "_id" : "7" } }
{ "name" : "Byrd","age":27,"job":"QA","gender":"male","salary":20000 }
{ "index" : { "_id" : "8" } }
{ "name" : "Foster","age":27,"job":"Java Programmer","gender":"male","salary": 20000}
{ "index" : { "_id" : "9" } }
{ "name" : "Gregory","age":32,"job":"Java Programmer","gender":"male","salary":22000 }
{ "index" : { "_id" : "10" } }
{ "name" : "Bryant","age":20,"job":"Java Programmer","gender":"male","salary": 9000}
{ "index" : { "_id" : "11" } }
{ "name" : "Jenny","age":36,"job":"Java Programmer","gender":"female","salary":38000 }
{ "index" : { "_id" : "12" } }
{ "name" : "Mcdonald","age":31,"job":"Java Programmer","gender":"male","salary": 32000}
{ "index" : { "_id" : "13" } }
{ "name" : "Jonthna","age":30,"job":"Java Programmer","gender":"female","salary":30000 }
{ "index" : { "_id" : "14" } }
{ "name" : "Marshall","age":32,"job":"Javascript Programmer","gender":"male","salary": 25000}
{ "index" : { "_id" : "15" } }
{ "name" : "King","age":33,"job":"Java Programmer","gender":"male","salary":28000 }
{ "index" : { "_id" : "16" } }
{ "name" : "Mccarthy","age":21,"job":"Javascript Programmer","gender":"male","salary": 16000}
{ "index" : { "_id" : "17" } }
{ "name" : "Goodwin","age":25,"job":"Javascript Programmer","gender":"male","salary": 16000}
{ "index" : { "_id" : "18" } }
{ "name" : "Catherine","age":29,"job":"Javascript Programmer","gender":"female","salary": 20000}
{ "index" : { "_id" : "19" } }
{ "name" : "Boone","age":30,"job":"DBA","gender":"male","salary": 30000}
{ "index" : { "_id" : "20" } }
{ "name" : "Kathy","age":29,"job":"DBA","gender":"female","salary": 20000}
Query the minimum salary:
In the response, hits.total is the total number of documents, and aggregations returns the minimum salary.
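A sketch of the minimum-salary request, assuming the employees index and the 20 documents loaded above. The name min_salary is an arbitrary label.

```console
# Minimum salary across all employees; only the aggregation result is returned
POST employees/_search
{
  "size": 0,
  "aggs": {
    "min_salary": {
      "min": { "field": "salary" }
    }
  }
}
```

With the sample data the value should be 9000 (Bryant). Replacing min with max gives the maximum salary in the same way, which is 50000 (Underwood).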
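Likewise, find the maximum salary.
The queries above each return a single value. To get several values, a single aggregation can also output multiple values at once.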
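A sketch using the stats aggregation, which returns count, min, max, avg and sum in one result. It assumes the employees index above, and stats_salary is an arbitrary name.

```console
# One multi-value aggregation: count, min, max, avg and sum of salary together
POST employees/_search
{
  "size": 0,
  "aggs": {
    "stats_salary": {
      "stats": { "field": "salary" }
    }
  }
}
```

On the 20 sample documents this should give count 20, min 9000, max 50000, avg 24700 and sum 494000. For extra values such as variance and standard deviation, use extended_stats in the same position.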
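3 Bucket
Documents are assigned to different buckets according to certain rules, which classifies them. Common bucket aggregations provided by ES:
- Terms
- Numeric types
  - Range, Date Range
  - Histogram / Date Histogram
Nesting is supported: a bucket can be split again into sub-buckets.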
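3.1 Terms Aggregation
A field can only be used in a terms aggregation if it has doc_values or fielddata available:
- Keyword fields have doc_values enabled by default
- Text fields need fielddata enabled in the mapping, and are then bucketed by their analyzed tokens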
Demo
- Aggregate on job and on job.keyword
- Terms aggregation on gender
- Specify the bucket size
3.2 demo
The returned buckets contain each key and its document count (doc_count).
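A sketch of two terms aggregations in one request. It assumes job has a job.keyword subfield and gender is mapped as keyword. With dynamic mapping, use gender.keyword instead. jobs and genders are arbitrary names.

```console
# Bucket by exact job title (keyword subfield) and, independently, by gender.
# Assumes gender is a keyword field.
POST employees/_search
{
  "size": 0,
  "aggs": {
    "jobs": {
      "terms": { "field": "job.keyword" }
    },
    "genders": {
      "terms": { "field": "gender" }
    }
  }
}
```

Buckets are sorted by doc_count in descending order by default. With the sample data, Java Programmer should lead the jobs buckets with 7 documents, and gender should split into male 12 and female 8.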
A terms aggregation on the text field job fails by default.
After fielddata is enabled on the text field, it supports the terms aggregation.
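A sketch of the mapping update followed by the terms request on the text field. Before the update, the same search request is rejected because fielddata is disabled on text fields by default. The example assumes job is mapped as text. Keep in mind that fielddata builds per-document data on the JVM heap by un-inverting the inverted index, so enable it with care on large indices.

```console
# Enable fielddata on the existing text field job (loaded onto the heap, memory-heavy)
PUT employees/_mapping
{
  "properties": {
    "job": {
      "type": "text",
      "fielddata": true
    }
  }
}

# The same terms aggregation on the text field; it fails before the mapping change above
POST employees/_search
{
  "size": 0,
  "aggs": {
    "jobs": {
      "terms": { "field": "job" }
    }
  }
}
```

With the standard analyzer, the buckets are lowercased tokens such as programmer (11 documents), java (7) and javascript (4), rather than whole job titles.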
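The results differ from the job.keyword ones. The text field is analyzed into tokens before the terms aggregation runs, while a keyword field is not. For example, "Java Programmer" is counted under java and programmer instead of as one value.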
Specify the bucket size:
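A sketch that limits a terms aggregation to its largest buckets, assuming the job.keyword subfield. The size inside terms controls how many buckets come back. It is unrelated to the top-level size, which controls how many hits come back. top_jobs is an arbitrary name.

```console
# Return only the 2 job titles with the most documents
POST employees/_search
{
  "size": 0,
  "aggs": {
    "top_jobs": {
      "terms": {
        "field": "job.keyword",
        "size": 2
      }
    }
  }
}
```

With the sample data, the two buckets should be Java Programmer (7) and Javascript Programmer (4).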
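Using size with top_hits: the details of the 3 oldest employees in each job type.
First bucket by job.keyword, then define a top_hits sub-aggregation with size=3, sorted by age in descending order.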
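A sketch of that request, assuming job has a job.keyword subfield and age is numeric. The names jobs and old_employee are arbitrary labels.

```console
# Bucket by job title, then keep the 3 oldest employees (full documents) per bucket
POST employees/_search
{
  "size": 0,
  "aggs": {
    "jobs": {
      "terms": { "field": "job.keyword" },
      "aggs": {
        "old_employee": {
          "top_hits": {
            "size": 3,
            "sort": [
              { "age": { "order": "desc" } }
            ]
          }
        }
      }
    }
  }
}
```

For the Java Programmer bucket, this should return Jenny (36), King (33) and Gregory (32).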
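3.3 Optimizing terms aggregation performance
When it applies: aggregations run frequently, performance requirements are high, and the index is continuously being written to.
Eager loading is controlled by the eager_global_ordinals mapping parameter. Once it is enabled, the global ordinals that the terms aggregation relies on are built as soon as new documents are written and refreshed. Without it, they are built at the first aggregation request afterwards.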
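A sketch on a hypothetical index named employees_eager, a name made up for illustration. The parameter is set on a keyword field. The trade-off is that the work moves from query time to refresh time, so indexing becomes somewhat slower. eager_global_ordinals can also be changed on an existing keyword field through the update-mapping API.

```console
# Hypothetical index: build global ordinals for job at refresh time instead of at query time
PUT employees_eager
{
  "mappings": {
    "properties": {
      "job": {
        "type": "keyword",
        "eager_global_ordinals": true
      }
    }
  }
}
```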
4 Range & Histogram aggregations
- Bucket documents by numeric ranges
- In a Range aggregation, you can define custom keys
- Demo:
  - Bucket by salary range
  - Bucket by salary interval (Histogram)
4.1 demo
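Bucket by salary ranges.
In a range aggregation you can specify a key for each range. If you don't, ES generates a default key automatically, such as 10000.0-20000.0.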
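A sketch of a salary range aggregation, assuming a numeric salary field. Only the last range has a custom key, so the other two receive generated keys. salary_range is an arbitrary name.

```console
# Three salary ranges; "from" is inclusive and "to" is exclusive
POST employees/_search
{
  "size": 0,
  "aggs": {
    "salary_range": {
      "range": {
        "field": "salary",
        "ranges": [
          { "to": 10000 },
          { "from": 10000, "to": 20000 },
          { "key": ">20000", "from": 20000 }
        ]
      }
    }
  }
}
```

Because from is inclusive, the salaries of exactly 20000 fall into the ">20000" bucket. The key is only a label and does not change the bounds. With the sample data, the counts should be 1, 4 and 15.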
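demo2: salary histogram, covering salaries from 0 to 100,000 in buckets of 5,000
min and max (inside extended_bounds) set the overall range, and interval sets the bucket width.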
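A sketch of the histogram request, assuming a numeric salary field. salary_histogram is an arbitrary name.

```console
# Fixed-width salary buckets of 5000, padded with empty buckets to cover 0 to 100000
POST employees/_search
{
  "size": 0,
  "aggs": {
    "salary_histogram": {
      "histogram": {
        "field": "salary",
        "interval": 5000,
        "extended_bounds": {
          "min": 0,
          "max": 100000
        }
      }
    }
  }
}
```

extended_bounds only produces the empty buckets while min_doc_count keeps its default of 0. Setting min_doc_count to 1 hides them again.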
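5 Bucket + Metric Aggregation
A bucket aggregation can be analyzed further by adding sub-aggregations. A sub-aggregation can be a
- Bucket aggregation
- Metric aggregation
Demo
- Bucket by job type and compute salary statistics
- Bucket by job type first, then by gender, and compute salary statistics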
5.1 demo
Nested aggregation 1: bucket by job type and compute salary statistics
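A sketch assuming the job.keyword subfield and a numeric salary field. The sub-aggregation goes in an aggs object placed next to terms, not inside the terms body. jobs and salary_stats are arbitrary names.

```console
# Bucket by exact job title, then compute salary stats inside each bucket
POST employees/_search
{
  "size": 0,
  "aggs": {
    "jobs": {
      "terms": { "field": "job.keyword" },
      "aggs": {
        "salary_stats": {
          "stats": { "field": "salary" }
        }
      }
    }
  }
}
```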
Multi-level nesting: bucket by job type, then by gender, and compute salary statistics.
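A sketch of the two-level version, assuming job.keyword exists and gender is mapped as keyword. With dynamic mapping, use gender.keyword instead. The aggregation names are arbitrary labels.

```console
# Level 1: job title; level 2: gender within each job; metric: salary stats per job+gender
# Assumes gender is a keyword field.
POST employees/_search
{
  "size": 0,
  "aggs": {
    "jobs": {
      "terms": { "field": "job.keyword" },
      "aggs": {
        "genders": {
          "terms": { "field": "gender" },
          "aggs": {
            "salary_stats": {
              "stats": { "field": "salary" }
            }
          }
        }
      }
    }
  }
}
```

Each job bucket in the response contains its own genders buckets, and each of those carries a salary_stats result.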