ES: Aggregation Queries
Aggregation Queries
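Aggregations (`aggregations`, which can be abbreviated to `aggs`) provide a way to bucket and compute over the data matched by a query.
They resemble SQL's GROUP BY combined with aggregate functions. Aggregations can be nested, which lets you compose complex operations.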
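To make the GROUP BY analogy concrete, here is a minimal Python sketch of the two steps an aggregation combines: grouping rows (bucketing) and applying a function per group (a metric). The rows and the helper name `group_avg` are hypothetical and only illustrate the idea; this is not how Elasticsearch implements it.

```python
from collections import defaultdict

# Hypothetical sample rows; field names and values are for illustration only.
rows = [
    {"country": "CN", "age": 27},
    {"country": "CN", "age": 57},
    {"country": "US", "age": 68},
    {"country": "US", "age": 70},
]

def group_avg(rows, key, field):
    """Group rows by `key` (bucketing), then average `field` per group (metric)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[field])
    return {k: sum(v) / len(v) for k, v in groups.items()}

# Roughly: SELECT country, AVG(age) FROM rows GROUP BY country
result = group_avg(rows, "country", "age")
print(result)
```

In aggregation terms, the grouping corresponds to a bucketing aggregation and the average to a metric aggregation nested inside it.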
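Aggregation queries fall roughly into three categories:
- Bucketing Aggregations: these operate on documents. Each document that meets a criterion is placed into a bucket (group), which is how the documents get partitioned.
// Group by age: under 40, 40 up to (but not including) 60, and 60 or over.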
{
"aggs": {
"age_group": {
"range": {
"field": "age",
"ranges": [
{
"to": 40
},
{
"from": 40,
"to": 60
},
{
"from": 60
}
]
}
}
}
}
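- Metric Aggregations: these compute a metric, such as the maximum or the average, over a set of documents, typically the documents inside each bucket produced by a bucketing aggregation.
// Compute the average age within each group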
{
"aggs": {
"age_group": {
"range": {
"field": "age",
"ranges": [
{
"to": 40
},
{
"from": 40,
"to": 60
},
{
"from": 60
}
]
},
"aggs": {
"avg_group": {
"avg": {
"field": "age"
}
}
}
}
}
}
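- Pipeline Aggregations: these operate on buckets rather than documents. They aggregate the output of other aggregations and their associated metrics; in other words, they compute over the results of other aggregations. Pipeline aggregations fall roughly into two types, parent and sibling:
parent: the "input" is the "output" of another aggregation (its parent), which it processes further. It generally creates no new buckets and instead enriches the parent's existing buckets, so the new result appears at the same level as the parent's results. For example: a bucketing aggregation first computes total sales per month; then, from each month's total (the parent input), a score is computed for that month (the parent output), and that score appears alongside the month's total.
sibling: the "input" is the "output" of another aggregation (a sibling), and these inputs are aggregated again into a new result. For example: a bucketing aggregation first computes total sales per month; then the maximum of those monthly totals (the sibling input) is computed, giving the best monthly sales figure of the year (the sibling output). The new result appears at the outermost level of the aggregations.
Pipeline aggregation parameters: buckets_path specifies the path to the "input"; gap_policy sets how empty buckets are handled (skip / insert_zeros); format defines the output format of the aggregation.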
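The monthly sales example described above can be sketched as a request body, written here as a Python dict so that it can be inspected or serialized. The field names `date` and `price` and all aggregation names are hypothetical, and `calendar_interval` assumes Elasticsearch 7.2 or later. `cumulative_sum` stands in for the parent-style "score" step, and `max_bucket` is the sibling step.

```python
import json

# Hypothetical field names ("date", "price"); adjust them to the real mapping.
body = {
    "size": 0,  # return only aggregation results, no hits
    "aggs": {
        "sales_per_month": {
            # bucketing: one bucket per calendar month
            "date_histogram": {"field": "date", "calendar_interval": "month"},
            "aggs": {
                # metric: total sales inside each monthly bucket
                "monthly_sales": {"sum": {"field": "price"}},
                # parent pipeline: declared inside the date_histogram, so its
                # value is added to each existing monthly bucket
                "running_total": {
                    "cumulative_sum": {"buckets_path": "monthly_sales"}
                },
            },
        },
        # sibling pipeline: declared next to the date_histogram, so its value
        # appears at the outermost level of the aggregation results
        "best_month": {
            "max_bucket": {"buckets_path": "sales_per_month>monthly_sales"}
        },
    },
}

print(json.dumps(body, indent=2))
```

Note how the two `buckets_path` values differ: the parent pipeline refers to a metric in the same bucket by name only, while the sibling pipeline has to walk from the bucketing aggregation down to the metric with `>`.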
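// What this query does:
// Bucket aggregation: splits the documents into three ranges: under 40, 40 up to (but not including) 60, and 60 or over.
// Metric aggregation: on top of the buckets, computes the count, min, max, average, and sum of age within each of the three ranges.
// Pipeline aggregation: uses avg_bucket, which takes one metric of a sibling aggregation and computes the mean of that metric across all buckets.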
{"aggs": {
"age_group": { // bucket aggregation
"range": {
"field": "age",
"ranges": [
{
"to": 40
},
{
"from": 40,
"to": 60
},
{
"from": 60
}
]
},
"aggs": { // metric aggregation: it runs on the buckets, so it is nested in the "aggs" of the bucket aggregation "age_group"
"avg_stats": {
"stats": {
"field": "age"
}
}
}
},
"avg_group_avg": { // pipeline aggregation (sibling)
"avg_bucket": {
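"buckets_path": "age_group>avg_stats.avg" // the "input" here is the average age of each bucket
// The input path works like a file system path: it is determined by where the "output" it refers to is located.
// buckets_path syntax: agg_name[> agg_name]*[. metrics]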
}
}
}
}
------------------------------------------Query result------------------------------------------------
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 9,
"relation": "eq"
},
"max_score": 1.0,
"hits": [
{
"_index": "person",
"_type": "person",
"_id": "H5r-GXgBLLjdyTtc74bR",
"_score": 1.0,
"_source": {
"name": "sunlutong",
"age": 27,
"country": "中国"
}
},
{
"_index": "person",
"_type": "person",
"_id": "IJr_GXgBLLjdyTtc7Ibf",
"_score": 1.0,
"_source": {
"name": "马云",
"age": 57,
"country": "中国"
}
},
{
"_index": "person",
"_type": "person",
"_id": "IZoAGngBLLjdyTtcJIbP",
"_score": 1.0,
"_source": {
"name": "马华腾",
"age": 51,
"country": "中国"
}
},
{
"_index": "person",
"_type": "person",
"_id": "IpoAGngBLLjdyTtcZoZ7",
"_score": 1.0,
"_source": {
"name": "比尔盖茨",
"age": 68,
"country": "美国"
}
},
{
"_index": "person",
"_type": "person",
"_id": "I5oeJHgBLLjdyTtcOoZl",
"_score": 1.0,
"_source": {
"name": "Rain",
"age": 38,
"country": "韩国"
}
},
{
"_index": "person",
"_type": "person",
"_id": "JJpNJHgBLLjdyTtc34ag",
"_score": 1.0,
"_source": {
"name": "马龙",
"age": 38,
"country": "中华人民共和国"
}
},
{
"_index": "person",
"_type": "person",
"_id": "JZp5JHgBLLjdyTtcEobD",
"_score": 1.0,
"_source": {
"name": "许家印",
"age": 38
}
},
{
"_index": "person",
"_type": "person",
"_id": "JprsM3gBLLjdyTtcJ4bV",
"_score": 1.0,
"_source": {
"name": "李嘉诚",
"country": "中国香港"
}
},
{
"_index": "person",
"_type": "person",
"_id": "J5q3NHgBLLjdyTtcaoYR",
"_score": 1.0,
"_source": {
"name": "巴菲特",
"country": "美国",
"age": 70
}
}
]
},
"aggregations": {
"age_group": {
"buckets": [
{
"key": "*-40.0",
"to": 40.0,
"doc_count": 4,
"avg_stats": {
"count": 4,
"min": 27.0,
"max": 38.0,
"avg": 35.25,
"sum": 141.0
}
},
{
"key": "40.0-60.0",
"from": 40.0,
"to": 60.0,
"doc_count": 2,
"avg_stats": {
"count": 2,
"min": 51.0,
"max": 57.0,
"avg": 54.0,
"sum": 108.0
}
},
{
"key": "60.0-*",
"from": 60.0,
"doc_count": 2,
"avg_stats": {
"count": 2,
"min": 68.0,
"max": 70.0,
"avg": 69.0,
"sum": 138.0
}
}
]
},
"avg_group_avg": {
"value": 52.75
}
}
}
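As a sanity check on the result above, the following Python sketch recomputes the three levels by hand: range buckets, per-bucket stats, and the avg_bucket value. It uses the eight ages present in the hits (the ninth document has no `age` field and so falls into no bucket). The helpers `range_buckets` and `resolve` are hypothetical names; `resolve` handles only the simple `agg>sub.metric` form of `buckets_path`, and empty buckets (the `gap_policy` case) are not handled.

```python
from statistics import mean

# Ages of the eight documents in the response that have an "age" field.
ages = [27, 57, 51, 68, 38, 38, 38, 70]

# (key, from, to): `from` is inclusive and `to` is exclusive, as in the range aggregation.
ranges = [("*-40.0", None, 40), ("40.0-60.0", 40, 60), ("60.0-*", 60, None)]

def range_buckets(values, ranges):
    """Bucket aggregation plus a stats metric per bucket (assumes no bucket is empty)."""
    buckets = []
    for key, lo, hi in ranges:
        members = [v for v in values
                   if (lo is None or v >= lo) and (hi is None or v < hi)]
        buckets.append({
            "key": key,
            "doc_count": len(members),
            "avg_stats": {
                "count": len(members),
                "min": min(members),
                "max": max(members),
                "avg": mean(members),
                "sum": sum(members),
            },
        })
    return buckets

def resolve(aggs, path):
    """Resolve a simplified buckets_path 'agg>sub.metric' to one value per bucket."""
    agg_part, _, metric = path.partition(".")
    bucket_agg, sub_agg = agg_part.split(">")
    return [b[sub_agg][metric] for b in aggs[bucket_agg]["buckets"]]

aggs = {"age_group": {"buckets": range_buckets(ages, ranges)}}

# Sibling pipeline step: average the per-bucket averages.
per_bucket_avg = resolve(aggs, "age_group>avg_stats.avg")
avg_group_avg = mean(per_bucket_avg)
print(per_bucket_avg, avg_group_avg)
```

The per-bucket averages come out as 35.25, 54, and 69, and their mean is 52.75, matching `avg_group_avg` in the response. Note that this is a mean of bucket means, not the mean age over all documents.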
Common aggregation queries
Source of the figure below