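- Concepts
bucket: documents are partitioned by a field; all documents with the same value in that field are placed in the same bucket.
metric: an aggregate computation run over a single bucket, such as an average, a maximum, or a minimum.
Both concepts map directly onto a SQL statement:
select count(*) from access_log group by user_id
bucket: group by user_id --> rows with the same user_id are placed in the same bucket
metric: count(*) --> for each user_id bucket, count the rows it contains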
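To make the analogy concrete, here is a minimal plain-Python sketch of the same two steps. The `access_log` rows and their `user_id` and `url` values are invented for illustration and do not come from any real index; the grouping loop plays the role of the bucket, and `len` plays the role of the `count(*)` metric.

```python
from collections import defaultdict

# Hypothetical access_log rows, invented for this illustration.
access_log = [
    {"user_id": 1, "url": "/home"},
    {"user_id": 2, "url": "/cart"},
    {"user_id": 1, "url": "/pay"},
]

# Bucket step: rows with the same user_id land in the same bucket.
buckets = defaultdict(list)
for row in access_log:
    buckets[row["user_id"]].append(row)

# Metric step: run one aggregate (here a count) over each bucket.
counts = {user_id: len(rows) for user_id, rows in buckets.items()}
print(counts)
```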
- Aggregation analysis, part one:
PUT /tvs
{
"mappings": {
"sales": {
"properties": {
"price": {
"type": "long"
},
"color": {
"type": "keyword"
},
"brand": {
"type": "keyword"
},
"sold_date": {
"type": "date"
}
}
}
}
}
POST /tvs/sales/_bulk
{ "index": {}}
{ "price" : 1000, "color" : "紅色", "brand" : "長虹", "sold_date" : "2016-10-28" }
{ "index": {}}
{ "price" : 2000, "color" : "紅色", "brand" : "長虹", "sold_date" : "2016-11-05" }
{ "index": {}}
{ "price" : 3000, "color" : "綠色", "brand" : "小米", "sold_date" : "2016-05-18" }
{ "index": {}}
{ "price" : 1500, "color" : "藍色", "brand" : "TCL", "sold_date" : "2016-07-02" }
{ "index": {}}
{ "price" : 1200, "color" : "綠色", "brand" : "TCL", "sold_date" : "2016-08-19" }
{ "index": {}}
{ "price" : 2000, "color" : "紅色", "brand" : "長虹", "sold_date" : "2016-11-05" }
{ "index": {}}
{ "price" : 8000, "color" : "紅色", "brand" : "三星", "sold_date" : "2017-01-01" }
{ "index": {}}
{ "price" : 2500, "color" : "藍色", "brand" : "小米", "sold_date" : "2017-02-12" }
2. Find which TV color sells the most
GET /tvs/sales/_search
{
"size" : 0,
"aggs" : {
"popular_colors" : {
"terms" : {
"field" : "color"
}
}
}
}
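size: set to 0 to return only the aggregation results, without the documents (hits) the aggregation ran over
aggs: fixed syntax; declares that a grouping/aggregation is to be run over the data
popular_colors: every aggregation must be given a name; the name is arbitrary, and the result appears under that name in the response
terms: bucket documents by the value of a field
field: the field whose values define the buckets
Response: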
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 8,
"max_score" : 0.0,
"hits" : [ ]
},
"aggregations" : {
"popular_colors" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "紅色",
"doc_count" : 4
},
{
"key" : "綠色",
"doc_count" : 2
},
{
"key" : "藍色",
"doc_count" : 2
}
]
}
}
}
Default ordering: buckets are sorted by doc_count in descending order.
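The doc_count values in the response can be checked by hand. The sketch below is a plain-Python emulation of what the terms aggregation reports for the eight sample documents, not a description of how Elasticsearch computes it internally. The `colors` list is copied from the bulk request above; how ties are ordered is a property of this sketch only.

```python
from collections import Counter

# The "color" values of the eight sample documents, in indexing order.
colors = ["紅色", "紅色", "綠色", "藍色", "綠色", "紅色", "紅色", "藍色"]

# Counter groups equal values (the buckets) and counts them (doc_count).
doc_counts = Counter(colors)

# Sort by doc_count descending. sorted() is stable, so buckets with equal
# counts keep their first-seen order in this sketch.
buckets = sorted(doc_counts.items(), key=lambda kv: kv[1], reverse=True)
for key, doc_count in buckets:
    print(key, doc_count)
```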
3. Average price of TVs per color
GET /tvs/sales/_search
{
"size": 0,
"aggs": {
"popular_color": {
"terms": {
"field": "color",
"size": 10
},
"aggs": {
"ave_price": {
"avg": {
"field": "price"
}
}
}
}
}
}
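4. Multi-level drill-down analysis
Drill-down analysis: after grouping once (for example by color), the documents inside each group are grouped again (for example, each color is split into one group per brand), and the aggregation is finally run on each of the finest-grained groups.
Example: drill down from color to brand, computing the average price per color and the average price per brand within each color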
GET /tvs/sales/_search
{
"size": 0,
"aggs": {
"popular_color": {
"terms": {
"field": "color",
"size": 10
},
"aggs": {
"ave_price": {
"avg": {
"field": "price"
}
},
"group_by_brand": {
"terms": {
"field": "brand",
"size": 10
},
"aggs":{
"brand_avg_price":{
"avg": {
"field": "price"
}
}
}
}
}
}
}
}
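The nesting in the request above can be mirrored with two levels of grouping. This is a plain-Python sketch over the sample data (the `sales` tuples are copied from the bulk request); the variable names are chosen for the illustration and are not part of any Elasticsearch API.

```python
from collections import defaultdict

# (color, brand, price) for the eight sample documents.
sales = [
    ("紅色", "長虹", 1000), ("紅色", "長虹", 2000), ("綠色", "小米", 3000),
    ("藍色", "TCL", 1500), ("綠色", "TCL", 1200), ("紅色", "長虹", 2000),
    ("紅色", "三星", 8000), ("藍色", "小米", 2500),
]

by_color = defaultdict(list)                              # outer buckets
by_color_brand = defaultdict(lambda: defaultdict(list))   # inner buckets
for color, brand, price in sales:
    by_color[color].append(price)
    by_color_brand[color][brand].append(price)

# Metric on the outer level: average price per color.
color_avg = {c: sum(p) / len(p) for c, p in by_color.items()}

# Metric on the inner level: average price per brand within each color.
brand_avg = {
    c: {b: sum(p) / len(p) for b, p in brands.items()}
    for c, brands in by_color_brand.items()
}
print(color_avg)
print(brand_avg)
```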
5. Maximum, minimum, and total price of TVs per color
GET /tvs/sales/_search
{
"size": 0,
"aggs": {
"group_by_color": {
"terms": {
"field": "color",
"size": 10
},
"aggs": {
"max_price": {
"max": {
"field": "price"
}
},
"min_price":{
"min": {
"field": "price"
}
},
"sum_price":{
"sum": {
"field": "price"
}
}
}
}
}
}
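6. Use histogram to compute TV sales count and revenue per price range
histogram: like terms, it creates buckets, but it takes a numeric field and buckets documents by the interval their value falls into
"histogram":{
"field": "price",
"interval": 2000
},
interval: 2000 splits the values into the buckets [0, 2000), [2000, 4000), [4000, 6000), [6000, 8000), [8000, 10000). Each bucket includes its lower bound and excludes its upper bound, so a price of exactly 2000 falls into the 2000 bucket.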
GET /tvs/sales/_search
{
"size": 0,
"aggs": {
"price": {
"histogram": {
"field": "price",
"interval": 2000
},
"aggs": {
"revenue": {
"sum": {
"field": "price"
}
}
}
}
}
}
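The bucket key for a value is its interval's lower bound, which can be sketched as integer division followed by multiplication. The plain-Python emulation below uses the eight sample prices; it lists only non-empty buckets, whereas Elasticsearch may also return the empty 4000 and 6000 buckets depending on min_doc_count.

```python
from collections import defaultdict

prices = [1000, 2000, 3000, 1500, 1200, 2000, 8000, 2500]
interval = 2000

buckets = defaultdict(list)
for price in prices:
    # Lower bound of the interval that contains this price.
    key = (price // interval) * interval
    buckets[key].append(price)

# doc_count is the sales count; "revenue" mirrors the sum sub-aggregation.
result = {
    key: {"doc_count": len(values), "revenue": sum(values)}
    for key, values in sorted(buckets.items())
}
print(result)
```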
7. Use date_histogram to count TV sales per month
date_histogram buckets documents by a date field and a date interval that you specify.
GET /tvs/sales/_search
{
"size": 0,
"aggs": {
"group_by_soldDate": {
"date_histogram": {
"field": "sold_date",
"interval": "month",
"format": "yyyy-MM-dd",
"min_doc_count": 0,
"extended_bounds": {
"min": "2017-01-01",
"max": "2017-12-31"
}
}
}
}
}
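min_doc_count: with a value of 0, a bucket is returned even for an interval (for example 2017-01-01 to 2017-01-31) that contains no documents; with a value of 1 or more, such empty buckets are omitted
extended_bounds (min, max): forces buckets to be generated across the whole range from min to max, even where there are no documents. It only extends the bucket range and does not filter anything out: documents outside the bounds (here, the 2016 sales) still get their own buckets. To restrict the data itself, add a range query.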
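The combined effect of min_doc_count: 0 and extended_bounds can be sketched in plain Python. This is an emulation of the monthly bucketing over the sample `sold_date` values, with the 2017-01-01 to 2017-12-31 bounds taken from the request above; `month_index` is a helper invented for this sketch. The bounds widen the bucket range rather than narrowing it.

```python
from collections import Counter

sold_dates = ["2016-10-28", "2016-11-05", "2016-05-18", "2016-07-02",
              "2016-08-19", "2016-11-05", "2017-01-01", "2017-02-12"]

def month_index(date_str):
    """Map 'YYYY-MM-DD' to a running month number (year * 12 + month - 1)."""
    year, month = int(date_str[:4]), int(date_str[5:7])
    return year * 12 + (month - 1)

counts = Counter(month_index(d) for d in sold_dates)

# extended_bounds: make sure the range covers at least min..max,
# without dropping months that lie outside the bounds but contain data.
first = min(min(counts), month_index("2017-01-01"))
last = max(max(counts), month_index("2017-12-31"))

# min_doc_count = 0: emit every month in the range, including empty ones.
monthly = {}
for m in range(first, last + 1):
    key = f"{m // 12:04d}-{m % 12 + 1:02d}-01"
    monthly[key] = counts[m]

for key, doc_count in monthly.items():
    print(key, doc_count)
```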
8. Revenue per brand per quarter
GET /tvs/sales/_search
{
"size": 0,
"aggs": {
"group_by_quarter": {
"date_histogram": {
"field": "sold_date",
"interval": "quarter",
"format": "yyyy-MM-dd",
"min_doc_count": 0,
"extended_bounds": {
"min": "2016-01-01",
"max": "2017-12-31"
}
},
"aggs":{
"group_by_brand":{
"terms": {
"field": "brand"
},
"aggs": {
"per_brand_price": {
"sum": {
"field": "price"
}
}
}
},
"total_sum_quarter":{
"sum": {
"field": "price"
}
}
}
}
}
}
9. Sales count per color for a given brand
GET /tvs/sales/_search
{
"size": 0,
"query": {
"term": {
"brand": {
"value": "小米"
}
}
},
"aggs": {
"group_by_color": {
"terms": {
"field": "color"
}
}
}
}
10. global bucket: compare one brand's average price with the average price across all brands
GET /tvs/sales/_search
{
"size": 0,
"query": {
"term": {
"brand": {
"value": "長虹"
}
}
},
"aggs": {
"changhong_avg_price": {
"avg": {
"field": "price"
}
},
"all":{
"global": {},
"aggs": {
"all_brand_ave_price":{
"avg": {
"field": "price"
}
}
}
}
}
}
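In the request above, `changhong_avg_price` is computed over the documents matched by the query, while everything under the `global` bucket ignores the query and sees all documents. A plain-Python sketch of the two scopes over the sample data (the `sales` tuples are copied from the bulk request; variable names are chosen for the illustration):

```python
# (brand, price) for the eight sample documents.
sales = [
    ("長虹", 1000), ("長虹", 2000), ("小米", 3000), ("TCL", 1500),
    ("TCL", 1200), ("長虹", 2000), ("三星", 8000), ("小米", 2500),
]

# Query scope: only documents whose brand matches the term query.
brand_prices = [price for brand, price in sales if brand == "長虹"]
changhong_avg_price = sum(brand_prices) / len(brand_prices)

# Global scope: every document in the index, regardless of the query.
all_prices = [price for _, price in sales]
all_brand_avg_price = sum(all_prices) / len(all_prices)

print(changhong_avg_price, all_brand_avg_price)
```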
11. Filter + aggregation: average price of TVs priced at 1200 or more
GET /tvs/sales/_search
{
"size": 0,
"query": {
"constant_score": {
"filter": {
"range": {
"price": {
"gte": 1200
}
}
},
"boost": 1
}
},
"aggs": {
"avg_price": {
"avg": {
"field": "price"
}
}
}
}
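12. Average price of a brand over a recent time window (the query below uses now-3000d; for the last month, use a narrower range such as now-30d)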
GET /tvs/sales/_search
{
"size": 0,
"query": {
"term": {
"brand": {
"value": "長虹"
}
}
},
"aggs": {
"recent_3000d": {
"filter": {
"range": {
"sold_date": {
"gte": "now-3000d"
}
}
},
"aggs": {
"recent_3000d_ave_price": {
"avg": {
"field": "price"
}
}
}
}
}
}
13. Compute the average price of TVs for each color, sorted by average price in descending order
GET /tvs/sales/_search
{
"size": 0,
"aggs": {
"group_by_color": {
"terms": {
"field": "color",
"order": {
"avg_price": "desc"
}
},
"aggs": {
"avg_price": {
"avg": {
"field": "price"
}
}
}
}
}
}
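The `order` clause refers to a sub-aggregation by name (`avg_price` here) and sorts the color buckets by that metric instead of by doc_count. A plain-Python sketch of the resulting order for the sample data (the `sales` tuples are copied from the bulk request):

```python
from collections import defaultdict

# (color, price) for the eight sample documents.
sales = [
    ("紅色", 1000), ("紅色", 2000), ("綠色", 3000), ("藍色", 1500),
    ("綠色", 1200), ("紅色", 2000), ("紅色", 8000), ("藍色", 2500),
]

by_color = defaultdict(list)
for color, price in sales:
    by_color[color].append(price)

# Metric per bucket, then order the buckets by that metric, descending.
avg_price = {color: sum(p) / len(p) for color, p in by_color.items()}
ordered = sorted(avg_price.items(), key=lambda kv: kv[1], reverse=True)
for color, avg in ordered:
    print(color, avg)
```

Sorting by doc_count instead would tie 綠色 and 藍色 at 2 documents each, whereas their average prices (2100 and 2000) give a definite order.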
14. Sort by the metric at the deepest drill-down level
GET /tvs/sales/_search
{
"size": 0,
"aggs": {
"group_by_color": {
"terms": {
"field": "color"
},
"aggs": {
"group_by_brand": {
"terms": {
"field": "brand",
"order": {
"avg_price": "desc"
}
},
"aggs": {
"avg_price": {
"avg": {
"field": "price"
}
}
}
}
}
}
}
}