
Buckets and metrics in Elasticsearch

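  • Concepts

bucket: documents are partitioned into buckets by the value of a field; all documents that share the same value for that field fall into the same bucket.

metric: a statistic computed over the documents in one bucket, for example the average, the maximum, or the minimum.

Both concepts map directly onto a SQL statement: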

select count(*) from access_log group by user_id

bucket: group by user_id --> rows with the same user_id are placed in the same bucket

metric: count(*) counts the rows in each user_id bucket
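
The SQL analogy above can be modelled in a few lines of plain Python. This is only a sketch of the bucket/metric idea, not Elasticsearch client code; the access_log rows and their url values are made up for illustration.

```python
from collections import Counter

# Hypothetical access_log rows; only user_id matters for this example.
access_log = [
    {"user_id": 1, "url": "/home"},
    {"user_id": 2, "url": "/cart"},
    {"user_id": 1, "url": "/item/42"},
    {"user_id": 3, "url": "/home"},
    {"user_id": 1, "url": "/pay"},
]

# bucket: rows with the same user_id land in the same bucket (GROUP BY user_id)
# metric: count(*) is computed once per bucket
counts_per_user = Counter(row["user_id"] for row in access_log)

for user_id, doc_count in sorted(counts_per_user.items()):
    print(user_id, doc_count)
```

Swapping the count for another statistic (sum, average, max) changes only the metric; the buckets stay the same.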

  • Aggregation analysis, part 1:

1. Create the index and load the sample TV sales data

PUT /tvs
{
	"mappings": {
		"sales": {
			"properties": {
				"price": {
					"type": "long"
				},
				"color": {
					"type": "keyword"
				},
				"brand": {
					"type": "keyword"
				},
				"sold_date": {
					"type": "date"
				}
			}
		}
	}
}

POST /tvs/sales/_bulk
{ "index": {}}
{ "price" : 1000, "color" : "紅色", "brand" : "長虹", "sold_date" : "2016-10-28" }
{ "index": {}}
{ "price" : 2000, "color" : "紅色", "brand" : "長虹", "sold_date" : "2016-11-05" }
{ "index": {}}
{ "price" : 3000, "color" : "綠色", "brand" : "小米", "sold_date" : "2016-05-18" }
{ "index": {}}
{ "price" : 1500, "color" : "藍色", "brand" : "TCL", "sold_date" : "2016-07-02" }
{ "index": {}}
{ "price" : 1200, "color" : "綠色", "brand" : "TCL", "sold_date" : "2016-08-19" }
{ "index": {}}
{ "price" : 2000, "color" : "紅色", "brand" : "長虹", "sold_date" : "2016-11-05" }
{ "index": {}}
{ "price" : 8000, "color" : "紅色", "brand" : "三星", "sold_date" : "2017-01-01" }
{ "index": {}}
{ "price" : 2500, "color" : "藍色", "brand" : "小米", "sold_date" : "2017-02-12" }
           

2. Find which TV color sells the most

GET /tvs/sales/_search
{
    "size" : 0,
    "aggs" : { 
        "popular_colors" : { 
            "terms" : { 
              "field" : "color"
            }
        }
    }
}
           

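size: 0 returns only the aggregation results and omits the documents the aggregation ran over (the hits array stays empty)

aggs: the fixed keyword that introduces the aggregations to run on the data

popular_colors: every aggregation needs a name; the name is arbitrary and is used as the key of its results in the response

terms: group documents by the value of a field

field: the field whose values are used for grouping

Response: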

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 8,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "popular_colors" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "紅色",
          "doc_count" : 4
        },
        {
          "key" : "綠色",
          "doc_count" : 2
        },
        {
          "key" : "藍色",
          "doc_count" : 2
        }
      ]
    }
  }
}
           

Default sort order: buckets are sorted by doc_count, descending

3. Average price of the TVs of each color

GET /tvs/sales/_search
{
  "size": 0,
  "aggs": {
    "popular_color": {
      "terms": {
        "field": "color",
        "size": 10
      },
      "aggs": {
        "ave_price": {
          "avg": {
            "field": "price"
          }
        }
      }
    }
  }
}
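
To make the two steps of the query above concrete, here is a small Python model of it, run on the (color, price) pairs of the eight sample documents. It is a sketch of the logic only and does not talk to Elasticsearch; the variable names are illustrative.

```python
from collections import defaultdict

# (color, price) pairs taken from the sample documents above
sales = [
    ("紅色", 1000), ("紅色", 2000), ("綠色", 3000), ("藍色", 1500),
    ("綠色", 1200), ("紅色", 2000), ("紅色", 8000), ("藍色", 2500),
]

# bucket step: the terms aggregation groups the prices by color
prices_by_color = defaultdict(list)
for color, price in sales:
    prices_by_color[color].append(price)

# metric step: the nested avg aggregation runs once inside every bucket
avg_price = {c: sum(p) / len(p) for c, p in prices_by_color.items()}

for color, avg in avg_price.items():
    print(color, len(prices_by_color[color]), avg)
```

The nested "aggs" block in the request plays the role of the dictionary comprehension: it is evaluated separately for each bucket produced by the outer terms aggregation.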
           

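3. Multi-level drill-down analysis

Drill-down analysis: the documents have already been split into groups, say by color, and each group is then split again, for example into one group per brand within a color. An aggregation is finally computed for each of the finest-grained groups. This is called drill-down analysis.

Example: drill down from color to brand, computing the average price of each color as well as the average price of each brand within each color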

GET /tvs/sales/_search
{
  "size": 0,
  "aggs": {
    "popular_color": {
      "terms": {
        "field": "color",
        "size": 10
      },
      "aggs": {
        "ave_price": {
          "avg": {
            "field": "price"
          }
        },
        "group_by_brand": {
          "terms": {
            "field": "brand",
            "size": 10
          },
          "aggs":{
            "brand_avg_price":{
              "avg": {
                "field": "price"
              }
            }
          }
        }
      }
    }
  }
}
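
The drill-down request above amounts to a two-level grouping. The following Python sketch reproduces it on the sample data, assuming (color, brand, price) tuples copied from the bulk request; it models the logic and is not Elasticsearch client code.

```python
from collections import defaultdict

# (color, brand, price) tuples taken from the sample documents above
sales = [
    ("紅色", "長虹", 1000), ("紅色", "長虹", 2000), ("綠色", "小米", 3000),
    ("藍色", "TCL", 1500), ("綠色", "TCL", 1200), ("紅色", "長虹", 2000),
    ("紅色", "三星", 8000), ("藍色", "小米", 2500),
]


def avg(values):
    return sum(values) / len(values)


# first level: color bucket; second level: brand bucket inside each color
tree = defaultdict(lambda: defaultdict(list))
for color, brand, price in sales:
    tree[color][brand].append(price)

result = {}
for color, brands in tree.items():
    all_prices = [p for prices in brands.values() for p in prices]
    result[color] = {
        "ave_price": avg(all_prices),  # metric on the color bucket
        "group_by_brand": {b: avg(p) for b, p in brands.items()},  # metric per brand
    }

for color, stats in result.items():
    print(color, stats)
```

Note that ave_price and group_by_brand are siblings inside the color bucket, exactly as in the request: one is a metric on the whole bucket, the other splits the bucket further.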
           

4. Maximum, minimum, and total price of the TVs of each color

GET /tvs/sales/_search
{
  "size": 0,
  "aggs": {
    "group_by_color": {
      "terms": {
        "field": "color",
        "size": 10
      },
      "aggs": {
        "max_price": {
          "max": {
            "field": "price"
          }
        },
        "min_price":{
          "min": {
            "field": "price"
          }
        },
        "sum_price":{
          "sum": {
            "field": "price"
          }
        }
      }
    }
  }
}
           

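4. Use histogram to count sales and revenue per price range

histogram: similar to terms in that it also creates buckets. It takes a numeric field and groups documents into buckets according to the fixed-width range their value falls into.

"histogram": {
  "field": "price",
  "interval": 2000
}

interval: 2000 splits the values into ranges 2000 wide: [0, 2000), [2000, 4000), [4000, 6000), [6000, 8000), [8000, 10000). Each range is one bucket, keyed by its lower bound, so a price of exactly 2000 falls into the 2000 bucket.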

GET /tvs/sales/_search
{
  "size": 0,
  "aggs": {
    "price": {
      "histogram": {
        "field": "price",
        "interval": 2000
      },
      "aggs": {
        "revenue": {
          "sum": {
            "field": "price"
          }
        }
      }
    }
  }
}
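
The bucket key of a histogram is the value rounded down to a multiple of the interval. The sketch below applies that rule to the eight sample prices and computes the same doc_count and revenue as the request above; it is a plain Python model, not Elasticsearch client code.

```python
from collections import defaultdict

# prices of the eight sample documents
prices = [1000, 2000, 3000, 1500, 1200, 2000, 8000, 2500]
interval = 2000

buckets = defaultdict(lambda: {"doc_count": 0, "revenue": 0})
for price in prices:
    key = (price // interval) * interval  # lower bound of the price range
    buckets[key]["doc_count"] += 1        # sales count of the range
    buckets[key]["revenue"] += price      # the nested sum aggregation

# Walk every range between the smallest and the largest key, so ranges
# without any document show up as empty buckets (min_doc_count of 0).
for key in range(min(buckets), max(buckets) + interval, interval):
    print(key, buckets[key]["doc_count"], buckets[key]["revenue"])
```

On the sample data the 4000 and 6000 ranges are empty, while the single 8000 TV sits alone in the last bucket.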
           

5. Use date_histogram to count TV sales per month

date_histogram creates buckets at a fixed calendar interval over a field of type date that we specify.

GET /tvs/sales/_search
{
  "size": 0,
  "aggs": {
    "group_by_soldDate": {
      "date_histogram": {
        "field": "sold_date",
        "interval": "month",
        "format": "yyyy-MM-dd",
        "min_doc_count": 0,
        "extended_bounds": {
          "min": "2017-01-01",
          "max": "2017-12-31"
        }
      }
    }
  }
}
           

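min_doc_count: setting it to 0 keeps a date interval in the response even when it contains no documents at all (for example 2017-03-01~2017-03-31 in the sample data); buckets with fewer documents than min_doc_count are left out of the response.

extended_bounds, min, max: force the histogram to cover at least the range from min to max, so empty buckets are produced up to those dates even where there is no data. extended_bounds does not filter anything: documents outside the bounds still get their own buckets. To restrict the time range, add a range query on the date field.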
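
The interplay of min_doc_count and extended_bounds can be modelled in Python as follows. This is a sketch under the assumption that buckets are keyed by the first day of each month (as the "yyyy-MM-dd" format in the request suggests); month_start and next_month are helper names invented for this example.

```python
from collections import Counter
from datetime import date

# sold_date of the eight sample documents
sold_dates = ["2016-10-28", "2016-11-05", "2016-05-18", "2016-07-02",
              "2016-08-19", "2016-11-05", "2017-01-01", "2017-02-12"]


def month_start(d):
    return d.replace(day=1)


def next_month(d):
    return date(d.year + d.month // 12, d.month % 12 + 1, 1)


# one bucket per calendar month, keyed by the first day of the month
counts = Counter(month_start(date.fromisoformat(s)) for s in sold_dates)

# extended_bounds can only widen the covered range, never shrink it
first = min(min(counts), month_start(date(2017, 1, 1)))
last = max(max(counts), month_start(date(2017, 12, 31)))

# with min_doc_count 0, every month in the range gets a bucket
monthly = {}
month = first
while month <= last:
    monthly[month.isoformat()] = counts.get(month, 0)
    month = next_month(month)

for key, doc_count in monthly.items():
    print(key, doc_count)
```

Because the sample data starts in May 2016, the first bucket is 2016-05-01 even though extended_bounds.min is 2017-01-01; the bounds only add the empty months up to December 2017.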

6. Revenue of each brand per quarter

GET /tvs/sales/_search
{
  "size": 0,
  "aggs": {
    "group_by_quarter": {
      "date_histogram": {
        "field": "sold_date",
        "interval": "quarter",
        "format": "yyyy-MM-dd", 
        "min_doc_count": 0,
        "extended_bounds": {
          "min": "2016-01-01",
          "max": "2017-12-31"
        }
      },
      "aggs": {
        "group_by_brand": {
          "terms": {
            "field": "brand"
          },
          "aggs": {
            "per_brand_price": {
              "sum": {
                "field": "price"
              }
            }
          }
        },
        "total_sum_quarter": {
          "sum": {
            "field": "price"
          }
        }
      }
    }
  }
}
           

7. Sales count of each color for a given brand

GET /tvs/sales/_search
{
  "size": 0, 
  "query": {
    "term": {
      "brand": {
        "value": "小米"
      }
    }
  },
  "aggs": {
    "group_by_color": {
      "terms": {
        "field": "color"
      }
    }
  }
}
           

7. global bucket: compare the average price of a single brand with that of all brands

GET /tvs/sales/_search
{
  "size": 0,
  "query": {
    "term": {
      "brand": {
        "value": "長虹"
      }
    }
  },
  "aggs": {
    "changhong_avg_price": {
      "avg": {
        "field": "price"
      }
    },
    "all":{
      "global": {},
      "aggs": {
        "all_brand_ave_price":{
          "avg": {
            "field": "price"
          }
        }
      }
    }
  }
}
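
The difference between the two averages in the request above is only their scope: changhong_avg_price sees the documents matched by the query, while everything under the global bucket ignores the query. A minimal Python model on the sample data, with illustrative variable names:

```python
# (brand, price) pairs taken from the sample documents above
sales = [("長虹", 1000), ("長虹", 2000), ("小米", 3000), ("TCL", 1500),
         ("TCL", 1200), ("長虹", 2000), ("三星", 8000), ("小米", 2500)]


def avg(values):
    return sum(values) / len(values)


# scope of the query: only documents whose brand is 長虹
matched = [price for brand, price in sales if brand == "長虹"]
changhong_avg_price = avg(matched)

# scope of the global bucket: every document in the index, query ignored
all_brand_ave_price = avg([price for _, price in sales])

print(round(changhong_avg_price, 2), all_brand_ave_price)
```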
           

8. Filter + aggregation: average price of the TVs priced at 1200 or more

GET /tvs/sales/_search
{
  "size": 0,
  "query": {
    "constant_score": {
      "filter": {
        "range": {
          "price": {
            "gte": 1200
          }
        }
      },
      "boost": 1
    }
  },
  "aggs": {
    "avg_price": {
      "avg": {
        "field": "price"
      }
    }
  }
}
           

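9. Average price of a brand over a recent period (the query below uses the last 3000 days; use "now-30d" for the last month)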

GET /tvs/sales/_search
{
  "size": 0, 
  "query": {
    "term": {
      "brand": {
        "value": "長虹"
      }
    }
  },
  "aggs": {
    "recent_3000d": {
      "filter": {
        "range": {
          "sold_date": {
            "gte": "now-3000d"
          }
        }
      },
      "aggs": {
        "recent_3000d_ave_price": {
          "avg": {
            "field": "price"
          }
        }
      }
    }
  }
}
           

10. Compute the average price of each color and sort the color buckets by that average, descending

GET /tvs/sales/_search
{
  "size": 0, 
  "aggs": {
    "group_by_color": {
      "terms": {
        "field": "color",
        "order": {
          "avg_price": "desc"
        }
      },
      "aggs": {
        "avg_price": {
          "avg": {
            "field": "price"
          }
        }
      }
    }
  }
}
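
Ordering by a sub-aggregation replaces the default doc_count ordering with the value of a metric computed inside each bucket. The sketch below models the request above in Python on the sample data; it is an illustration of the ordering rule, not Elasticsearch client code.

```python
from itertools import groupby
from operator import itemgetter

# (color, price) pairs taken from the sample documents above
sales = [
    ("紅色", 1000), ("紅色", 2000), ("綠色", 3000), ("藍色", 1500),
    ("綠色", 1200), ("紅色", 2000), ("紅色", 8000), ("藍色", 2500),
]

# build one bucket per color together with its avg_price metric
buckets = []
by_color = sorted(sales, key=itemgetter(0))
for color, rows in groupby(by_color, key=itemgetter(0)):
    prices = [price for _, price in rows]
    buckets.append({
        "key": color,
        "doc_count": len(prices),
        "avg_price": sum(prices) / len(prices),
    })

# "order": {"avg_price": "desc"} sorts the buckets by the metric value
buckets.sort(key=itemgetter("avg_price"), reverse=True)

for bucket in buckets:
    print(bucket["key"], bucket["doc_count"], bucket["avg_price"])
```

The name used in "order" has to be the name of a metric aggregation defined inside the same terms aggregation, here avg_price.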
           

11. Sort by the metric at the deepest drill-down level

GET /tvs/sales/_search 
{
  "size": 0,
  "aggs": {
    "group_by_color": {
      "terms": {
        "field": "color"
      },
      "aggs": {
        "group_by_brand": {
          "terms": {
            "field": "brand",
            "order": {
              "avg_price": "desc"
            }
          },
          "aggs": {
            "avg_price": {
              "avg": {
                "field": "price"
              }
            }
          }
        }
      }
    }
  }
}
           
