While running a group by query against Druid today, I got the following error:
{
    "error": "Resource limit exceeded",
    "errorMessage": "Not enough dictionary space to execute this query. Try increasing druid.query.groupBy.maxMergingDictionarySize or enable disk spilling by setting druid.query.groupBy.maxOnDiskStorage to a positive number.",
    "errorClass": "io.druid.query.ResourceLimitExceededException",
    "host": "ubuntu:8083"
}
A search online turned up the cause: the query result doesn't fit in memory. The suggested remedy is: "For this you need to increase the buffer sizes on all historical and realtime nodes and broker nodes."
That means changing the configuration on every machine in the Druid cluster, which I wasn't in a position to do.
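For reference, the two settings the error message names live in each node's runtime.properties. A minimal sketch of that change, assuming a stock layout (file paths vary by deployment, and the sizes below are illustrative, not values I tested):

# runtime.properties on broker, historical, and realtime nodes
# raise the in-memory merge dictionary limit (bytes; default is 100000000)
druid.query.groupBy.maxMergingDictionarySize=200000000
# or let groupBy spill merge state to disk instead of failing (bytes; default 0 = disabled)
druid.query.groupBy.maxOnDiskStorage=1073741824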
So I started thinking about how to shrink the result set Druid has to produce: if a full day's data wouldn't come back, I would query six hours instead, as sketched below.
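Narrowing the window only means tightening the intervals field of the query; a six-hour window looks like this (the exact dates are illustrative):

"intervals": "2018-05-01T00:00:00.000Z/2018-05-01T06:00:00.000Z"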
Looking at what that printed, I realized the problem wasn't that the whole result set was too large, but that a few malformed posid values were enormous. So I added a regex filter that keeps only posid values made up entirely of digits, and the problem was solved:
"filter": { "type":"regex", "dimension":"posid", "pattern":"^[0-9]*$" }
Below is a complete example of the request:
curl -X POST \
http://ip:port/druid/v2/ \
-H 'cache-control: no-cache' \
-H 'content-type: application/json' \
-d '{
"aggregations":[
{
"fieldName":"request_num",
"name":"request_num",
"type":"longSum"
},
{
"fieldName":"response_num",
"name":"response_num",
"type":"longSum"
}
],
"context":{
},
"dataSource":"table_name",
"dimensions":[
"posid"
],
"filter": {
"type":"regex",
"dimension":"posid",
"pattern":"^[0-9]*$"
},
"granularity": "month",
"intervals":"2018-05-01T00:00:00.000Z/2018-06-01T00:00:00.000Z",
"queryType":"groupBy"
}' | python -m json.tool
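For completeness, a Druid groupBy response comes back as a JSON array of rows. A sketch of what this query might return, with made-up posid and counts:

[
    {
        "version": "v1",
        "timestamp": "2018-05-01T00:00:00.000Z",
        "event": {
            "posid": "10086",
            "request_num": 123456,
            "response_num": 120000
        }
    }
]

With granularity set to month, every row in the May interval carries the first-of-month timestamp, one row per distinct posid.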