Today, while running a group by query against Druid, I got the following error:
{
    "error": "Resource limit exceeded",
    "errorMessage": "Not enough dictionary space to execute this query. Try increasing druid.query.groupBy.maxMergingDictionarySize or enable disk spilling by setting druid.query.groupBy.maxOnDiskStorage to a positive number.",
    "errorClass": "io.druid.query.ResourceLimitExceededException",
    "host": "ubuntu:8083"
}
Searching online, I found the cause: the query result does not fit in memory.
The suggested fix is: "For this you need to increase the buffer sizes on all historical and realtime nodes and broker nodes."
That means changing the configuration on every machine in the Druid cluster, which I cannot do at the moment.
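For reference, the two properties named in the error message are server-side settings that live in each node's runtime.properties (on the historical, realtime, and broker nodes). If you do have access to the cluster, the change would look roughly like the sketch below; the values are only illustrative, not tuned recommendations, and each affected node has to be restarted afterwards:

# runtime.properties on historical / realtime / broker nodes (illustrative values)
druid.query.groupBy.maxMergingDictionarySize=200000000
# 0 disables disk spilling; a positive number of bytes enables it
druid.query.groupBy.maxOnDiskStorage=1073741824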
So I started looking for ways to shrink the Druid result set. Since a full day of data would not return, I queried only six hours instead and looked at the printed results.
That is when I realized the problem was not that the whole result set was too large, but that a handful of malformed posid values were far too large. So I added a regular-expression filter that keeps only posids consisting entirely of digits, and the problem was solved:
"filter": { "type":"regex", "dimension":"posid", "pattern":"^[0-9]*$" }
Below is a complete example request:
curl -X POST \
  http://ip:port/druid/v2/ \
  -H 'cache-control: no-cache' \
  -H 'content-type: application/json' \
  -d '{
    "aggregations": [
        {
            "fieldName": "request_num",
            "name": "request_num",
            "type": "longSum"
        },
        {
            "fieldName": "response_num",
            "name": "response_num",
            "type": "longSum"
        }
    ],
    "context": {},
    "dataSource": "table_name",
    "dimensions": [
        "posid"
    ],
    "filter": {
        "type": "regex",
        "dimension": "posid",
        "pattern": "^[0-9]*$"
    },
    "granularity": "month",
    "intervals": "2018-05-01T00:00:00.000Z/2018-06-01T00:00:00.000Z",
    "queryType": "groupBy"
}' | python -m json.tool
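Since the response is already being piped through python -m json.tool, here is a small Python sketch that sends the same query programmatically. It assumes the third-party requests library is installed, and, as above, http://ip:port is a placeholder for the real broker address:

import json
import requests

# Same groupBy query as the curl example above
query = {
    "queryType": "groupBy",
    "dataSource": "table_name",
    "granularity": "month",
    "intervals": "2018-05-01T00:00:00.000Z/2018-06-01T00:00:00.000Z",
    "dimensions": ["posid"],
    # keep only posids consisting entirely of digits
    "filter": {"type": "regex", "dimension": "posid", "pattern": "^[0-9]*$"},
    "aggregations": [
        {"type": "longSum", "fieldName": "request_num", "name": "request_num"},
        {"type": "longSum", "fieldName": "response_num", "name": "response_num"},
    ],
}

resp = requests.post(
    "http://ip:port/druid/v2/",  # placeholder broker address
    headers={"Content-Type": "application/json"},
    data=json.dumps(query),
)
resp.raise_for_status()
print(json.dumps(resp.json(), indent=2, ensure_ascii=False))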