Elasticsearch: an approach to fixing precision loss in aggregations

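Java's floating-point types (double and float) can lose precision in arithmetic.

Elasticsearch is written in Java, so the same problem shows up there. The example below shows how to work around it in Elasticsearch.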
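
As a quick illustration that the error comes from binary floating point itself rather than from Elasticsearch, the sketch below adds 0.1 and 0.2 in awk. awk stands in for Java here purely for convenience, on the assumption that it does its arithmetic in the same IEEE 754 double precision as Java's double; the function name float_sum is illustrative.

```shell
# awk does its arithmetic in double precision (assumed: IEEE 754 binary64).
# Printing 17 significant digits exposes the rounding error in the sum.
float_sum() {
  awk 'BEGIN { printf "%.17g\n", 0.1 + 0.2 }'
}

float_sum   # prints 0.30000000000000004
```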

Elasticsearch version: 6.5.4

Start Elasticsearch quickly in a Docker container

docker run --name es6 --net host -e "discovery.type=single-node" docker.io/elasticsearch:6.5.4
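
The command above runs in the foreground, and Elasticsearch may need some time before it accepts HTTP requests. A small polling helper, run from a second terminal, can wait until the port answers. This is only a sketch: the function name wait_for_es, the default URL, and the retry count are illustrative choices, not part of the original setup.

```shell
# Poll the Elasticsearch HTTP endpoint once per second until it responds.
# $1: base URL (default http://127.0.0.1:9200), $2: maximum number of attempts.
wait_for_es() {
  url="${1:-http://127.0.0.1:9200}"
  tries="${2:-30}"
  i=0
  while [ "$i" -lt "$tries" ]; do
    # -f makes curl fail on HTTP errors, -s keeps it quiet.
    if curl -fs "$url" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}

# Usage: wait_for_es && echo "Elasticsearch is up"
```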
           

Demo

Create the index:

curl -X PUT http://127.0.0.1:9200/index01
           

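Delete the index (only when you need to start over; skip this step otherwise, because the following steps need the index to exist):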

curl -XDELETE http://127.0.0.1:9200/index01
           

Create the mapping:

curl -XPOST 'http://127.0.0.1:9200/index01/type01/_mapping?pretty' -H "Content-Type: application/json"  \
-d '
{
    "type01": {
        "properties": {
            "tm": {
                "type": "date",
                "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
            },
            "name": {
                "type": "keyword"
            },
            "address": {
                "type": "text"
            },
            "price1": {
                "type": "double"
            },
            "price2": {
                "type": "scaled_float",
                "scaling_factor": 100
            }
        }
    }
}'
           

View the mapping:

curl -XGET http://127.0.0.1:9200/index01/_mapping?pretty
           

Insert the data:

curl -XPOST http://127.0.0.1:9200/index01/type01/01?pretty -H "Content-Type: application/json" \
    -d '{
        "name":"zhangsan",
        "price1":1.0,
        "price2":1.0,
        "tm":"2022-01-01",
        "address":"beijing daxing"
        }'
        
curl -XPOST http://127.0.0.1:9200/index01/type01/02?pretty -H "Content-Type: application/json" \
    -d '{
        "name":"zhangsan",
        "price1":20.2,
        "price2":20.2,
        "tm":"2022-01-01",
        "address":"beijing daxing"
        }'
        
curl -XPOST http://127.0.0.1:9200/index01/type01/03?pretty -H "Content-Type: application/json" \
    -d '{
        "name":"zhangsan",
        "price1":300.03,
        "price2":300.03,
        "tm":"2022-01-01",
        "address":"beijing daxing"
        }'
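
The three inserts above could also be sent in a single request through the _bulk endpoint, which expects newline-delimited JSON: one action line followed by one source line per document. The sketch below is an alternative, not what the original steps use; the helper names bulk_lines and bulk_insert are hypothetical.

```shell
# Read "id price" pairs from stdin and emit one action line plus one
# source line per document, in the newline-delimited format _bulk expects.
bulk_lines() {
  while read -r id price; do
    printf '{"index":{"_id":"%s"}}\n' "$id"
    printf '{"name":"zhangsan","price1":%s,"price2":%s,"tm":"2022-01-01","address":"beijing daxing"}\n' "$price" "$price"
  done
}

# Hypothetical wrapper; call it only once Elasticsearch is running.
bulk_insert() {
  printf '01 1.0\n02 20.2\n03 300.03\n' | bulk_lines |
    curl -s -XPOST 'http://127.0.0.1:9200/index01/type01/_bulk?pretty' \
      -H 'Content-Type: application/x-ndjson' --data-binary @-
}
```

The --data-binary option matters here because it keeps the newlines that separate the lines of the bulk body.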
           

Query the data:

curl -XGET http://127.0.0.1:9200/index01/type01/_search?pretty
           

Aggregations:

Aggregation on the double field

curl -XGET http://127.0.0.1:9200/index01/type01/_search?pretty -H "Content-Type: application/json" \
    -d '{
            "aggs": {
                "sum_price1": {
                    "sum": {
                        "field": "price1"
                    }
                }
            }
        }'
           

Result:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "index01",
        "_type" : "type01",
        "_id" : "01",
        "_score" : 1.0,
        "_source" : {
          "name" : "zhangsan",
          "price1" : 1.0,
          "price2" : 1.0,
          "tm" : "2022-01-01",
          "address" : "beijing daxing"
        }
      },
      {
        "_index" : "index01",
        "_type" : "type01",
        "_id" : "03",
        "_score" : 1.0,
        "_source" : {
          "name" : "zhangsan",
          "price1" : 300.03,
          "price2" : 300.03,
          "tm" : "2022-01-01",
          "address" : "beijing daxing"
        }
      },
      {
        "_index" : "index01",
        "_type" : "type01",
        "_id" : "02",
        "_score" : 1.0,
        "_source" : {
          "name" : "zhangsan",
          "price1" : 20.2,
          "price2" : 20.2,
          "tm" : "2022-01-01",
          "address" : "beijing daxing"
        }
      }
    ]
  },
  "aggregations" : {
    "sum_price1" : {
      "value" : 321.22999999999996
    }
  }
}

           

Aggregation on the scaled_float field

curl -XGET http://127.0.0.1:9200/index01/type01/_search?pretty -H "Content-Type: application/json" \
    -d '{
            "aggs": {
                "sum_price2": {
                    "sum": {
                        "field": "price2"
                    }
                }
            }
        }'

           

Result:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "index01",
        "_type" : "type01",
        "_id" : "01",
        "_score" : 1.0,
        "_source" : {
          "name" : "zhangsan",
          "price1" : 1.0,
          "price2" : 1.0,
          "tm" : "2022-01-01",
          "address" : "beijing daxing"
        }
      },
      {
        "_index" : "index01",
        "_type" : "type01",
        "_id" : "03",
        "_score" : 1.0,
        "_source" : {
          "name" : "zhangsan",
          "price1" : 300.03,
          "price2" : 300.03,
          "tm" : "2022-01-01",
          "address" : "beijing daxing"
        }
      },
      {
        "_index" : "index01",
        "_type" : "type01",
        "_id" : "02",
        "_score" : 1.0,
        "_source" : {
          "name" : "zhangsan",
          "price1" : 20.2,
          "price2" : 20.2,
          "tm" : "2022-01-01",
          "address" : "beijing daxing"
        }
      }
    ]
  },
  "aggregations" : {
    "sum_price2" : {
      "value" : 321.23
    }
  }
}
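
Both responses above repeat the three matching documents, which hides the one number of interest. Adding "size": 0 to the request body makes Elasticsearch return only the aggregations section. The sketch below builds such a body for either field; agg_body and sum_field are hypothetical helper names, and the URL is the one used throughout this demo.

```shell
# Build a sum-aggregation request body for one field.
# "size": 0 tells Elasticsearch to omit the hits and return only aggregations.
agg_body() {
  printf '{"size":0,"aggs":{"sum_%s":{"sum":{"field":"%s"}}}}\n' "$1" "$1"
}

# Hypothetical wrapper around the search call used in this demo.
sum_field() {
  curl -s -XGET 'http://127.0.0.1:9200/index01/type01/_search?pretty' \
    -H 'Content-Type: application/json' -d "$(agg_body "$1")"
}

# Usage, with Elasticsearch running:
#   sum_field price1
#   sum_field price2
```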

           

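Result analysis

  • Summing the double field loses precision: the result is 321.22999999999996 instead of 321.23.
  • With a suitable scaling factor, the scaled_float field avoids this floating-point precision loss: the same sum comes back as 321.23.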
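
The same contrast can be reproduced outside Elasticsearch. The sketch below sums the three demo prices once as doubles and once as integer cents, which is the idea behind a scaling factor of 100. It assumes awk uses IEEE 754 double precision; the function names are illustrative, and the code is not Elasticsearch's own implementation.

```shell
# Sum the three demo prices as doubles (awk uses double-precision arithmetic).
sum_as_double() {
  awk 'BEGIN { printf "%.17g\n", 1.0 + 20.2 + 300.03 }'
}

# Sum the same prices as integer cents (each value already multiplied by 100),
# then put the decimal point back when printing.
sum_as_cents() {
  cents=$(( 100 + 2020 + 30003 ))
  printf '%d.%02d\n' $(( cents / 100 )) $(( cents % 100 ))
}

sum_as_double   # prints 321.22999999999996
sum_as_cents    # prints 321.23
```

Integer addition is exact, so the rounding error never enters the sum in the second variant.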

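Notes:

  • Pay particular attention to the maximum number of decimal places in the data loaded into the price2 field. scaling_factor must be at least 10 raised to that number of decimal places (for example, 100 for two decimal places); otherwise precision may be lost.
  • scaled_float is a floating-point type backed by a scaled integer, and its mapping must specify scaling_factor.

    At index time, Elasticsearch multiplies the original value by the scaling factor and rounds the result to get a new value. Elasticsearch stores this new value internally, while _source still returns the original value. For example, a scaled_float field with a scaling_factor of 10 stores 2.34 internally as 23, and search-time operations (queries, aggregations, sorting) behave as if the document had the value 2.3.

    At query time, Elasticsearch likewise multiplies the query parameter by 10, rounds it, and matches the result against 23. A matching document still comes back with 2.34 in _source.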
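
The scaling step described in these notes can be sketched in a few lines. This is an approximation for illustration, assuming round-to-nearest on non-negative values; the function names scale and unscale are hypothetical and this is not Elasticsearch's actual code.

```shell
# Mimic the indexing step: multiply by the scaling factor and round to the
# nearest integer. Valid for non-negative inputs only, since int() truncates.
scale() {
  awk -v v="$1" -v f="$2" 'BEGIN { printf "%d\n", int(v * f + 0.5) }'
}

# Value that search-time operations work with: stored integer / factor.
unscale() {
  awk -v n="$1" -v f="$2" 'BEGIN { printf "%g\n", n / f }'
}

scale 2.34 10      # prints 23
unscale 23 10      # prints 2.3
scale 300.03 100   # prints 30003
scale 2.346 100    # prints 235, so the third decimal place is lost
```

The last call shows why the scaling factor has to match the data: with a factor of 100, a value with three decimal places is rounded to two.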

Reference:

Elasticsearch data types
