Problems querying ES when _id contains special characters

2023-06-11 07:17:57

Problem

Fetching a document by ID fails with an error:

GET index/_doc/2_900002151162=I1B8PIUB1-66M04493WI71zDLXZliUSgmR9S9eVMLh2/NK3FcIhRi4yf8VU=

Note: the id is produced by processing another string.

Error message:

{
  "error": "no handler found for uri [/card_record/_doc/2_900002151162=I1B8PIUB1-66M04493WI71zDLXZliUSgmR9S9eVMLh2/NK3FcIhRi4yf8VU=?pretty] and method [GET]"
}

Reproducing the problem

Index a document with the index API:

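// Assumed setup: Elasticsearch 7.x high-level REST client; JSONObject is assumed to be fastjson's
import com.alibaba.fastjson.JSONObject;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.index.IndexResponse;
import org.elasticsearch.client.RequestOptions;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// client: an existing RestHighLevelClient
IndexRequest request = new IndexRequest();
request.index("index-test2");
// with the index API the id becomes part of the request URL path
request.id("1+1");
Map<String, Object> doc = new HashMap<>();
doc.put("doc", 1);
doc.put("type", "index");
request.source(doc);
try {
    IndexResponse response = client.index(request, RequestOptions.DEFAULT);
    System.out.println("response = " + JSONObject.toJSONString(response));
} catch (IOException e) {
    e.printStackTrace();
}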

Index a document with the same id through the bulk API:

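// Assumed setup: Elasticsearch 7.x high-level REST client; JSONObject is assumed to be fastjson's
import com.alibaba.fastjson.JSONObject;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.client.RequestOptions;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// client: an existing RestHighLevelClient
IndexRequest request = new IndexRequest();
request.index("index-test2");
request.id("1+1");
Map<String, Object> doc = new HashMap<>();
doc.put("doc", 2);
doc.put("type", "bulk");
request.source(doc);

// inside a bulk request the _id is sent in the JSON request body, not in the URL path
BulkRequest bulkRequest = new BulkRequest();
bulkRequest.add(request);
try {
    BulkResponse response = client.bulk(bulkRequest, RequestOptions.DEFAULT);
    System.out.println("response = " + JSONObject.toJSONString(response));
} catch (IOException e) {
    e.printStackTrace();
}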

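Check the result in Kibana. The document written with the index API has _id "1 1", while the one written with bulk has _id "1+1":

POST index-test2/_search

Response: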
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "index-test2",
        "_type" : "_doc",
        "_id" : "1 1",
        "_score" : 1.0,
        "_source" : {
          "doc" : 1,
          "type" : "index"
        }
      },
      {
        "_index" : "index-test2",
        "_type" : "_doc",
        "_id" : "1+1",
        "_score" : 1.0,
        "_source" : {
          "doc" : 2,
          "type" : "bulk"
        }
      }
    ]
  }
}

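Root cause analysis

As with a browser request, the ES server URL-decodes the request path when it receives it; this is ordinary URL handling, not something done specially for _id. In particular, a '+' in the path is decoded as a space, which is why the document written through the index API ended up with _id "1 1".
The obvious workaround is to URL-encode the _id before calling client methods such as index, get, and exists.
Reading the client source shows that this does not work: the client builds the request path itself with the endpoint method below, and its encoding is not the same as URLEncoder's.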

static String endpoint(String index, String type, String id, String endpoint) {
    return new EndpointBuilder().addPathPart(index, type, id).addPathPartAsIs(endpoint).build();
}
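To make the mismatch concrete without a running cluster, here is a small sketch. It assumes the client encodes each path part with java.net.URI's multi-argument constructor and then escapes '/' as %2F; this is an approximation of the client's EndpointBuilder, not its actual code. It also uses URLDecoder as a stand-in for the server's path decoding, on the assumption that both treat '+' as a space, which is what the "1 1" _id above suggests. The class and helper names are hypothetical.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

public class PathEncodingDemo {

    // Approximation (assumption) of how the client encodes one path part:
    // java.net.URI's multi-argument constructor leaves '+' untouched,
    // and '/' is then escaped by hand as %2F.
    static String encodePathPart(String part) {
        try {
            URI uri = new URI(null, null, null, -1, "/" + part, null, null);
            return uri.getRawPath().substring(1).replace("/", "%2F");
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("cannot encode path part: " + part, e);
        }
    }

    // Stand-in for the server's decoding of the path: URLDecoder
    // also turns '+' into a space.
    static String decodeLikeServer(String rawPart) {
        return URLDecoder.decode(rawPart, StandardCharsets.UTF_8);
    }

    // The _id the server ends up working with when the client is given `id`.
    static String idSeenByServer(String id) {
        return decodeLikeServer(encodePathPart(id));
    }

    public static void main(String[] args) {
        String id = "1+1";
        System.out.println("path part sent: " + encodePathPart(id));
        System.out.println("id seen by server: \"" + idSeenByServer(id) + "\"");
    }
}
```

With id "1+1" the path part is sent unchanged and decoded to "1 1", matching the document written through the index API. The '/' escaping is also consistent with the error in the Problem section: typed by hand in Kibana, the '/' in that id is not escaped, so it adds an extra path segment and no REST handler matches the URI.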

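A quick experiment confirms this:
- URLEncoder turns "1+1" into "1%2B1"; the endpoint method then escapes the % again, so the path sent to the server contains "1%252B1".
- The server decodes the path only once, so it operates on the document whose _id is "1%2B1":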

{
  "_index" : "index-test2",
  "_type" : "_doc",
  "_id" : "1%2B1",
  "_score" : 1.0,
  "_source" : {
    "doc" : 1,
    "type" : "index"
  }
}
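The double encoding can be reproduced offline with a sketch like the one below. As before, it assumes the client's path-part encoding behaves like java.net.URI's multi-argument constructor, which always quotes '%' as %25, and it uses URLDecoder as a stand-in for the server's single decoding pass; the class and method names are hypothetical.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class DoubleEncodingDemo {

    // Assumed approximation of the client's path-part encoding:
    // the multi-argument java.net.URI constructors always quote '%' as %25.
    static String encodePathPart(String part) {
        try {
            URI uri = new URI(null, null, null, -1, "/" + part, null, null);
            return uri.getRawPath().substring(1).replace("/", "%2F");
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("cannot encode path part: " + part, e);
        }
    }

    // What goes into the URL when the caller URL-encodes the id first.
    static String rawPathPart(String id) {
        String preEncoded = URLEncoder.encode(id, StandardCharsets.UTF_8); // "1+1" -> "1%2B1"
        return encodePathPart(preEncoded);                                 // "1%2B1" -> "1%252B1"
    }

    // The server decodes the path only once (URLDecoder as a stand-in).
    static String idSeenByServer(String id) {
        return URLDecoder.decode(rawPathPart(id), StandardCharsets.UTF_8); // -> "1%2B1"
    }

    public static void main(String[] args) {
        System.out.println("path part sent: " + rawPathPart("1+1"));
        System.out.println("id seen by server: " + idSeenByServer("1+1"));
    }
}
```

This matches the "1%252B1" path and the "1%2B1" _id observed above: pre-encoding moves the mismatch rather than removing it.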

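Indexing through the bulk API does not run into this escaping problem, because there the _id travels in the JSON request body rather than in the URL path.

Summary:

- For an _id containing special characters, the index API and the bulk API can store different _id values (here "1 1" versus "1+1").
- A document indexed through bulk with such an _id may not be found by get and other by-id methods.
- Likewise, updates by id may fail, and an "update" done with the index API may insert a new document instead.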

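Solution

Avoid special characters when generating IDs yourself:
- Generate the _id in one place, at the entry point where data comes in.
- Ids read back from ES can then be used as-is, with no special handling.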
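As a minimal sketch of this advice, assume the _id is built from some business key string. One option is to encode the key with the URL-safe Base64 alphabet, which uses '-' and '_' instead of '+' and '/' and can drop the '=' padding, and to check ids against a conservative character set at the point where they are generated. The id in the Problem section contains '/' and '=' and may include standard Base64 output, but that is only a guess. The helpers toSafeId, fromSafeId, and isSafeId are hypothetical, and the allowed set [A-Za-z0-9_-] is a deliberately strict choice, not a rule from ES.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.regex.Pattern;

public class SafeIdDemo {

    // Letters, digits, '_' and '-' are left alone by both the client's
    // path encoding and the server's path decoding.
    private static final Pattern SAFE_ID = Pattern.compile("[A-Za-z0-9_-]+");

    // Hypothetical helper: encode an arbitrary business key with the URL-safe
    // Base64 alphabet ('-' and '_' instead of '+' and '/'), without '=' padding.
    static String toSafeId(String businessKey) {
        return Base64.getUrlEncoder().withoutPadding()
                .encodeToString(businessKey.getBytes(StandardCharsets.UTF_8));
    }

    // Reverses toSafeId when the original key is needed again;
    // the decoder does not require padding.
    static String fromSafeId(String id) {
        return new String(Base64.getUrlDecoder().decode(id), StandardCharsets.UTF_8);
    }

    // Cheap guard to call wherever ids are produced.
    static boolean isSafeId(String id) {
        return SAFE_ID.matcher(id).matches();
    }

    public static void main(String[] args) {
        String key = "1+1";
        String id = toSafeId(key);
        System.out.println(id + " safe=" + isSafeId(id) + " decoded=" + fromSafeId(id));
    }
}
```

For "1+1" this yields "MSsx". Base64 keeps the key recoverable; if ids must have a fixed length, hashing the key (for example with SHA-256) and encoding the digest the same way is an alternative, at the cost of not being able to recover the key from the id.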
