ES 14 - (Internals) How Elasticsearch Handles Data of Different Types Internally

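What is a type in Elasticsearch used for? What does the _type metadata field implement, and how is it used under the hood? What caveats and best practices apply? This article takes a brief look at these questions.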

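1 The role of type
2 The underlying data structure of type
3 Exploring how types are stored
- 3.1 Create the index and configure the mapping
- 3.2 Add data
- 3.3 Inspect the storage structure
4 Best practices for type
Copyright notice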

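1 The role of type

In an Elasticsearch index, types are distinguished by the metadata field _type, so documents that share the same fields can be grouped under the same type.

==> This is why _type is also called the mapping type: each type has its own mapping.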

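But even similar data can have different fields. Among products, for example:

- electronic products have a voltage field;
- clothing has a washing-instructions field;
- fresh food has a nutrition-facts field...

How should these differing fields be handled?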

==> As mentioned in an earlier post, fields with the same name in different types of the same index must have identical mapping configurations. Why is that?

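2 The underlying data structure of type

Lucene, the core library Elasticsearch is built on, has no concept of type: when building its index it treats the values of all fields as opaque bytes. Types are therefore handled by Elasticsearch itself: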

- When storing a document, ES stores the type the document belongs to as a field named _type;
- When searching documents, ES uses _type to filter the results.
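A toy model can make these two points concrete. The following is a simplified pure-Python sketch, not Elasticsearch or Lucene code: the class FlatIndex and its add and search methods are hypothetical, and a plain list stands in for Lucene's real index structures.

```python
from typing import Any, Dict, List, Optional


class FlatIndex:
    """Toy model of one index: documents of every type share a single store."""

    def __init__(self) -> None:
        self.docs: List[Dict[str, Any]] = []

    def add(self, doc_type: str, source: Dict[str, Any]) -> None:
        # The type name is kept next to the user fields as an ordinary "_type" field.
        self.docs.append({"_type": doc_type, **source})

    def search(self, doc_type: Optional[str] = None) -> List[Dict[str, Any]]:
        # No type given: return everything. Otherwise "searching a type"
        # is just a filter on the _type field.
        if doc_type is None:
            return list(self.docs)
        return [doc for doc in self.docs if doc["_type"] == doc_type]


index = FlatIndex()
index.add("writer", {"id": 1001, "name": "tester"})
index.add("manager", {"id": 1001, "name": "shou feng", "authorize": "all"})

print(len(index.search()))                               # 2
print([doc["name"] for doc in index.search("manager")])  # ['shou feng']
```

Searching without a type returns the documents of both types, which mirrors how a search over the whole index returns hits of every type.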

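All types in an index are stored together, therefore:

- Before Elasticsearch 6.0: fields with the same name must have identical mapping configurations across the different types (_type values) of the same index.
- Since Elasticsearch 6.0: an index cannot have more than one type.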
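Since all types share one set of underlying fields, a field name used in two types is really a single field, which is why its mapping must agree. A minimal sketch of that merge, assuming a mapping is a plain Python dict from field name to definition; flatten_mappings and its error text are hypothetical and are not Elasticsearch's own conflict messages.

```python
from typing import Dict

# One type's mapping: field name -> field definition, e.g. {"type": "long"}.
FieldDefs = Dict[str, Dict[str, object]]


def flatten_mappings(mappings: Dict[str, FieldDefs]) -> FieldDefs:
    """Merge every type of one index into the single field table Lucene sees.

    Raises ValueError when two types define the same field name differently,
    since both would have to live in one shared underlying field.
    """
    merged: FieldDefs = {}
    owner: Dict[str, str] = {}  # type that first defined each field
    for type_name, fields in mappings.items():
        for field, definition in fields.items():
            if field in merged and merged[field] != definition:
                raise ValueError(
                    f"field [{field}] is {merged[field]} in type [{owner[field]}] "
                    f"but {definition} in type [{type_name}]"
                )
            merged.setdefault(field, definition)
            owner.setdefault(field, type_name)
    return merged


# Same-named fields with identical definitions merge cleanly.
ok = flatten_mappings({
    "writer": {"id": {"type": "long"}, "name": {"type": "text"}},
    "manager": {"id": {"type": "long"}, "authorize": {"type": "text", "index": False}},
})
print(sorted(ok))  # ['authorize', 'id', 'name']

# "age" is integer in one type and text in the other: rejected.
try:
    flatten_mappings({
        "writer": {"age": {"type": "integer"}},
        "manager": {"age": {"type": "text"}},
    })
except ValueError as err:
    print(err)
```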

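3 Exploring how types are stored

Note: since Elasticsearch 6.0, an index (created on 6.0 or later) may contain only one type; trying to create a second one fails with this error: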

{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "Rejecting mapping update to [website] as the final mapping would have more than 1 type: [manager, writer]"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "Rejecting mapping update to [website] as the final mapping would have more than 1 type: [manager, writer]"
  },
  "status": 400
}
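The rule behind this error can be modeled as a check on the final mapping, i.e. the types already in the index plus those in the request. A hypothetical pure-Python sketch: check_single_type is not an Elasticsearch function; only the message text is copied from the error above, and sorting the type names alphabetically is an assumption made to keep the output stable.

```python
from typing import Iterable, List


def check_single_type(index: str, existing: Iterable[str], incoming: Iterable[str]) -> List[str]:
    """Return the final list of types, or raise if it would hold more than one.

    The check is on the *final* mapping: types already in the index plus the
    types in the new request. Names are sorted only to make the message stable.
    """
    final = sorted(set(existing) | set(incoming))
    if len(final) > 1:
        raise ValueError(
            f"Rejecting mapping update to [{index}] as the final mapping "
            f"would have more than 1 type: [{', '.join(final)}]"
        )
    return final


print(check_single_type("website", [], ["writer"]))          # ['writer']
print(check_single_type("website", ["writer"], ["writer"]))  # ['writer']

try:
    # Adding "manager" to an index that already has "writer".
    check_single_type("website", ["writer"], ["manager"])
except ValueError as err:
    print(err)
# Rejecting mapping update to [website] as the final mapping would have more than 1 type: [manager, writer]
```

Creating the index with both types in a single request (existing empty, incoming writer and manager) fails the same check.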

The version used in this demonstration is 6.6.0.

3.1 Create the index and configure the mapping

PUT website
{
    "mappings": {      // on Elasticsearch 6.0+, add only the writer type
        "writer": {
            "properties": {
                "id": { "type": "long" },
                "name": { "type": "text" },
                "age": { "type": "integer" },
                "sex": { "type": "text", "index": false }
            }
        },
        "manager": {   // on 6.0+, omit this type (and the comma before it)
            "properties": {
                "id": { "type": "long" },
                "name": { "type": "text" },
                "age": { "type": "integer" },
                "sex": { "type": "text", "index": false },
                "authorize": { "type": "text", "index": false }
            }
        }
    }
}

3.2 Add data

PUT website/writer/1
{
    "id": 1001,
    "name": "tester",
    "age": 18,
    "sex": "female"
}
// On Elasticsearch 6.0+, do not add the following document:
PUT website/manager/1
{
    "id": 1001,
    "name": "shou feng",
    "age": 20,
    "sex": "male",
    "authorize": "all"
}
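On a pre-6.0 index, the two requests above share the _id 1 yet create two separate documents, because a document was identified by its type plus its id (the pre-6.0 _uid metadata field stored them as type#id). A minimal in-memory sketch of that keying; store and put are hypothetical and not Elasticsearch APIs.

```python
from typing import Any, Dict

# Toy document store keyed the way pre-6.0 Elasticsearch identified documents:
# by type plus id, which the old _uid metadata field encoded as "type#id".
store: Dict[str, Dict[str, Any]] = {}


def put(doc_type: str, doc_id: str, source: Dict[str, Any]) -> str:
    """Index or overwrite a document and return the internal key it used."""
    key = f"{doc_type}#{doc_id}"
    store[key] = source
    return key


put("writer", "1", {"id": 1001, "name": "tester", "age": 18, "sex": "female"})
put("manager", "1", {"id": 1001, "name": "shou feng", "age": 20,
                     "sex": "male", "authorize": "all"})

# Same _id "1", two different documents.
print(sorted(store))  # ['manager#1', 'writer#1']

# Re-indexing writer/1 (hypothetical new values) replaces only that document.
put("writer", "1", {"id": 1001, "name": "tester2", "age": 19, "sex": "female"})
print(len(store))  # 2
```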

3.3 Inspect the storage structure

// Search all documents
GET website/_search

// The search result is as follows:
{
  "hits" : {
    "total" : 1,        // on 6.6.0 only the writer document exists; an index holding both types would report 2
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "website",
        "_type" : "writer",    // _type is writer
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "id" : 1001,
          "name" : "tester",
          "age" : 18,
          "sex" : "female"
        }
      },
      {
        "_index": "website",
        "_type": "manager",    // _type is manager
        "_id": "1",
        "_score": 1,
        "_source": {
          "id": 1001,
          "name": "shou feng",
          "age": 20,
          "sex": "male",
          "authorize": "all"
        }
      }
    ]
  }
}
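Each hit carries its _type, so a client can tell the two documents apart even though both have _id 1. A hypothetical client-side sketch (hits_by_type is not part of any Elasticsearch client) that groups an abbreviated copy of the response above, held as a plain Python dict:

```python
from typing import Any, Dict, List

# Abbreviated copy of the search response above (only the fields used here).
response: Dict[str, Any] = {
    "hits": {
        "hits": [
            {"_index": "website", "_type": "writer", "_id": "1",
             "_source": {"name": "tester"}},
            {"_index": "website", "_type": "manager", "_id": "1",
             "_source": {"name": "shou feng"}},
        ]
    }
}


def hits_by_type(resp: Dict[str, Any]) -> Dict[str, List[str]]:
    """Group the names in the returned documents by their _type metadata field."""
    grouped: Dict[str, List[str]] = {}
    for hit in resp["hits"]["hits"]:
        grouped.setdefault(hit["_type"], []).append(hit["_source"]["name"])
    return grouped


print(hits_by_type(response))  # {'writer': ['tester'], 'manager': ['shou feng']}
```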

4 Best practices for type

Store types with similar structures in the same index: most of the fields of these types should be the same.

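If two types with completely different fields are stored in the same index, then in Lucene's underlying storage every document will have a large share of empty fields (sparse data), which hurts performance and wastes disk space: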

For example, every document of website/writer above conceptually also has the "authorize" field, just with an empty value.
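The cost described above grows with how different the types are. The sketch below counts, per type, the share of index-wide fields its documents leave empty. It is a conceptual count only: how much space Lucene actually spends on missing values depends on the Lucene version and the data structure involved. The function empty_ratio and the product fields (name, price, voltage, washing, nutrition) are hypothetical, the latter loosely based on the product example at the start of this article.

```python
from typing import Dict, Set


def empty_ratio(type_fields: Dict[str, Set[str]]) -> Dict[str, float]:
    """For each type, the share of the index-wide fields its documents leave empty.

    All types of an index share one set of fields (the union of every type's
    fields), so a document lacks a value for every field its type does not define.
    """
    all_fields: Set[str] = set().union(*type_fields.values())
    if not all_fields:
        return {name: 0.0 for name in type_fields}
    return {
        name: len(all_fields - fields) / len(all_fields)
        for name, fields in type_fields.items()
    }


# The two types from the example: almost identical field sets.
website = {
    "writer": {"id", "name", "age", "sex"},
    "manager": {"id", "name", "age", "sex", "authorize"},
}
print(empty_ratio(website))  # {'writer': 0.2, 'manager': 0.0}

# Hypothetical product types that only share "name" and "price".
products = {
    "electronics": {"name", "price", "voltage"},
    "clothing": {"name", "price", "washing"},
    "fresh": {"name", "price", "nutrition"},
}
print(empty_ratio(products))  # {'electronics': 0.4, 'clothing': 0.4, 'fresh': 0.4}
```

Similar types (writer and manager) leave few slots empty; types that share little leave a large share of every document empty, which is the situation this best practice warns against.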

From this angle, it is easy to guess why ES limits an index to a single type: document data is organized more conveniently and disk space is saved 😊

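Copyright notice

Author: 馬瘦風

Source: 馬瘦風's blog on cnblogs

Your support is a great encouragement to the author. Thank you for reading.

The copyright of this article belongs to the author. Reposting is welcome, but please keep this notice and provide a clearly visible link to the original article on the page; otherwise the author reserves the right to pursue legal liability.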
