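Elasticsearch: index mappings

Preface

This article is based on Elasticsearch 7.3.0.

Mapping is the process of defining how a document, and the fields it contains, are stored and indexed.

Viewing an index's mappings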

GET index_test/_mapping

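Meta-fields

Meta-fields customize how the metadata associated with a document is handled. The main ones are _source, _index, _type, and _id.

_source

The _source field stores the original JSON document that was passed at index time.

The _source field itself is not indexed (and is therefore not searchable), but it is stored so that it can be returned by fetch requests such as get or search.

# Create the index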
PUT index_test
{
  "mappings": {
    "_source": {
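      // true (the default): the _source field is enabled
      // false: the _source field is disabled, so queries no longer return the original JSON that was indexed
      // Note: with false the fields are not kept in _source, but they can still be searched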
      "enabled": true, 
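      // Fields to keep in _source
      // If includes is absent or an empty array, every field except those in excludes is kept
      // If includes is a non-empty array, only the fields it lists are kept (minus any that excludes also matches)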
      "includes": ["field1"],
      // Fields to leave out of _source
      "excludes": ["field3"]
    },
    "properties": {
      "field1": {
        "type": "keyword"
      },
      "field2": {
        "type": "keyword"
      },
      "field3": {
        "type": "keyword"
      }
    }
  }
}

  // includes absent: the fields in excludes are dropped, so field1 and field2 are kept
  "_source": {
    "excludes": ["field3"]
  }

  // includes present but empty: the fields in excludes are dropped, so field1 and field2 are kept
  "_source": {
    "includes": [],
    "excludes": ["field3"]
  }

  // includes present and non-empty: only the fields it lists are kept, so field1 is kept
  "_source": {
    "includes": ["field1"],
    "excludes": ["field3"]
  }
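
The three cases above can be modelled in a few lines of Python. This is only a sketch of the rule as described in this article, applied to top-level field names; the function name filter_source is hypothetical, and the wildcard matching via fnmatchcase is an assumption standing in for Elasticsearch's own pattern handling.

```python
from fnmatch import fnmatchcase


def filter_source(doc, includes=None, excludes=None):
    """Return the top-level fields that would be kept in _source (simplified model)."""
    includes = includes or []
    excludes = excludes or []
    kept = {}
    for name, value in doc.items():
        # A missing or empty includes list means "include everything".
        included = not includes or any(fnmatchcase(name, p) for p in includes)
        excluded = any(fnmatchcase(name, p) for p in excludes)
        if included and not excluded:
            kept[name] = value
    return kept


doc = {"field1": "a", "field2": "b", "field3": "c"}
print(filter_source(doc, excludes=["field3"]))
print(filter_source(doc, includes=[], excludes=["field3"]))
print(filter_source(doc, includes=["field1"], excludes=["field3"]))
```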

_index

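The index the document belongs to. It can also be used in queries and aggregations.

# Count the documents in each index with a terms aggregation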
GET _search
{
  "size": 0, 
  "aggs": {
    "index_terms_aggs": {
      "terms": {
        "field": "_index"
      }
    }
  }
}

_type

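The type the document belongs to within the index. There is only one type, _doc.

_id

The document's unique identifier.

Field types

Core data types

Strings

text: analyzed (tokenized) at index time; comparable to a match query, which also analyzes the query string
keyword: not analyzed, indexed verbatim; comparable to a term query, which does not analyze the query string

# Create the index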
PUT index_test
{
  "mappings": {
    "properties": {
      "content":{
        // text: analyzed at index time; an analyzer can be set
        "type": "text"
      },
      "tag":{
        // keyword: not analyzed at index time; an analyzer cannot be set, otherwise index creation fails
        "type": "keyword"
      }
    }
  }
}

# Analyzed. The mappings set no analyzer and no default analyzer is configured, so the built-in standard analyzer is used
POST index_test/_analyze
{
  "text":["這是測試文本，測試分詞效果"],
  "field": "content"
}

# Not analyzed
POST index_test/_analyze
{
  "text":["這是測試文本，測試分詞效果"],
  "field": "tag"
}

Note: a keyword field cannot have an analyzer; setting one makes index creation fail.

# Setting an analyzer on a keyword field fails
PUT index_test
{
  "mappings": {
    "properties": {
      "content":{
        "type": "text"
      },
      "tag":{
        "type": "keyword",
        "analyzer": "ik_max_word"
      }
    }
  }
}

{
  "error": {
    "root_cause": [
      {
        "type": "mapper_parsing_exception",
        "reason": "Mapping definition for [tag] has unsupported parameters:  [analyzer : ik_max_word]"
      }
    ],
    "type": "mapper_parsing_exception",
    "reason": "Failed to parse mapping [_doc]: Mapping definition for [tag] has unsupported parameters:  [analyzer : ik_max_word]",
    "caused_by": {
      "type": "mapper_parsing_exception",
      "reason": "Mapping definition for [tag] has unsupported parameters:  [analyzer : ik_max_word]"
    }
  },
  "status": 400
}

With the default (dynamic) mappings, every string field is mapped as text, with a keyword sub-field added under fields.

// The my_test index does not exist beforehand
POST my_test/_doc/1
{
  "userName": "jack",
  "password": "123456"
}

// View the mappings
GET my_test/_mapping

// Result
{
  "my_test" : {
    "mappings" : {
      "properties" : {
        "password" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "userName" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    }
  }
}

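Numbers

Integer types

Type	Range	Description
byte	-128 to 127 (-2^7 to 2^7-1)	1 byte; a signed 8-bit integer
short	-32768 to 32767 (-2^15 to 2^15-1)	2 bytes; a signed 16-bit integer
integer	-2^31 to 2^31-1	4 bytes; a signed 32-bit integer
long	-2^63 to 2^63-1	8 bytes; a signed 64-bit integer

Choosing a type

For the integer types (byte, short, integer, and long), pick the smallest type that is enough for the use case. This helps indexing and searching be more efficient. Note, however, that storage is optimized based on the actual values that are stored, so picking one type over another has no impact on storage requirements.

For example, if a value is known to range from 0 to 100, choose byte.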
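
The selection rule can be written down as a small helper. This is an illustrative sketch only: the function name smallest_integer_type is hypothetical, and the bounds are the ones listed in the table above.

```python
# (type name, minimum, maximum), ordered from smallest to largest
INTEGER_TYPES = [
    ("byte", -2**7, 2**7 - 1),
    ("short", -2**15, 2**15 - 1),
    ("integer", -2**31, 2**31 - 1),
    ("long", -2**63, 2**63 - 1),
]


def smallest_integer_type(min_value, max_value):
    """Return the first integer type whose range covers [min_value, max_value]."""
    for name, low, high in INTEGER_TYPES:
        if low <= min_value and max_value <= high:
            return name
    raise ValueError("range does not fit in a signed 64-bit integer")


print(smallest_integer_type(0, 100))
```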

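Floating-point types

Type	Description
double	A double-precision 64-bit IEEE 754 floating-point number
float	A single-precision 32-bit IEEE 754 floating-point number
half_float	A half-precision 16-bit IEEE 754 floating-point number
scaled_float	A floating-point number backed by a long, scaled by a fixed double scaling factor

Choosing a type

For floating-point data, first consider scaled_float, which stores the value as an integer using a scaling factor.

If scaled_float is not a good fit, pick the smallest type that is enough for the use case among double, float, and half_float.

For double, float, and half_float, -0.0 and +0.0 are different values. A term query on -0.0 does not match +0.0, and vice versa. The same holds for range queries: an upper bound of -0.0 does not match +0.0, and a lower bound of +0.0 does not match -0.0.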
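
The distinction between the two zeros comes from IEEE 754 itself: they compare equal in most languages, but their bit patterns differ in the sign bit. The snippet below only illustrates that representation (the helper name double_bits is hypothetical); it does not reproduce how Elasticsearch encodes the values internally.

```python
import struct


def double_bits(value):
    """Return the 64-bit IEEE 754 pattern of a float as an unsigned integer."""
    return struct.unpack(">Q", struct.pack(">d", value))[0]


# Numerically equal, but not the same bit pattern.
print(-0.0 == 0.0)
print(hex(double_bits(0.0)), hex(double_bits(-0.0)))
```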

About scaled_float: suppose a price must keep three decimal places. With a price of 132.889 and a scaling factor of 1000, the value stored is 132889.

PUT my_test
{
  "mappings": {
    "properties": {
      "price":{
        "type": "scaled_float",
        // The scaling factor; this parameter is required, otherwise index creation fails
        "scaling_factor": 1000
      }
    }
  }
}

PUT my_test/_doc/1
{
  // Value actually stored: 132.889 * 1000 (the scaling factor) = 132889
  "price": 132.889
}
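
The arithmetic behind scaled_float can be sketched as follows. The function names are hypothetical, and the round-half-up step is an assumption about how "rounded to the closest long" is carried out; the point is only that the stored value is an integer and that digits beyond the scaling factor are lost.

```python
import math


def encode_scaled_float(value, scaling_factor):
    """Multiply by the scaling factor and round to the closest integer (half up)."""
    return math.floor(value * scaling_factor + 0.5)


def decode_scaled_float(stored, scaling_factor):
    """Recover the floating-point value from the stored integer."""
    return stored / scaling_factor


stored = encode_scaled_float(132.889, 1000)
print(stored, decode_scaled_float(stored, 1000))

# Precision beyond the scaling factor is dropped: 0.1234 with factor 1000 keeps 3 decimals.
print(decode_scaled_float(encode_scaled_float(0.1234, 1000), 1000))
```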

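Dates

Internally, Elasticsearch converts dates to UTC and stores them as a long integer representing milliseconds since the epoch (a timestamp).

For details, see the companion post: Elasticsearch: date data types and time zones explained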
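
To make the stored representation concrete, here is a minimal Python sketch of the conversion to milliseconds since the epoch. The helper name to_epoch_millis is hypothetical, and the snippet only mirrors the arithmetic, not Elasticsearch's date parsing.

```python
from datetime import datetime, timedelta, timezone


def to_epoch_millis(dt):
    """Convert a timezone-aware datetime to milliseconds since 1970-01-01T00:00:00Z."""
    if dt.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    return (dt - epoch) // timedelta(milliseconds=1)


# The same instant expressed in UTC and in UTC+8 maps to the same long value.
utc_midnight = datetime(2015, 1, 1, tzinfo=timezone.utc)
same_instant = datetime(2015, 1, 1, 8, 0, tzinfo=timezone(timedelta(hours=8)))
print(to_epoch_millis(utc_midnight), to_epoch_millis(same_instant))
```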

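Boolean

Boolean fields accept the JSON values true and false, but can also accept strings that are interpreted as true or false.

Boolean	Accepted values
True	true, "true"
False	false, "false"

Binary

The binary type accepts a Base64-encoded string and can be used to store binary data.

By default the field is not indexed, so it cannot be searched; its value is kept only as part of _source (the store parameter defaults to false).

Note: the Base64-encoded value must not contain embedded newlines (\n).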
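
Some Base64 encoders wrap their output with newlines, which is worth checking before sending a value to a binary field. The sketch below uses Python's standard base64 module; the helper name encode_for_binary_field is hypothetical.

```python
import base64


def encode_for_binary_field(data):
    """Base64-encode bytes as a single line, with no embedded newlines."""
    return base64.b64encode(data).decode("ascii")


payload = bytes(range(256))
single_line = encode_for_binary_field(payload)

# encodebytes wraps the output in lines, so it is not suitable here as-is.
wrapped = base64.encodebytes(payload).decode("ascii")

print("\n" in single_line, "\n" in wrapped)
```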

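Ranges

Type	Values
integer_range	A range of signed 32-bit integers, from -2^31 to 2^31-1
long_range	A range of signed 64-bit integers, from -2^63 to 2^63-1
float_range	A range of single-precision 32-bit IEEE 754 floating-point values
double_range	A range of double-precision 64-bit IEEE 754 floating-point values
date_range	A range of date values, represented as unsigned 64-bit integer milliseconds since the epoch
ip_range	A range of IP values, supporting IPv4 or IPv6 (or mixed) addresses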

PUT range_index
{
  "mappings": {
    "properties": {
      "expected_attendees": {
        "type": "integer_range"
      },
      "time_frame": {
        "type": "date_range", 
        "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
      }
    }
  }
}

PUT range_index/_doc/1?refresh
{
  "expected_attendees" : { 
    "gte" : 10,
    "lte" : 20
  },
  "time_frame" : { 
    "gte" : "2015-10-31 12:00:00", 
    "lte" : "2015-11-01"
  }
}

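Complex data types

Arrays

Elasticsearch has no dedicated array data type. By default any field can contain zero or more values, but all values in an array must be of the same data type. Arrays that mix data types, such as [ 10, "some string" ], are not supported.

An array may contain null values, which are either replaced by the configured null_value or skipped entirely. An empty array [] is treated as a missing field, that is, a field with no values.

Common arrays:

an array of strings: [ "one", "two" ]
an array of integers: [ 1, 2 ]
an array of arrays: [ 1, [ 2, 3 ]], which is equivalent to [ 1, 2, 3 ]
an array of objects: [ { "name": "Mary", "age": 12 }, { "name": "John", "age": 10 } ]

Note: arrays of objects do not work as you would expect: you cannot query each object independently of the other objects in the array. If you need to do that, use the nested data type instead of the object data type.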

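Objects

JSON documents are hierarchical in nature: a document may contain inner objects which, in turn, may contain inner objects themselves.

There is no need to set the field type to object explicitly, as this is the default.

# my_index does not exist beforehand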
PUT my_index/_doc/1
{
  "region": "US",
  "manager": {
    "age": 30,
    "name": {
      "first": "John",
      "last": "Smith"
    }
  }
}

# Internally, this document is indexed as a simple, flat list of key-value pairs, something like this:
{
  "region":             "US",
  "manager.age":        30,
  "manager.name.first": "John",
  "manager.name.last":  "Smith"
}

# View the index's mappings
GET my_index/_mapping

# Result
{
  "my_index" : {
    "mappings" : {
      "properties" : {
        "manager" : {
          "properties" : {
            "age" : {
              "type" : "long"
            },
            "name" : {
              "properties" : {
                "first" : {
                  "type" : "text",
                  "fields" : {
                    "keyword" : {
                      "type" : "keyword",
                      "ignore_above" : 256
                    }
                  }
                },
                "last" : {
                  "type" : "text",
                  "fields" : {
                    "keyword" : {
                      "type" : "keyword",
                      "ignore_above" : 256
                    }
                  }
                }
              }
            }
          }
        },
        "region" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    }
  }
}
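
The flattening of inner objects into dotted keys, shown earlier for this document, can be reproduced with a short recursive function. This is a sketch of the idea only; the name flatten is hypothetical and arrays are deliberately left out.

```python
def flatten(obj, prefix=""):
    """Flatten nested dicts into a single dict keyed by dotted paths."""
    flat = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


document = {
    "region": "US",
    "manager": {"age": 30, "name": {"first": "John", "last": "Smith"}},
}
print(flatten(document))
```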

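Nested (arrays of objects)

The nested type is a specialised version of the object data type that allows arrays of objects to be indexed in a way that lets them be queried independently of each other.

The problem with the object data type

# my_index does not exist beforehand; the user field is dynamically added as a field of type object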
PUT my_index/_doc/1
{
  "group": "fans",
  "user": [
    {
      "first": "John",
      "last": "Smith"
    },
    {
      "first": "Alice",
      "last": "White"
    }
  ]
}

# Internal representation: the association between user.first and user.last is lost
{
  "group" :        "fans",
  "user.first" : [ "alice", "john" ],
  "user.last" :  [ "smith", "white" ]
}

# This query cannot correctly match "Alice AND Smith": it should return nothing, yet it returns document 1
GET my_index/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "user.first": "Alice"
          }
        },
        {
          "match": {
            "user.last": "Smith"
          }
        }
      ]
    }
  }
}

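If you need to index arrays of objects and maintain the independence of each object in the array, use the nested data type instead of the object data type. Internally, each object in the array is indexed as a separate hidden document, which means each nested object can be queried independently of the others with the nested query:

# Create the index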
PUT my_index
{
  "mappings": {
    "properties": {
      "user": {
        // Set the field type to nested
        "type": "nested",
        "properties": {
          "first": {
            "type": "keyword"
          },
          "last": {
            "type": "keyword"
          }
        }
      }
    }
  }
}

# Index a document
PUT my_index/_doc/1
{
  "group": "fans",
  "user": [
    {
      "first": "John",
      "last": "Smith"
    },
    {
      "first": "Alice",
      "last": "White"
    }
  ]
}

# Query with the nested query
GET my_index/_search
{
  "query": {
    "nested": {
      "path": "user",
      "query": {
        "bool": {
          // This query matches because Alice and White are in the same nested object
          "must": [
            {
              "match": {
                "user.first": "Alice"
              }
            },
            {
              "match": {
                "user.last": "White"
              }
            }
          ]
        }
      },
      // inner_hits lets us highlight the matching nested documents
      "inner_hits": {
        "highlight": {
          "fields": {
            "user.first": {}
          }
        }
      }
    }
  }
}
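
The difference between the two types can be modelled without Elasticsearch. In this sketch (function names are hypothetical, and matching is simplified to exact string equality), object_match pools each field's values across the whole array, while nested_match requires both conditions to hold inside the same object.

```python
users = [
    {"first": "John", "last": "Smith"},
    {"first": "Alice", "last": "White"},
]


def object_match(objects, first, last):
    """object semantics: each field is checked against all values pooled together."""
    firsts = {o["first"] for o in objects}
    lasts = {o["last"] for o in objects}
    return first in firsts and last in lasts


def nested_match(objects, first, last):
    """nested semantics: both conditions must hold within a single object."""
    return any(o["first"] == first and o["last"] == last for o in objects)


print(object_match(users, "Alice", "Smith"))  # cross-object match slips through
print(nested_match(users, "Alice", "Smith"))
print(nested_match(users, "Alice", "White"))
```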

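Geo data types

Geo-point

Fields of type geo_point accept latitude-longitude pairs.

lat: latitude
lon: longitude

There are five ways to specify a geo-point, as shown below: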

PUT my_index
{
  "mappings": {
    "properties": {
      "location": {
        "type": "geo_point"
      }
    }
  }
}

# Geo-point expressed as an object, with lat and lon keys
PUT my_index/_doc/1
{
  "text": "Geo-point as an object",
  "location": { 
    "lat": 41.12,
    "lon": -71.34
  }
}

# Geo-point expressed as a string in the format "lat,lon" (latitude, longitude)
PUT my_index/_doc/2
{
  "text": "Geo-point as a string",
  "location": "41.12,-71.34" 
}

# Geo-point expressed as a geohash
PUT my_index/_doc/3
{
  "text": "Geo-point as a geohash",
  "location": "drm3btev3e86" 
}

# Geo-point expressed as an array in the format [lon, lat] ([longitude, latitude])
PUT my_index/_doc/4
{
  "text": "Geo-point as an array",
  "location": [ -71.34, 41.12 ] 
}

# Geo-point expressed as a WKT POINT string; in testing, indexing it failed with the following error:
# {
#  "type": "parse_exception",
#  "reason": "unsupported symbol [o] in geohash [POINT(-71.34 41.12)]"
# }
PUT my_index/_doc/5
{
  "text": "Geo-point as a WKT POINT primitive",
  "location" : "POINT (-71.34 41.12)" 
}

# Query: a geo_bounding_box query that finds geo-points inside the box
GET my_index/_search
{
  "query": {
    "geo_bounding_box": { 
      "location": {
        "top_left": {
          "lat": 42,
          "lon": -72
        },
        "bottom_right": {
          "lat": 40,
          "lon": -74
        }
      }
    }
  }
}
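
The easiest thing to get wrong with these formats is the coordinate order: the string form is "lat,lon" while the array form is [lon, lat]. The following sketch normalizes three of the formats to a (lat, lon) tuple; the function name to_lat_lon is hypothetical, and geohash and WKT inputs are intentionally not handled.

```python
def to_lat_lon(value):
    """Normalize an object, "lat,lon" string, or [lon, lat] array to (lat, lon)."""
    if isinstance(value, dict):
        return float(value["lat"]), float(value["lon"])
    if isinstance(value, str):
        if "," not in value:
            raise ValueError("only the 'lat,lon' string form is handled in this sketch")
        lat, lon = value.split(",")
        return float(lat), float(lon)
    if isinstance(value, (list, tuple)):
        lon, lat = value  # note the order: longitude first
        return float(lat), float(lon)
    raise TypeError(f"unsupported geo_point value: {value!r}")


print(to_lat_lon({"lat": 41.12, "lon": -71.34}))
print(to_lat_lon("41.12,-71.34"))
print(to_lat_lon([-71.34, 41.12]))
```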

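Geo-shape

The geo_shape data type facilitates the indexing of, and searching with, arbitrary geo shapes such as rectangles and polygons. Use it when either the data being indexed or the queries being executed contain shapes other than just points.

Fields of this type can be queried with the geo_shape query.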

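Specialised data types

ip

An ip field can index and store IPv4 or IPv6 addresses.

completion

The completion suggester provides auto-complete / search-as-you-type functionality. It is a navigational feature that guides users to relevant results as they type, improving search precision. It is not meant for spell correction or did-you-mean functionality like the term or phrase suggesters.

Ideally, auto-complete should be as fast as the user types, to provide instant feedback relevant to what has already been typed. The completion suggester is therefore optimized for speed: it uses data structures that enable fast lookups, but these are costly to build and are stored in memory.

token_count

A field of type token_count is really an integer field that accepts string values, analyzes them, and then indexes the number of tokens in the string.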

PUT my_index
{
  "mappings": {
    "properties": {
      "name": { 
        // The name field uses the built-in standard analyzer
        "type": "text",
        "fields": {
          "length": { 
            "type":     "token_count",
            "analyzer": "standard"
          }
        }
      }
    }
  }
}

# name is analyzed into 2 tokens
PUT my_index/_doc/1
{ "name": "John Smith" }

# name is analyzed into 3 tokens
PUT my_index/_doc/2
{ "name": "Rachel Alice Williams" }

# Find documents whose name contains 3 tokens
GET my_index/_search
{
  "query": {
    "term": {
      "name.length": 3 
    }
  }
}
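
What ends up in name.length is simply the number of tokens the analyzer produces. The sketch below approximates that for plain English text with a regular expression; this is an assumption for illustration, since the real standard analyzer uses Unicode text segmentation rather than a regex.

```python
import re


def token_count(text):
    """Count word-like tokens; a rough stand-in for the standard analyzer on English text."""
    return len(re.findall(r"\w+", text))


print(token_count("John Smith"))
print(token_count("Rachel Alice Williams"))
```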

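Mapping parameters

analyzer and search_analyzer

For details, see: Elasticsearch: analyzers explained

analyzer: the analyzer used for the field at both index time and search time
search_analyzer: the analyzer used for the field at search time only; it takes precedence over analyzer

Analyzer precedence at index time (highest first):

the analyzer set in the field mapping
the analyzer named default in the index settings
the built-in standard analyzer

Analyzer precedence at search time (highest first):

the analyzer specified in the query itself
the search_analyzer set in the field mapping
the analyzer set in the field mapping
the analyzer named default_search in the index settings
the analyzer named default in the index settings
the built-in standard analyzer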
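
Both precedence lists boil down to "take the first analyzer that is configured, otherwise fall back to standard". A minimal sketch of that resolution follows; the function and parameter names are hypothetical and only mirror the two lists above.

```python
def first_configured(*candidates, fallback="standard"):
    """Return the first candidate that is set, or the built-in fallback."""
    for candidate in candidates:
        if candidate is not None:
            return candidate
    return fallback


def index_time_analyzer(mapping_analyzer=None, settings_default=None):
    return first_configured(mapping_analyzer, settings_default)


def search_time_analyzer(query_analyzer=None, mapping_search_analyzer=None,
                         mapping_analyzer=None, settings_default_search=None,
                         settings_default=None):
    return first_configured(query_analyzer, mapping_search_analyzer,
                            mapping_analyzer, settings_default_search,
                            settings_default)


# search_analyzer in the mapping wins over analyzer at search time.
print(search_time_analyzer(mapping_search_analyzer="ik_smart",
                           mapping_analyzer="ik_max_word"))
```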

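format

Sets a custom date format.

index

The index option controls whether field values are indexed. It accepts true or false and defaults to true, so Elasticsearch indexes every field by default.

Fields that are not indexed cannot be queried, but they are still kept in _source.

enabled

Elasticsearch tries to index all of the fields you give it, but sometimes you want to just store a field without indexing it.

The enabled setting, which can be applied only to the top-level mapping definition and to object fields, causes Elasticsearch to skip parsing of the contents of the field entirely. The JSON can still be retrieved from the _source field, but it is not searchable and cannot be kept as a stored field.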

PUT my_index
{
  "mappings": {
    "properties": {
      "session_data": { 
        "type": "object",
        // The session_data field is disabled
        "enabled": false,
        // store can only be false here; setting it to true makes index creation fail
        "store": false
      }
    }
  }
}

# With enabled set to false and store set to true, index creation fails:
{
  "error": {
    "root_cause": [
      {
        "type": "mapper_parsing_exception",
        "reason": "Mapping definition for [session_data] has unsupported parameters:  [store : true]"
      }
    ],
    "type": "mapper_parsing_exception",
    "reason": "Failed to parse mapping [_doc]: Mapping definition for [session_data] has unsupported parameters:  [store : true]",
    "caused_by": {
      "type": "mapper_parsing_exception",
      "reason": "Mapping definition for [session_data] has unsupported parameters:  [store : true]"
    }
  },
  "status": 400
}

# Any arbitrary data can be passed to the session_data field, as it is ignored entirely
PUT my_index/_doc/session_1
{
  "session_data": { 
    "arbitrary_object": {
      "some_array": [ "foo", "bar", { "baz": 2 } ]
    }
  }
}

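# Note: because Elasticsearch skips parsing the field contents entirely, non-object data can also be added to a disabled field
# session_data also ignores values that are not JSON objects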
PUT my_index/_doc/session_2
{
  "session_data": "none"
}

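The entire mapping can be disabled as well. In that case the document is stored in the _source field, which means it can be retrieved, but none of its contents are indexed in any way.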

PUT my_index
{
  "mappings": {
    // The entire mapping is disabled
    "enabled": false 
  }
}

PUT my_index/_doc/session_1
{
  "user_id": "kimchy",
  "session_data": {
    "arbitrary_object": {
      "some_array": [ "foo", "bar", { "baz": 2 } ]
    }
  },
  "last_updated": "2015-12-06T18:20:22"
}

# The document can be retrieved
GET my_index/_doc/session_1

# Checking the mapping reveals that no fields have been added
GET my_index/_mapping

# Response
{
  "my_index" : {
    "mappings" : {
      "enabled" : false
    }
  }
}

copy_to

The copy_to parameter lets you copy the values of multiple fields into a group field, which can then be queried as a single field.

PUT my_index
{
  "mappings": {
    "properties": {
      "first_name": {
        "type": "text",
        "copy_to": "full_name" 
      },
      "last_name": {
        "type": "text",
        "copy_to": "full_name" 
      },
      "full_name": {
        "type": "text",
        "store": true
      }
    }
  }
}

PUT my_index/_doc/1
{
  "first_name": "John",
  "last_name": "Smith"
}

# The full_name field does not appear in _source
GET my_index/_search
{
  "query": {
    "match": {
      "full_name": { 
        "query": "John Smith",
        "operator": "and"
      }
    }
  }
}

# Response
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 0.5753642,
    "hits" : [
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.5753642,
        "_source" : {
          "first_name" : "John",
          "last_name" : "Smith"
        }
      }
    ]
  }
}

# Because full_name has store set to true, the copied values can be read back with stored_fields
GET my_index/_search
{
  "stored_fields": ["full_name"], 
  "query": {
    "match": {
      "full_name": { 
        "query": "John Smith",
        "operator": "and"
      }
    }
  }
}

# Response
{
  "took" : 815,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 0.5753642,
    "hits" : [
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.5753642,
        "fields" : {
          "full_name" : [
            "John",
            "Smith"
          ]
        }
      }
    ]
  }
}
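
The behaviour seen in the two responses can be summarised with a small model: the source document is left untouched, and the target field collects the values of every field that points at it. The function name apply_copy_to and the dict-based description of the mapping are hypothetical simplifications.

```python
def apply_copy_to(doc, copy_to):
    """Collect copied values per target field; copy_to maps source field -> target field."""
    targets = {}
    for field, target in copy_to.items():
        if field in doc:
            targets.setdefault(target, []).append(doc[field])
    return targets


source = {"first_name": "John", "last_name": "Smith"}
mapping = {"first_name": "full_name", "last_name": "full_name"}

print(apply_copy_to(source, mapping))  # values available for indexing under full_name
print(source)                          # _source itself is unchanged
```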

store

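By default, field values are indexed to make them searchable, but they are not stored. This means the field can be queried, but the original field value cannot be retrieved.

Usually this does not matter: the field value is already part of the _source field, which is stored by default. That is why store defaults to false.

If you only want to retrieve the value of a single field or of a few fields, instead of the whole _source, you can use source filtering.

Use case 1: a field that is not in _source, such as a copy_to target, can be kept with store.

Use case 2: retrieving only specific fields; source filtering can usually be used for this instead.

Differences between retrieving stored fields and retrieving from _source:

I/O: as a rule of thumb, reading _source (whole or filtered) is a single read, whereas stored fields cost one read per field, so the number of reads grows with the number of fields requested
Return value: _source returns values exactly as they were sent. Stored fields are always returned as an array, for consistency, because there is no way of knowing whether the original value was a single value, multiple values, or an empty array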

PUT my_index
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "store": true 
      },
      "content": {
        "type": "text"
      }
    }
  }
}

PUT my_index/_doc/1
{
  "title":   "Some short title",
  "content": "A very long content field..."
}

# Retrieve from stored fields
GET my_index/_search
{
  "stored_fields": ["title"] 
}

# Response
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "fields" : {
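          // One read per stored field
          // Stored fields are always returned as an array, for consistency, because there is no way of knowing whether the original value was a single value, multiple values, or an empty array
          // If you need the original value, retrieve it from _source instead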
          "title" : [
            "Some short title"
          ]
        }
      }
    ]
  }
}

# Retrieve from _source
GET my_index/_search
{
    "_source": {
        "includes": ["title"] 
    }
}

# Response
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        // The whole _source field is a single read
        "_source" : {
          "title" : "Some short title"
        }
      }
    ]
  }
}

fields

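It is often useful to index the same field in different ways for different purposes. This is the purpose of multi-fields.

For instance, a string field could be mapped as a text field for full-text search, and as a keyword field for sorting or aggregations.

Note: multi-fields do not change the original _source field.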

PUT my_index
{
  "mappings": {
    "properties": {
      "city": {
        "type": "text",
        "fields": {
          "raw": { 
            "type":  "keyword"
          }
        }
      }
    }
  }
}

PUT my_index/_doc/1
{
  "city": "New York"
}

PUT my_index/_doc/2
{
  "city": "York"
}

GET my_index/_search
{
  "query": {
    "match": {
      "city": "york" 
    }
  },
  "sort": {
    "city.raw": "asc" 
  },
  "aggs": {
    "Cities": {
      "terms": {
        "field": "city.raw" 
      }
    }
  }
}

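Multi-fields with multiple analyzers

Another use case of multi-fields is to analyze the same field in different ways for better relevance.

For instance, we could index a field with the standard analyzer, which breaks text up into words, and again with the english analyzer, which stems words into their root form.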

PUT my_index
{
  "mappings": {
    "properties": {
      "text": { 
        "type": "text",
        "fields": {
          "english": { 
            "type":     "text",
            "analyzer": "english"
          }
        }
      }
    }
  }
}

PUT my_index/_doc/1
{ "text": "quick brown fox" } 

PUT my_index/_doc/2
{ "text": "quick brown foxes" } 

GET my_index/_search
{
  "query": {
    "multi_match": {
      "query": "quick brown foxes",
      "fields": [ 
        "text",
        "text.english"
      ],
      "type": "most_fields" 
    }
  }
}

null_value

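A null value cannot be indexed or searched. When a field is set to null (or to an empty array, or to an array of null values), it is treated as though that field has no values.

The null_value parameter lets you replace explicit null values with the specified value so that they can be indexed and searched.
null_value must be of the same data type as the field. For instance, a long field cannot have a string null_value.
null_value only influences how data is indexed; it does not modify the _source document.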

PUT my_index
{
  "mappings": {
    "properties": {
      "status_code": {
        "type":       "keyword",
        // Replace explicit null values with the string "NULL"
        "null_value": "NULL" 
      }
    }
  }
}

PUT my_index/_doc/1
{
  // An explicit null, so it is replaced by null_value
  "status_code": null
}

PUT my_index/_doc/2
{
  // An empty array does not contain an explicit null, so null_value is not applied
  "status_code": [] 
}

# Query for "NULL": document 1 is returned, document 2 is not
GET my_index/_search
{
  "query": {
    "term": {
      "status_code": "NULL" 
    }
  }
}
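
The substitution rule can be sketched in Python as follows. The function name indexed_values is hypothetical; the model only covers what the text above states: explicit nulls are replaced by null_value when one is configured, and an empty array contributes nothing.

```python
def indexed_values(value, null_value=None):
    """Return the values that would be indexed for a field (simplified model)."""
    values = value if isinstance(value, list) else [value]
    result = []
    for item in values:
        if item is None:
            # An explicit null is replaced only when null_value is configured.
            if null_value is not None:
                result.append(null_value)
        else:
            result.append(item)
    return result


print(indexed_values(None, null_value="NULL"))  # document 1
print(indexed_values([], null_value="NULL"))    # document 2
```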

doc_values

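Most fields are indexed by default, which makes them searchable.

The inverted index allows queries to look up the search term in a unique sorted list of terms and, from that, immediately have access to the list of documents that contain the term.

Sorting, aggregations, and access to field values in scripts require a different data access pattern.

Instead of looking up the term and finding documents, we need to be able to look up the document and find the terms that it has in a field. Doc values are the on-disk data structure, built at index time, that makes this access pattern possible.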

PUT my_index
{
  "mappings": {
    "properties": {
      "status_code": { 
        // doc_values is enabled by default for status_code
        "type":       "keyword"
      },
      "session_id": { 
        "type":       "keyword",
        // doc_values is disabled for session_id: the field can still be queried, but cannot be used for sorting or aggregations
        "doc_values": false
      }
    }
  }
}
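
The two access patterns can be contrasted with plain dictionaries. This is a conceptual sketch only (the function names and the sample data are made up for illustration); real doc values are a columnar on-disk structure, not a Python dict.

```python
from collections import defaultdict

# Hypothetical documents: doc id -> values of one keyword field
docs = {1: ["red", "blue"], 2: ["blue"], 3: ["green", "red"]}


def build_inverted_index(docs):
    """term -> sorted list of doc ids: answers 'which documents contain this term?'"""
    index = defaultdict(list)
    for doc_id in sorted(docs):
        for term in set(docs[doc_id]):
            index[term].append(doc_id)
    return {term: index[term] for term in sorted(index)}


def build_doc_values(docs):
    """doc id -> sorted terms: answers 'which terms does this document have?'"""
    return {doc_id: sorted(set(terms)) for doc_id, terms in docs.items()}


print(build_inverted_index(docs)["red"])  # search: term to documents
print(build_doc_values(docs)[3])          # sort/aggregate: document to terms
```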