Elasticsearch (Part 2): Creating and Understanding Mappings

Creating a mapping

PUT my_index
{
	"settings": {
		"number_of_shards": 5,
		"number_of_replicas": 1,
		"analysis": {
			"normalizer": {
				"my_normalizer": {
					"type": "custom",
					"filter": ["lowercase"]
				}
			}
		}
	},
	"mappings": {
		"my_doc": {
			"properties": {
				"title": {
					"type": "keyword",
					"normalizer": "my_normalizer"
				},
				"name": {
					"type": "text",
					"analyzer": "standard",
					"boost": 2
				},
				"age": {
					"type": "integer"
				},
				"created": {
					"type": "date",
					"format": "strict_date_optional_time||epoch_millis"
				}
			}
		}
	}
}

The curl equivalent (later examples follow the same pattern and are not repeated):

curl -X PUT "localhost:9200/my_index" -H 'Content-Type: application/json' -d'
{
   # same body as above
}
'

my_index: the index name
settings: index-level configuration
my_doc: the type name
properties: the field definitions

Mapping parameters

type: the field's data type (Elasticsearch data types)

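1. text: a string that is analyzed (tokenized) and indexed for full-text search

2. keyword: an exact-value string that is not analyzed; suited to fields such as ids and email addresses

3. numeric: number types, including integer, long, short, byte, double and float

4. date: date and time values

5. boolean: true or false

6. binary: accepts a Base64-encoded string

7. range: one of integer_range, float_range, long_range, double_range, date_range or ip_range; stores a range of values, which is inserted like this: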

PUT index/type/id
{
  "field_name" : { 
    "gte" : 10,
    "lte" : 20
  }
}

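8. Arrays: Elasticsearch has no dedicated array type. Every field type can hold multiple values, so a field mapped as integer, text and so on also accepts an array of such values. To store an array of integers, simply map the field as integer.

9. object: a JSON object

10. nested: an array of objects in which each object is indexed and queried independently

11. geo_point: a latitude/longitude pair, which can be supplied as a JSON object, a string or an array

12. ip: an IPv4 or IPv6 address

13. token_count: an integer field that stores the number of tokens in a string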

PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "name": { 
          "type": "text",
          "fields": {
            "length": { 
              "type":     "token_count",
              "analyzer": "standard"
            }
          }
        }
      }
    }
  }
}

PUT my_index/_doc/2
{ "name": "Rachel Alice Williams" }

# Find documents whose name field contains three tokens
GET my_index/_search
{
  "query": {
    "term": {
      "name.length": 3 
    }
  }
}
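
As a rough illustration of what the name.length sub-field holds, the sketch below counts tokens in Python. The count_tokens helper is hypothetical, and splitting on word characters only approximates what the standard analyzer does.

```python
import re


def count_tokens(text: str) -> int:
    """Count word-like tokens; a rough stand-in for the standard analyzer."""
    return len(re.findall(r"\w+", text))


# The document indexed above has three tokens in its name field,
# which is why the term query on name.length with value 3 finds it.
print(count_tokens("Rachel Alice Williams"))
```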

14. join: creates parent/child relations between documents in the same index

# Define the relation: question is the parent, answer is the child
PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "my_join_field": { 
          "type": "join",
          "relations": {
            "question": "answer" 
          }
        }
      }
    }
  }
}

# Index a parent document
PUT my_index/_doc/1
{
  "text": "This is a question",
  "my_join_field": {
    "name": "question"
  }
}

# Index a child document: routing points to the root document, parent to the direct parent
PUT my_index/_doc/2?routing=1
{
  "text": "This is an answer",
  "my_join_field": {
    "name": "answer", 
    "parent": "1" 
  }
}

Other data types include Alias, Percolator, Completion and Geo-shape, plus the types provided by the mapper-murmur3 and mapper-annotated-text plugins.

For details, see the official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/6.x/mapping-types.html

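analyzer: tokenizes text and normalizes the resulting terms (for example reducing dogs to dog, or lowercasing). The default is the standard analyzer. A custom analyzer can also be defined when the index is created and then referenced in the mapping: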

PUT my_index
{
   "settings":{
      "analysis":{
         "analyzer":{
            "my_stop_analyzer":{ 
               "type":"custom",
               "tokenizer":"standard",
               "filter":[
                  "lowercase",
                  "english_stop"
               ]
            }
         },
         "filter":{
            "english_stop":{
               "type":"stop",
               "stopwords":"_english_"
            }
         }
      }
   },
   "mappings":{
      "_doc":{
         "properties":{
            "title": {
               "type":"text",
               "analyzer":"my_stop_analyzer"
            }
         }
      }
   }
}

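normalizer: similar to analyzer, but it always produces a single token; it applies to keyword fields
boost: the field's weight in relevance scoring; the default is 1
coerce: whether values are converted to the field's type; the default is true. For example, an integer field accepts a numeric string and converts it to a number. With false no conversion happens, and passing a string raises an error.
copy_to: copies the field's value into a group field that can then be queried directly, so a single query covers every field copied into the group, much like _all.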

PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "first_name": {
          "type": "text",
          "copy_to": "full_name" 
        },
        "last_name": {
          "type": "text",
          "copy_to": "full_name" 
        }
      }
    }
  }
}

PUT my_index/_doc/1
{
  "first_name": "John",
  "last_name": "Smith"
}

GET my_index/_search
{
  "query": {
    "match": {
      "full_name": { 
        "query": "John Smith",
        "operator": "and"
      }
    }
  }
}

The _source of the matching hit is:

"_source": {
          "first_name": "John",
          "last_name": "Smith"
}

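doc_values: most often relevant for keyword fields; the default is true, which allows sorting and aggregating on the field. With false the field can no longer be used for sorting or aggregations, but it takes less disk space.
dynamic: the default is true, meaning Elasticsearch adds new fields to the mapping automatically. With false, fields that are not in the mapping are no longer added or indexed (they are ignored rather than rejected; use strict to reject the document). This parameter sits at the same level as properties.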

PUT my_index
{
  "mappings": {
    "_doc": {
      "dynamic": false,
      "properties": {
        "user": {
          "type": "text"
        }
      }
    }
  }
}

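enabled: the default is true; with false the field is not parsed or indexed at all, although it is still kept in _source
eager_global_ordinals: setting it to true speeds up queries that rely on global ordinals (such as terms aggregations) but slows down updates. It can be used on keyword fields; on text fields it only works when fielddata is set to true.
fielddata: used with text fields. By default a text field cannot be used for sorting or aggregations; setting fielddata to true enables this, at the cost of heap memory. Adding a keyword sub-field under fields, as below, achieves a similar result: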

PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "my_field": { 
          "type": "text",
          "fields": {
            "keyword": { 
              "type": "keyword"
            }
          }
        }
      }
    }
  }
}

format: usually used with date fields to declare the accepted date format

PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "date": {
          "type":   "date",
          "format": "yyyy-MM-dd"
        }
      }
    }
  }
}

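ignore_malformed: whether malformed values are tolerated. The default is false, so indexing a malformed value, such as a non-numeric string into an integer field, fails with an error. With true the document is accepted and the malformed value is simply not indexed.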

PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "number_one": {
          "type": "integer",
          "ignore_malformed": true
        },
        "number_two": {
          "type": "integer"
        }
      }
    }
  }
}
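
The behaviour of the two fields above can be sketched in a few lines of Python. This is a simplified model rather than Elasticsearch code: index_integer is a hypothetical helper, and it assumes the default coerce behaviour, so numeric strings are converted.

```python
def index_integer(value, ignore_malformed=False):
    """Return the integer that would be indexed, or None if the value is skipped.

    Simplified model of an integer field with coerce left at its default.
    """
    try:
        return int(value)
    except (TypeError, ValueError):
        if ignore_malformed:
            # The document is still accepted; only this field is left unindexed.
            return None
        raise ValueError(f"malformed integer value: {value!r}") from None


# number_one (ignore_malformed: true) tolerates "foo" and skips it.
print(index_integer("foo", ignore_malformed=True))
# A numeric string is coerced for either field.
print(index_integer("10"))
```

Calling index_integer("foo") without ignore_malformed raises, which corresponds to the request for number_two being rejected.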

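ignore_above: a length limit for string values, available only on keyword fields. A value longer than the limit does not cause an error, but it is not indexed and therefore cannot be found by searching (it still appears in _source).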

PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "message": {
          "type": "keyword",
          "ignore_above": 20 
        }
      }
    }
  }
}
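
A minimal sketch of the rule, assuming the limit is compared against the character count of the value; is_indexed is a hypothetical helper and not part of any Elasticsearch client.

```python
def is_indexed(value: str, ignore_above: int) -> bool:
    """Return True if a keyword value is short enough to be indexed."""
    return len(value) <= ignore_above


# With ignore_above set to 20 as in the mapping above:
print(is_indexed("Syntax error", 20))
print(is_indexed("Syntax error with some long stacktrace", 20))
```

The first message is within the limit and can be searched; the second is accepted but left out of the index.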

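index: the default is true; with false the field is not indexed and therefore cannot be searched
fields: defines sub-fields for a field, typically because the field's own type does not support a particular kind of query. For example, to sort on a text field, add a keyword sub-field under fields and sort on that sub-field, as below: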

PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "city": {
          "type": "text",
          "fields": {
            "raw": { 
              "type":  "keyword"
            }
          }
        }
      }
    }
  }
}

PUT my_index/_doc/1
{
  "city": "New York"
}

GET my_index/_search
{
  "query": {
    "match": {
      "city": "york" 
    }
  },
  "sort": {
    "city.raw": "asc" 
  },
  "aggs": {
    "Cities": {
      "terms": {
        "field": "city.raw" 
      }
    }
  }
}

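norms: normalization factors used for scoring; enabled by default on text fields. If the field does not take part in scoring, set it to false to save disk space.
null_value: the value to index in place of an explicit null; by default a null is not indexed and cannot be searched. An empty array contains no explicit null, so it is not replaced. Usage: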

PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "status_code": {
          "type":       "keyword",
          "null_value": "NULL" 
        }
      }
    }
  }
}

PUT my_index/_doc/1
{
  "status_code": null
}

PUT my_index/_doc/2
{
  "status_code": [] 
}

GET my_index/_search
{
  "query": {
    "term": {
      "status_code": "NULL" 
    }
  }
}
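
The substitution can be modelled with a short Python sketch. indexed_values is a hypothetical helper that only mimics the rule: an explicit null is replaced by null_value, while an empty array contributes nothing.

```python
def indexed_values(value, null_value):
    """Return the values that would be indexed for a field with null_value set."""
    values = value if isinstance(value, list) else [value]
    return [null_value if v is None else v for v in values]


# Document 1: an explicit null is replaced, so the term query for "NULL" matches it.
print(indexed_values(None, "NULL"))
# Document 2: an empty array holds no explicit null, so nothing is indexed.
print(indexed_values([], "NULL"))
```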

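position_increment_gap: the fake gap inserted between the values of a multi-valued field; the default is 100. In the example below, Abraham and Lincoln sit in two different elements of the array, so their positions are separated by the fake gap of 100 and the phrase query finds nothing.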

PUT my_index/_doc/1
{
    "names": [ "John Abraham", "Lincoln Smith"]
}

GET my_index/_search
{
    "query": {
        "match_phrase": {
            "names": {
                "query": "Abraham Lincoln" 
            }
        }
    }
}

The fake gap for a field can be changed in the mapping:

PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "names": {
          "type": "text",
          "position_increment_gap": 0 
        }
      }
    }
  }
}

With the gap set to 0 the phrase query matches across values, but only when the terms are adjacent, as Abraham and Lincoln are here.
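
To see why the gap decides whether the phrase matches, here is a simplified position model in Python. It is only a sketch: positions and phrase_matches are hypothetical helpers, tokens are split on whitespace and lowercased, and the exact position arithmetic inside Lucene may differ by one.

```python
def positions(values, gap):
    """Assign a position to every token across all values of a field."""
    out = []
    pos = -1
    for i, value in enumerate(values):
        if i > 0:
            pos += gap  # fake gap between consecutive array elements
        for token in value.lower().split():
            pos += 1
            out.append((token, pos))
    return out


def phrase_matches(values, phrase, gap):
    """Return True if the phrase's words occur at consecutive positions."""
    words = phrase.lower().split()
    if not words:
        return False
    index = {}
    for token, pos in positions(values, gap):
        index.setdefault(token, set()).add(pos)
    return any(
        all(start + k in index.get(word, set()) for k, word in enumerate(words))
        for start in index.get(words[0], set())
    )


names = ["John Abraham", "Lincoln Smith"]
print(phrase_matches(names, "Abraham Lincoln", gap=100))
print(phrase_matches(names, "Abraham Lincoln", gap=0))
```

With the default gap the two words are far apart and the phrase is not found; with a gap of 0 they become neighbours.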

properties: defines the fields of a type or the sub-fields of a field. Sub-fields can be declared as follows for object and nested fields:

PUT my_index
{
  "mappings": {
    "_doc": { 
      "properties": {
        "manager": { 
          "properties": {
            "age":  { "type": "integer" },
            "name": { "type": "text"  }
          }
        },
        "employees": { 
          "type": "nested",
          "properties": {
            "age":  { "type": "integer" },
            "name": { "type": "text"  }
          }
        }
      }
    }
  }
}

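search_analyzer: the analyzer used at search time. It works like analyzer but applies only to queries; when both are set, searches use this one. (Note: Elasticsearch runs text through the configured analyzer both when indexing documents and when searching.)
similarity: the relevance scoring algorithm for the field. The default is BM25; classic (TF/IDF) and boolean (no full relevance scoring, the score only reflects whether the query terms match) are also available.
store: whether the field value is stored separately from _source. The default is false: the value is still indexed and searchable, and the original is available through _source. Setting it to true is mainly useful when a single field must be retrieved without loading the whole _source.
term_vector: whether term vectors are stored; the default is no. The possible values are: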

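no: the default; term vectors are not stored
yes: only the terms of the field are stored
with_positions: terms and their positions are stored
with_offsets: terms and their start and end character offsets are stored
with_positions_offsets: terms, positions and character offsets are stored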

Meta fields

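_all: concatenates the values of all other fields into one big string, separated by spaces, which is then analyzed and indexed but not stored. When it is enabled you can search across the values of every field through _all, but you cannot retrieve the value of _all itself. Every value in it is treated as a string: the date 2018-09-10, for instance, is split into the three terms 2018, 09 and 10. Note that _all is deprecated from 6.0 onward; to get a similar effect, set the copy_to parameter on the fields you need. Example: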

GET my_index/_search
{
  "query": {
    "match": {
      "_all": "Tom Terry"
    }
  }
}

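_field_names: indexes the name of every field in a document that contains any value other than null, so documents can be queried by field name as shown below. It can be disabled in the mapping.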

GET my_index/_search
{
  "query": {
    "terms": {
      "_field_names": [ "name" ]
    }
  }
}

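_ignored: a meta field added in 6.4. It lets you find documents in which a malformed value was ignored at index time. (Note: with ignore_malformed set on a field, malformed data is accepted instead of rejected, for example a non-numeric string sent to a numeric field.)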

GET _search
{
  "query": {
    "exists": {
      "field": "_ignored"
    }
  }
}

_id: allows querying by document id. The query below finds the documents with ids 1 and 2.

GET my_index/_search
{
  "query": {
    "terms": {
      "_id": [ "1", "2" ] 
    }
  }
}

_index: allows sorting, aggregating and querying at the index level. The example below restricts the query to specific indices and aggregates by index.

GET index_1,index_2/_search
{
  "query": {
    "terms": {
      "_index": ["index_1", "index_2"] 
    }
  },
  "aggs": {
    "indices": {
      "terms": {
        "field": "_index", 
        "size": 10
      }
    }
  }
}

_meta: stores application-specific metadata with the mapping, such as the class information below.

PUT my_index
{
  "mappings": {
    "user": {
      "_meta": { 
        "class": "MyApp::User",
        "version": {
          "min": "1.0",
          "max": "1.3"
        }
      }
    }
  }
}

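_routing: the value used to route a document to a shard. The shard is computed as shown below; by default _routing is the _id, so the shard is derived from the document id.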

shard_num = hash(_routing) % num_primary_shards
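
The formula can be tried out with a small Python sketch. Note the assumption: Elasticsearch hashes the routing value with Murmur3, while the hypothetical shard_for helper below uses zlib.crc32 as a stand-in, so the shard numbers it prints will not match a real cluster. Only the shape of the computation is the same.

```python
import zlib


def shard_for(routing: str, num_primary_shards: int) -> int:
    """Map a routing value to a shard number: hash(routing) % num_primary_shards."""
    # crc32 is a stand-in hash; Elasticsearch itself uses Murmur3.
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


# The same routing value always lands on the same shard.
for key in ["user1", "user2", "user1"]:
    print(key, shard_for(key, 5))
```

Because the result depends on num_primary_shards, changing the number of primary shards would send existing routing values to different shards.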

The following indexes a document with user1 as the routing key:

PUT my_index/_doc/1?routing=user1&refresh=true 
{
  "title": "This is a document"
}

GET my_index/_doc/1?routing=user1

The following request searches only the shards associated with the routing keys user1 and user2:

GET my_index/_search?routing=user1,user2 
{
  "query": {
    "match": {
      "title": "document"
    }
  }
}

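If a document is indexed with a routing key but later requested without one, the shard is computed from the id, the document may not be found there, and the lookup can end up scanning every shard. To prevent this, the mapping can require a routing key on every operation.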

PUT my_index2
{
  "mappings": {
    "_doc": {
      "_routing": {
        "required": true 
      }
    }
  }
}

_source: the original document. If enabled is set to false, the source is not stored. Configuration:

PUT tweets
{
  "mappings": {
    "_doc": {
      "_source": {
        "enabled": false
      }
    }
  }
}

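More commonly, _source is configured so that certain fields are left out of the stored source, as below: includes lists the fields to store and excludes lists the fields not to store.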

PUT logs
{
  "mappings": {
    "_doc": {
      "_source": {
        "includes": [
          "*.count",
          "meta.*"
        ],
        "excludes": [
          "meta.description",
          "meta.other.*"
        ]
      }
    }
  }
}


PUT logs/_doc/1
{
  "requests": {
    "count": 10,
    "foo": "bar" 
  },
  "meta": {
    "name": "Some metric",
    "description": "Some metric description", 
    "other": {
      "foo": "one", 
      "baz": "two" 
    }
  }
}
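
To make the effect of includes and excludes on this document concrete, here is a rough Python model. It is a sketch under assumptions: flatten and filter_source are hypothetical helpers, and fnmatch wildcards stand in for the pattern matching Elasticsearch applies to field paths.

```python
from fnmatch import fnmatchcase


def flatten(obj, prefix=""):
    """Flatten nested dicts into dotted paths, e.g. {"meta": {"name": 1}} -> {"meta.name": 1}."""
    flat = {}
    for key, value in obj.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, path + "."))
        else:
            flat[path] = value
    return flat


def filter_source(doc, includes, excludes):
    """Keep paths that match an include pattern and no exclude pattern."""
    kept = {}
    for path, value in flatten(doc).items():
        included = any(fnmatchcase(path, pattern) for pattern in includes)
        excluded = any(fnmatchcase(path, pattern) for pattern in excludes)
        if included and not excluded:
            kept[path] = value
    return kept


doc = {
    "requests": {"count": 10, "foo": "bar"},
    "meta": {
        "name": "Some metric",
        "description": "Some metric description",
        "other": {"foo": "one", "baz": "two"},
    },
}
stored = filter_source(
    doc,
    includes=["*.count", "meta.*"],
    excludes=["meta.description", "meta.other.*"],
)
print(stored)
```

In this model only requests.count and meta.name survive; the removed fields remain searchable in Elasticsearch because filtering affects the stored source, not the index.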

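_type: similar to _id; it can be used to query, sort and aggregate by type. Deprecated since 6.0.
_uid: a unique id that identifies a document across all types within one index. It is used the same way as _type and _id. Deprecated since 6.0.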

GET my_index/_search
{
  "query": {
    "terms": {
      "_uid": [ "_doc#1", "_doc#2" ]
    }
  }
}
