
Elasticsearch index management: zero-downtime reindexing with scroll, bulk, and index aliases

Contents

  • 1. Approach
  • 2. Experiment
  • 3. Summary

1. Approach

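A field's mapping cannot be changed once it is set. To change a field, create a new index, index_new, with the new mapping, query the existing data out in batches, and write it into index_new with the bulk API. The scroll API is recommended for the batch queries, and the reindex can run on several threads concurrently: each scroll covers the data for one date range and is handed to one thread. Once all the data has been imported into index_new, the client switches over to index_new.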
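
The per-thread date ranges described above can be sketched as follows. This is only an illustration under assumptions: the field name created_at, the slice width, and the helper names date_slices and slice_query are hypothetical and do not appear in the experiment below, whose documents only have a title field.

```python
from datetime import date, timedelta


def date_slices(start, end, days_per_slice):
    """Split [start, end) into consecutive half-open date ranges."""
    slices = []
    cursor = start
    step = timedelta(days=days_per_slice)
    while cursor < end:
        upper = min(cursor + step, end)
        slices.append((cursor, upper))
        cursor = upper
    return slices


def slice_query(field, lower, upper):
    """Build a scroll search body for one slice: lower <= field < upper."""
    return {
        "query": {
            "range": {field: {"gte": lower.isoformat(), "lt": upper.isoformat()}}
        },
        "sort": ["_doc"],
    }


# One search body per slice; each body could be handed to its own worker thread.
# "created_at" is an assumed date field, not one from the article's data.
slices = date_slices(date(2019, 1, 1), date(2019, 1, 8), 3)
bodies = [slice_query("created_at", lower, upper) for lower, upper in slices]
```

Half-open ranges (gte on the lower bound, lt on the upper bound) keep a document that sits exactly on a boundary from being copied by two threads.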

2. Experiment

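(1) At first the index relies on dynamic mapping. Some of the inserted values happen to be dates such as 2019-01-01, so the title field is automatically mapped as [date], when it should really be a string type (text).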

Index the data

PUT /my_index/my_type/1

{

  "title":"2019-01-01"

}

PUT /my_index/my_type/2

{

  "title":"2019-01-02"

}

PUT /my_index/my_type/3

{

  "title":"2019-01-03"

}

Check the mapping

GET /my_index/_mapping/my_type

{

  "my_index": {

    "mappings": {

      "my_type": {

        "properties": {

          "title": {

            "type": "date"

          }

        }

      }

    }

  }

}
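
Why title ended up as a date can be illustrated with a rough sketch. It is a simplification, not Elasticsearch's actual logic: real date detection checks string values against the configured dynamic date formats, while the hypothetical helpers looks_like_date and guess_type below only try the yyyy-MM-dd pattern.

```python
from datetime import datetime


def looks_like_date(value):
    """Return True if the string parses as a year-month-day date."""
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return True
    except ValueError:
        return False


def guess_type(value):
    """Simplified stand-in for dynamic mapping of a string value."""
    return "date" if looks_like_date(value) else "text"


first_value_type = guess_type("2019-01-01")
later_value_type = guess_type("hello elasticsearch")
```

The type is decided by the first value seen for the field, so a later value that would have been guessed differently no longer fits the existing mapping.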

(2) Later, indexing a title value that is a plain string fails

Insert the data

PUT /my_index/my_type/4

{

  "title":"hello elasticsearch"

}

Response

{

  "error": {

    "root_cause": [

      {

        "type": "mapper_parsing_exception",

        "reason": "failed to parse [title]"

      }

    ],

    "type": "mapper_parsing_exception",

    "reason": "failed to parse [title]",

    "caused_by": {

      "type": "illegal_argument_exception",

      "reason": "Invalid format: \"hello elasticsearch\""

    }

  },

  "status": 400

}

(3) Changing the type of title at this point is not possible

PUT /my_index/_mapping/my_type

{

  "properties": {

    "title":{

      "type": "text"

    }

  }

}

Response

{

  "error": {

    "root_cause": [

      {

        "type": "illegal_argument_exception",

        "reason": "mapper [title] of different type, current_type [date], merged_type [text]"

      }

    ],

    "type": "illegal_argument_exception",

    "reason": "mapper [title] of different type, current_type [date], merged_type [text]"

  },

  "status": 400

}

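(4) The only option now is to reindex: create a new index, query the data out of the old index, and import it into the new one

(5) Suppose the old index is named old_index and the new one new_index, and the Java application is already working against old_index. Do we have to stop the Java application, change the index it uses to new_index, and restart it? That would take the Java application down and reduce availability

(6) So give the Java application an alias that points to the old index. The application works through the goods_index alias, which at this point actually points to the old my_index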

PUT /my_index/_alias/goods_index

Response

{

  "acknowledged": true

}

(7) Create a new index in which title has a string type (text)

PUT /my_index_new

{

  "mappings": {

    "my_type":{

      "properties": {

        "title":{

          "type": "text"

        }

      }

    }

  }

}

(8) Use the scroll API to query the data out in batches

For this example, fetching one document per batch is enough:

GET /my_index/_search?scroll=1m

{

  "query": {

    "match_all": {}

  },

  "sort": ["_doc"],

  "size": 1

}

Response

{

  "_scroll_id": "DnF1ZXJ5VGhlbkZldGNoBQAAAAAAAAO5Fm1vRkdXOXJzU2VxaVRaaXBsLXZvZlEAAAAAAAADuhZtb0ZHVzlyc1NlcWlUWmlwbC12b2ZRAAAAAAAAA7cWbW9GR1c5cnNTZXFpVFppcGwtdm9mUQAAAAAAAAO7Fm1vRkdXOXJzU2VxaVRaaXBsLXZvZlEAAAAAAAADuBZtb0ZHVzlyc1NlcWlUWmlwbC12b2ZR",

  "took": 3,

  "timed_out": false,

  "_shards": {

    "total": 5,

    "successful": 5,

    "failed": 0

  },

  "hits": {

    "total": 3,

    "max_score": null,

    "hits": [

      {

        "_index": "my_index",

        "_type": "my_type",

        "_id": "2",

        "_score": null,

        "_source": {

          "title": "2019-01-02"

        },

        "sort": [

        ]

      }

    ]

  }

}
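
The request above returns only the first batch. The remaining batches are fetched by sending the returned _scroll_id back until a page with no hits arrives. A minimal sketch of that loop follows; the helper name scroll_hits and the in-memory pages are assumptions for illustration, and in a real client fetch_next would send POST /_search/scroll with a body such as {"scroll": "1m", "scroll_id": "..."}.

```python
def scroll_hits(first_page, fetch_next):
    """Yield every hit, following _scroll_id until an empty page comes back."""
    page = first_page
    while page["hits"]["hits"]:
        for hit in page["hits"]["hits"]:
            yield hit
        # Always use the most recently returned scroll id for the next call.
        page = fetch_next(page["_scroll_id"])


# In-memory stand-in for the cluster, so the loop can run without a server.
first = {
    "_scroll_id": "s1",
    "hits": {"hits": [{"_id": "2", "_source": {"title": "2019-01-02"}}]},
}
pages = {
    "s1": {
        "_scroll_id": "s2",
        "hits": {"hits": [{"_id": "1", "_source": {"title": "2019-01-01"}}]},
    },
    "s2": {"_scroll_id": "s3", "hits": {"hits": []}},
}

ids = [hit["_id"] for hit in scroll_hits(first, lambda scroll_id: pages[scroll_id])]
```

Passing the fetch function in as a parameter keeps the loop independent of any particular HTTP client.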

(9) Use the bulk API to write the batch returned by scroll into the new index

POST /_bulk

{ "index":  { "_index": "my_index_new", "_type": "my_type", "_id": "2" }}

{ "title":    "2017-01-02" }

Response

{

  "took": 2161,

  "errors": false,

  "items": [

    {

      "index": {

        "_index": "my_index_new",

        "_type": "my_type",

        "_id": "2",

        "_version": 1,

        "result": "created",

        "_shards": {

          "total": 2,

          "successful": 1,

          "failed": 0

        },

        "created": true,

        "status": 201

      }

    }

  ]

}
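
Writing the bulk body by hand does not scale past a few documents. One way to derive it from the scroll hits is sketched below; the helper name bulk_body is an assumption, and the sketch copies _type, _id, and _source unchanged, as the manual request above does.

```python
import json


def bulk_body(hits, target_index):
    """Turn scroll hits into a newline-delimited bulk request body."""
    lines = []
    for hit in hits:
        action = {
            "index": {
                "_index": target_index,
                "_type": hit["_type"],
                "_id": hit["_id"],
            }
        }
        lines.append(json.dumps(action))          # action line
        lines.append(json.dumps(hit["_source"]))  # document line
    if not lines:
        return ""
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


hits = [
    {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "2",
        "_source": {"title": "2019-01-02"},
    }
]
body = bulk_body(hits, "my_index_new")
```

Reusing the original _id means a batch that is sent twice overwrites the same documents instead of duplicating them.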

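(10) Repeat steps 8 and 9, fetching batch after batch and writing each one into the new index with the bulk API

(11) Switch the goods_index alias over to my_index_new. The Java application then reads the data in the new index through the alias, with no need to stop it: zero downtime, high availability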
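
When looping over many batches, it is worth checking each bulk response before moving on, because a bulk request can succeed as a whole while individual items fail. A small sketch, assuming the response shape shown in step 9; the helper name failed_items and the sample failure below are made up for illustration.

```python
def failed_items(bulk_response):
    """Return (action, id, error) for each item that did not succeed."""
    if not bulk_response.get("errors"):
        return []
    failures = []
    for item in bulk_response["items"]:
        # Each item has a single key naming the action, e.g. "index".
        for action, result in item.items():
            if result.get("status", 200) >= 300:
                failures.append((action, result.get("_id"), result.get("error")))
    return failures


ok_response = {
    "errors": False,
    "items": [{"index": {"_id": "2", "status": 201}}],
}
bad_response = {
    "errors": True,
    "items": [
        {"index": {"_id": "2", "status": 201}},
        {"index": {"_id": "4", "status": 400,
                   "error": {"type": "mapper_parsing_exception"}}},
    ],
}

no_failures = failed_items(ok_response)
failures = failed_items(bad_response)
```

A batch that reports failures can be retried or logged before the alias is switched.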

POST /_aliases

{

    "actions": [

        { "remove": { "index": "my_index", "alias": "goods_index" }},

        { "add":    { "index": "my_index_new", "alias": "goods_index" }}

    ]

}

Response

{

  "acknowledged": true

}
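
Both actions travel in a single _aliases request, so the alias moves from the old index to the new one in one atomic step and never points at neither index. A sketch of building that body programmatically; the helper name alias_swap is an assumption.

```python
import json


def alias_swap(alias, old_index, new_index):
    """Build the _aliases body that moves an alias between two indices."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


swap = alias_swap("goods_index", "my_index", "my_index_new")
payload = json.dumps(swap)  # body to send with POST /_aliases
```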

(12) Query through the goods_index alias to check that everything works

GET /goods_index/my_type/_search

{

  "query": {

    "match_all": {}

  }

}

Response:

{

  "took": 1,

  "timed_out": false,

  "_shards": {

    "total": 5,

    "successful": 5,

    "failed": 0

  },

  "hits": {

    "total": 1,

    "max_score": 1,

    "hits": [

      {

        "_index": "my_index_new",

        "_type": "my_type",

        "_id": "2",

        "_score": 1,

        "_source": {

          "title": "2017-01-02"

        }

      }

    ]

  }

}

Check the mapping of the new index

GET /goods_index/_mapping/my_type

{

  "my_index_new": {

    "mappings": {

      "my_type": {

        "properties": {

          "title": {

            "type": "text"

          }

        }

      }

    }

  }

}

Insert a string value

PUT /goods_index/my_type/6

{

  "title":"hello elasticsearch"

}

Response

{

  "_index": "my_index_new",

  "_type": "my_type",

  "_id": "6",

  "_version": 1,

  "result": "created",

  "_shards": {

    "total": 2,

    "successful": 1,

    "failed": 0

  },

  "created": true

}

Query all documents

GET /goods_index/my_type/_search

{

  "query": {

    "match_all": {}

  }

}

Response

{

  "took": 2,

  "timed_out": false,

  "_shards": {

    "total": 5,

    "successful": 5,

    "failed": 0

  },

  "hits": {

    "total": 2,

    "max_score": 1,

    "hits": [

      {

        "_index": "my_index_new",

        "_type": "my_type",

        "_id": "2",

        "_score": 1,

        "_source": {

          "title": "2017-01-02"

        }

      },

      {

        "_index": "my_index_new",

        "_type": "my_type",

        "_id": "6",

        "_score": 1,

        "_source": {

          "title": "hello elasticsearch"

        }

      }

    ]

  }

}

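3. Summary

String values can now be added to the new index, and the original data has been imported (this example imported only one of the original documents).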
