ElasticSearch

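Introduction
Installation
Package Layout
Configuration Files
System Configuration
The head GUI Plugin
REST APIs
- Index (Creating a Table)
- Mapping (Creating Table Fields)
- Document (Records)
IK Analyzer
- Checking the Analyzer
- Download and Installation
- Testing
- Custom Dictionary
Example
- String Type text
- String Type keyword
Mapping Adjustment
- Adding a Field and Populating It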

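Introduction

Elasticsearch is a search server built on Lucene. It provides a distributed, multi-tenant full-text search engine behind a RESTful web interface. Elasticsearch is written in Java, is released as open source under the Apache License, and is a popular enterprise search engine. It is designed for cloud environments and offers near-real-time search. It is stable, reliable and fast, and it is easy to install and use.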

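Installation

1. Recent versions require JDK 1.8 or later.

2. Several installation packages are available, including tar, zip and rpm.

For development on Windows, the ZIP package is recommended.

3. Installation via Docker is also supported.

For details, see: https://www.elastic.co/guide/en/elasticsearch/reference/current/install-elasticsearch.html

Download ES: Elasticsearch 6.2.1

https://www.elastic.co/downloads/past-releases

Unzip elasticsearch-6.2.1.zip.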

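Package Layout

bin: script directory, containing executable scripts for starting, stopping and so on

config: configuration file directory

data: index directory, where the index files are stored

logs: log directory

modules: module directory, containing the functional modules of ES

plugins: plugin directory; ES supports a plugin mechanism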

Configuration Files

(1) elasticsearch.yml: configures Elasticsearch runtime parameters

cluster.name: plxc
node.name: node_1
network.host: 0.0.0.0
http.port: 9200  # HTTP port exposed to clients
transport.tcp.port: 9300  # port for communication between cluster nodes
node.master: true  # whether this node is eligible to be elected master
node.data: true  # whether this node stores index data
#discovery.zen.ping.unicast.hosts: ["0.0.0.0:9300", "0.0.0.0:9301", "0.0.0.0:9302"]  # initial list of master nodes in the cluster
discovery.zen.ping.timeout: 3s  # connection timeout for ES node discovery
discovery.zen.minimum_master_nodes: 1  # minimum number of master-eligible nodes
bootstrap.memory_lock: false
node.max_local_storage_nodes: 1
path.data: D:\ElasticSearch\elasticsearch-6.2.1\data
path.logs: D:\ElasticSearch\elasticsearch-6.2.1\logs
http.cors.enabled: true  # enable CORS (cross-origin) access
http.cors.allow-origin: /.*/

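(2) jvm.options: configures the Elasticsearch JVM settings

Set the minimum and maximum JVM heap size with -Xms and -Xmx in jvm.options:
1) Set the two values equal.
2) Set Xmx to no more than half of the physical memory.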
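The two heap rules above can be sketched as simple arithmetic. The helper below takes the machine's physical memory in MB as an input; it does not detect it. It applies only the "equal values, at most half of RAM" rules stated here. The class and method names are hypothetical.

```java
public class HeapSizing {
    // Returns jvm.options lines for a heap of half the physical memory (in MB),
    // with -Xms and -Xmx set to the same value.
    static String[] heapOptions(long physicalMemoryMb) {
        long heapMb = physicalMemoryMb / 2; // rule 2: no more than half of physical memory
        return new String[] { "-Xms" + heapMb + "m", "-Xmx" + heapMb + "m" }; // rule 1: equal values
    }

    public static void main(String[] args) {
        // Hypothetical machine with 8 GB of RAM
        for (String line : heapOptions(8192)) {
            System.out.println(line); // -Xms4096m, then -Xmx4096m
        }
    }
}
```

Setting both values equal means the JVM does not have to resize the heap while ES is running.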

(3) log4j2.properties: configures Elasticsearch logging

This file controls the log files. ES uses Log4j 2, so pay attention to how the log level is configured.

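System Configuration

On Linux, depending on the available system resources, you can raise the maximum number of files each process may open.

sudo su
ulimit -n 65536
su elasticsearch  # switch to the elasticsearch user
vim /etc/security/limits.conf  # add the following line
elasticsearch - nofile 65536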

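The head GUI Plugin

head is a visual management plugin for ES. It monitors the state of ES and lets you interact with the ES service through the head client, for example to create mappings and indexes. The project is hosted at https://github.com/mobz/elasticsearch-head

Starting with ES 6.0, the head plugin runs on Node.js.

(1) Install Node.js.

(2) Download and run it. The default port is 9100.

git clone https://github.com/mobz/elasticsearch-head.git
cd elasticsearch-head
npm install
npm run start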

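REST APIs

Index (Creating a Table)

An index stores documents of the same kind. It is roughly equivalent to a table in MySQL or a collection in MongoDB.

PUT http://localhost:9200/plxc
{
    "settings": {
        "index": {
            "number_of_shards": 1,
            "number_of_replicas": 0
        }
    }
}

number_of_shards: the number of shards. In a cluster, you usually configure several shards.

number_of_replicas: the number of replicas. Replicas make ES more reliable.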
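The index-creation call above can also be issued from Java. This sketch uses the JDK 11 java.net.http client instead of an ES client library, and it assumes a node is listening at localhost:9200. The class and method names are hypothetical. The Content-Type header is set because ES 6.x rejects request bodies that lack one.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class CreateIndex {
    // JSON body equivalent to the settings shown above
    static String settingsBody(int shards, int replicas) {
        return "{\"settings\":{\"index\":{"
                + "\"number_of_shards\":" + shards + ","
                + "\"number_of_replicas\":" + replicas + "}}}";
    }

    // PUT {baseUrl}/{index} with the settings body
    static HttpRequest createIndexRequest(String baseUrl, String index, int shards, int replicas) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/" + index))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(settingsBody(shards, replicas)))
                .build();
    }

    public static void main(String[] args) throws Exception {
        HttpRequest request = createIndexRequest("http://localhost:9200", "plxc", 1, 0);
        // Sending requires a running ES node at localhost:9200
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```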

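Mapping (Creating Table Fields)

Note: before 6.0, an index could contain several types, and a type corresponded to a table in a relational database. From 6.0 on, an index can have only one type (here, doc). ES officially plans to remove types completely in ES 9.0.

POST http://localhost:9200/plxc/doc/_mapping
{
    "properties": {
        "name": {
            "type": "text"
        },
        "description": {
            "type": "text"
        },
        "studymodel": {
            "type": "keyword"
        }
    }
}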
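When there are many fields, the mapping body above can be generated rather than typed by hand. This is a minimal sketch with a hypothetical class name. It assumes that field names and type names never need JSON escaping, and it only builds the body; you would POST the result to http://localhost:9200/plxc/doc/_mapping.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class MappingBody {
    // Builds {"properties":{...}} from a map of field name -> ES field type.
    // Assumes names and types contain no characters that need JSON escaping.
    static String mappingBody(Map<String, String> fieldTypes) {
        StringJoiner fields = new StringJoiner(",", "{\"properties\":{", "}}");
        for (Map.Entry<String, String> e : fieldTypes.entrySet()) {
            fields.add("\"" + e.getKey() + "\":{\"type\":\"" + e.getValue() + "\"}");
        }
        return fields.toString();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>(); // keeps the insertion order
        fields.put("name", "text");
        fields.put("description", "text");
        fields.put("studymodel", "keyword");
        System.out.println(mappingBody(fields));
    }
}
```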

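Document (Records)

(1) Create a record

Send a PUT or POST to a URL that ends with an id to create a record with that id. If you omit the id and POST to http://localhost:9200/plxc/doc, ES generates an ID automatically.

PUT or POST http://localhost:9200/plxc/doc/1023e58161bcf7f40161bcf8b77c3123
{
    "name":"Bootstrap development framework",
    "description":"Bootstrap is a front-end page development framework released by Twitter and is widely used in the industry.",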
    "studymodel":"101011"
}
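The id rule above can be captured in a small request builder. With an id it uses PUT on /{index}/{type}/{id}; without one it uses POST on /{index}/{type}, so that ES assigns the id. This is a sketch on the JDK 11 HTTP client. The class and method names are hypothetical, and the request is only built here, not sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class IndexDocument {
    // id != null -> PUT /{index}/{type}/{id}; id == null -> POST /{index}/{type} (ES generates the id)
    static HttpRequest indexRequest(String baseUrl, String index, String type, String id, String json) {
        HttpRequest.BodyPublisher body = HttpRequest.BodyPublishers.ofString(json);
        HttpRequest.Builder builder = HttpRequest.newBuilder()
                .header("Content-Type", "application/json");
        String typeUrl = baseUrl + "/" + index + "/" + type;
        if (id == null) {
            return builder.uri(URI.create(typeUrl)).POST(body).build();
        }
        return builder.uri(URI.create(typeUrl + "/" + id)).PUT(body).build();
    }

    public static void main(String[] args) {
        String doc = "{\"name\":\"Bootstrap development framework\",\"studymodel\":\"101011\"}";
        HttpRequest withId = indexRequest("http://localhost:9200", "plxc", "doc",
                "1023e58161bcf7f40161bcf8b77c3123", doc);
        HttpRequest withoutId = indexRequest("http://localhost:9200", "plxc", "doc", null, doc);
        System.out.println(withId.method() + " " + withId.uri());
        System.out.println(withoutId.method() + " " + withoutId.uri());
    }
}
```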

(2) Get a record by primary key

GET http://localhost:9200/plxc/doc/1023e58161bcf7f40161bcf8b77c3123

(3) Query all records

GET http://localhost:9200/plxc/doc/_search

(4) Query records whose name contains the keyword bootstrap

GET http://localhost:9200/plxc/doc/_search?q=name:bootstrap
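When building the URLs above in code, the q parameter should be URL-encoded. This matters especially for Chinese search terms. This minimal sketch (hypothetical class name, Java 10+ for URLEncoder.encode with a Charset) only builds the URIs; sending them works the same way as in any HTTP GET.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class SearchUris {
    // GET /{index}/{type}/{id}: fetch a single document by primary key
    static URI getById(String baseUrl, String index, String type, String id) {
        return URI.create(baseUrl + "/" + index + "/" + type + "/" + id);
    }

    // GET /{index}/{type}/_search?q=field:value (URI search with query-string syntax)
    static URI searchByField(String baseUrl, String index, String type, String field, String value) {
        String q = URLEncoder.encode(field + ":" + value, StandardCharsets.UTF_8);
        return URI.create(baseUrl + "/" + index + "/" + type + "/_search?q=" + q);
    }

    public static void main(String[] args) {
        System.out.println(getById("http://localhost:9200", "plxc", "doc", "1023e58161bcf7f40161bcf8b77c3123"));
        System.out.println(searchByField("http://localhost:9200", "plxc", "doc", "name", "bootstrap"));
    }
}
```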

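IK Analyzer

The IK analyzer has two segmentation modes: ik_max_word and ik_smart.

1. ik_max_word

Splits the text at the finest granularity. For example, "中华人民共和国人民大会堂" (Great Hall of the People, People's Republic of China) is split into 中华人民共和国, 中华人民, 中华, 华人, 人民共和国, 人民, 共和国, 大会堂, 大会, 会堂 and similar words.

2. ik_smart

Splits the text at the coarsest granularity. For example, "中华人民共和国人民大会堂" is split into 中华人民共和国 and 人民大会堂.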
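To make the difference between the two modes concrete, here is a toy Java illustration over a small hand-made dictionary. It is not IK's actual algorithm. The "fine" mode simply lists every dictionary word found at every position. The "coarse" mode uses greedy forward maximum matching, a simpler technique swapped in for illustration. The class name and dictionary are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class ToySegmenter {
    // Hypothetical mini dictionary, standing in for IK's main.dic
    static final Set<String> DICT = Set.of("中华人民共和国", "中华人民", "中华", "华人", "人民共和国",
            "人民", "共和国", "人民大会堂", "大会堂", "大会", "会堂");
    static final int MAX_LEN = 7; // length of the longest dictionary word

    // ik_max_word-like: every dictionary word starting at every position (overlaps allowed)
    static List<String> fine(String text) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            for (int j = i + 1; j <= Math.min(text.length(), i + MAX_LEN); j++) {
                String w = text.substring(i, j);
                if (DICT.contains(w)) out.add(w);
            }
        }
        return out;
    }

    // ik_smart-like: take the longest dictionary word at the current position, then move past it
    static List<String> coarse(String text) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int end = i + 1; // fall back to a single character if nothing matches
            for (int j = Math.min(text.length(), i + MAX_LEN); j > i; j--) {
                if (DICT.contains(text.substring(i, j))) { end = j; break; }
            }
            out.add(text.substring(i, end));
            i = end;
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "中华人民共和国人民大会堂";
        System.out.println(fine(text));
        System.out.println(coarse(text)); // [中华人民共和国, 人民大会堂]
    }
}
```

The fine-grained output gives a search more terms to match, and the coarse output gives fewer, more specific terms. This trade-off is why the example mapping later indexes with ik_max_word and searches with ik_smart.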

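Checking the Analyzer

POST http://localhost:9200/_analyze
{
	"text":"测试分词器"
}

The result splits the text (roughly "test the analyzer") into single characters: 测, 试, 分, 词, 器. This shows that no Chinese analyzer is in use yet.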

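Download and Installation

GitHub: https://github.com/medcl/elasticsearch-analysis-ik

Download the release that matches your ES version (6.2.1 here) and unzip it. Copy the extracted files into an ik directory under the plugins directory of the ES installation.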

Testing

POST http://localhost:9200/_analyze
{
	"text":"测试分词器",
	"analyzer":"ik_max_word"
}
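Both _analyze calls above differ only in whether an analyzer is named. If no analyzer is named, ES uses the standard analyzer. This sketch (hypothetical class name) builds either body and does the minimal JSON string escaping so that arbitrary test text is safe to send. The body would be POSTed to http://localhost:9200/_analyze with Content-Type: application/json.

```java
public class AnalyzeBody {
    // Quotes a Java string as a JSON string, escaping quote, backslash and control characters.
    static String jsonString(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') sb.append('\\').append(c);
            else if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
            else sb.append(c);
        }
        return sb.append('"').toString();
    }

    // analyzer == null -> default analyzer; otherwise e.g. "ik_max_word" or "ik_smart"
    static String analyzeBody(String text, String analyzer) {
        String body = "{\"text\":" + jsonString(text);
        if (analyzer != null) body += ",\"analyzer\":" + jsonString(analyzer);
        return body + "}";
    }

    public static void main(String[] args) {
        System.out.println(analyzeBody("测试分词器", null));
        System.out.println(analyzeBody("测试分词器", "ik_max_word"));
    }
}
```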

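Custom Dictionary

The IK analyzer ships with a dictionary file named main.dic.

In the same directory as main.dic, create a file named my.dic with one word per line. Save it as UTF-8, not UTF-8 with BOM.

Edit the IKAnalyzer.cfg.xml configuration file, set ext_dict to my.dic, and restart ES.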
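The "UTF-8 without BOM" requirement is easy to meet from Java, because Files.write with UTF-8 never writes a BOM. This sketch writes a dictionary file and checks its first bytes. The file path and the sample word are hypothetical; in practice the file belongs in the directory that holds IK's main.dic.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class WriteDic {
    // Writes one word per line as plain UTF-8 (no BOM).
    static void writeDic(Path file, List<String> words) throws IOException {
        Files.write(file, words, StandardCharsets.UTF_8);
    }

    // True if the file starts with the UTF-8 BOM bytes EF BB BF.
    static boolean hasBom(Path file) throws IOException {
        byte[] b = Files.readAllBytes(file);
        return b.length >= 3 && (b[0] & 0xFF) == 0xEF && (b[1] & 0xFF) == 0xBB && (b[2] & 0xFF) == 0xBF;
    }

    public static void main(String[] args) throws IOException {
        Path dic = Path.of("my.dic"); // hypothetical location
        writeDic(dic, List.of("分词器")); // hypothetical custom word
        System.out.println("BOM present: " + hasBom(dic));
    }
}
```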

Example

POST http://localhost:9200/plxc/doc/_mapping
{
    "properties": {
        "description": {
            "type": "text",
            "analyzer": "ik_max_word",
            "search_analyzer": "ik_smart"
        },
        "name": {
            "type": "text",
            "analyzer": "ik_max_word",
            "search_analyzer": "ik_smart"
        },
        "pic": {
            "type": "text",
            "index": false
        },
        "price": {
            "type": "float"
        },
        "studymodel": {
            "type": "keyword"
        },
        "timestamp": {
            "type": "date",
            "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
        }
    }
}
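The timestamp field above accepts any of three formats separated by ||. This sketch (hypothetical class name) produces values for two of them with java.time. The first is the yyyy-MM-dd HH:mm:ss string, which uses the same pattern letters in DateTimeFormatter. The second is epoch_millis. Because ES treats date strings without a time zone as UTC, the sketch converts to epoch milliseconds with ZoneOffset.UTC so that both representations denote the same instant.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class TimestampValues {
    static final DateTimeFormatter ES_DATE_TIME = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // Value for the first alternative: yyyy-MM-dd HH:mm:ss
    static String asDateTime(LocalDateTime t) {
        return t.format(ES_DATE_TIME);
    }

    // Value for the epoch_millis alternative, treating the local date-time as UTC
    static long asEpochMillis(LocalDateTime utcTime) {
        return utcTime.toInstant(ZoneOffset.UTC).toEpochMilli();
    }

    public static void main(String[] args) {
        LocalDateTime t = LocalDateTime.of(2018, 3, 5, 7, 0);
        System.out.println(asDateTime(t));    // 2018-03-05 07:00:00
        System.out.println(asEpochMillis(t));
    }
}
```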

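String Type text

text:

The analyzer property sets the analyzer used at both index time and search time.

The search_analyzer property sets a separate analyzer that is used only at search time.

The index property controls whether the field is indexed. For example, a product image URL is only used to display the image and is never searched, so index can be set to false.

String Type keyword

keyword:

A keyword field is usually searched as a whole value, so it is not analyzed (split into terms) when it is indexed.

Examples include postal codes, mobile phone numbers and ID card numbers. keyword fields are typically used for filtering, sorting and aggregations.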
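Filtering on a keyword field is typically done with a term query, which is not analyzed and therefore matches the whole stored value. This sketch builds such a body inside a bool filter. The class name is hypothetical, and the field and value are assumed to need no JSON escaping. The body would be sent to http://localhost:9200/plxc/doc/_search.

```java
public class KeywordFilter {
    // {"query":{"bool":{"filter":{"term":{field:value}}}}}: exact match, no relevance scoring
    static String termFilterBody(String field, String value) {
        return "{\"query\":{\"bool\":{\"filter\":{\"term\":{\""
                + field + "\":\"" + value + "\"}}}}}";
    }

    public static void main(String[] args) {
        System.out.println(termFilterBody("studymodel", "101011"));
    }
}
```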

Mapping Adjustment

Adding a Field and Populating It

(1) Add the field

PUT http://localhost:9200/plxc/doc/_mapping
{
    "properties": {
        "TimeFormat": {
            "type": "date",
            "format": "yyyy-MM-dd HH:mm:ss"
        }
    }
}

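(2) Example utility method for populating the field

import java.io.IOException;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;

import org.elasticsearch.action.bulk.BulkRequestBuilder;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.update.UpdateRequest;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.sort.SortOrder;

import static org.elasticsearch.common.xcontent.XContentFactory.jsonBuilder;

// client (a TransportClient) and Index (the index name) are static fields of the utility class.
public static void updateHourByScroll(String Type) throws IOException, ExecutionException, InterruptedException {
    System.out.println("Scroll mode started");
    Date begin = new Date();
    // SearchType.SCAN no longer exists in ES 6.x; sorting by _doc is the efficient scroll order.
    // Unlike SCAN, the first response already contains the first batch of hits.
    SearchResponse scrollResponse = client.prepareSearch(Index).setTypes(Type)
            .addSort("_doc", SortOrder.ASC)
            .setSize(5000).setScroll(TimeValue.timeValueMinutes(1))
            .execute().actionGet();
    long count = scrollResponse.getHits().getTotalHits();
    long sum = 0;
    while (scrollResponse.getHits().getHits().length > 0) {
        SearchHit[] hits = scrollResponse.getHits().getHits();
        sum += hits.length;
        List<UpdateRequest> list = new ArrayList<>();
        for (SearchHit hit : hits) {
            Map<String, Object> source = hit.getSourceAsMap();
            if (source.containsKey("TimeFormat")) {
                // Important: if an earlier run failed midway, this skips documents that were
                // already updated, which makes reruns faster.
                continue;
            }
            int year = Integer.parseInt(source.get("Year").toString());
            int month = Integer.parseInt(source.get("Mon").toString());
            int day = Integer.parseInt(source.get("Day").toString());
            int hour = 0; // default when the Hour field is missing
            if (source.containsKey("Hour")) {
                hour = Integer.parseInt(source.get("Hour").toString());
            }
            // Custom method that builds the value of the new TimeFormat field; adapt as needed.
            String time = getyear_month_day_hour(year, month, day, hour);
            list.add(new UpdateRequest()
                    .index(Index)
                    .type(Type)
                    .id(hit.getId())
                    .doc(jsonBuilder().startObject().field("TimeFormat", time).endObject()));
            // client.update(request).get() would submit documents one at a time, which is very slow
            // for large data volumes; collecting them into one bulk request is much faster.
        }
        if (!list.isEmpty()) { // an empty bulk request fails validation
            BulkRequestBuilder bulkRequest = client.prepareBulk();
            for (UpdateRequest uRequest : list) {
                bulkRequest.add(uRequest);
            }
            BulkResponse bulkResponse = bulkRequest.execute().actionGet();
            if (bulkResponse.hasFailures()) {
                System.out.println("Bulk errors: " + bulkResponse.buildFailureMessage());
            }
        }
        System.out.println("Total " + count + ", processed " + sum);
        scrollResponse = client.prepareSearchScroll(scrollResponse.getScrollId())
                .setScroll(TimeValue.timeValueMinutes(8))
                .execute().actionGet();
    }
    client.prepareClearScroll().addScrollId(scrollResponse.getScrollId()).get(); // release the scroll context
    Date end = new Date();
    System.out.println("Elapsed: " + (end.getTime() - begin.getTime()) + " ms");
}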
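The utility above relies on a custom getyear_month_day_hour method that the original leaves to the reader. Below is one hypothetical implementation. It formats the separate year, month, day and hour values in the "yyyy-MM-dd HH:mm:ss" format of the new TimeFormat field. It goes through LocalDateTime so that out-of-range values (for example month 13) throw an exception instead of producing a bad date.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class TimeFormatHelper {
    static final DateTimeFormatter TIME_FORMAT = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // Hypothetical implementation of the custom helper: minutes and seconds are always 00.
    static String getyear_month_day_hour(int year, int month, int day, int hour) {
        return LocalDateTime.of(year, month, day, hour, 0).format(TIME_FORMAT);
    }

    public static void main(String[] args) {
        System.out.println(getyear_month_day_hour(2018, 3, 5, 7)); // 2018-03-05 07:00:00
    }
}
```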
