ElasticSearch

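Introduction
Installation
Package layout
Configuration files
System configuration
The head GUI plugin
REST APIs
- Index (creating a table)
- Mapping (creating table fields)
- Documents (records)
IK analyzer
- Checking the analyzer
- Download and installation
- Testing
- Custom dictionary
Example
- String type: text
- String type: keyword
Adjusting a mapping
- Adding a field and assigning values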

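Introduction

Elasticsearch is a search server built on Lucene. It provides a distributed, multi-tenant full-text search engine exposed through a RESTful web interface. Elasticsearch is written in Java and was released as open source under the Apache license, and it is a popular enterprise search engine. It is designed for cloud environments, offers near-real-time search, and is stable, reliable, fast, and easy to install and use.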

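Installation

1. Recent versions require JDK 1.8 or later.

2. Several package formats are supported, including tar, zip, and rpm.

For development on Windows, the ZIP package is recommended.

3. Installation with Docker is also supported.

For details, see: https://www.elastic.co/guide/en/elasticsearch/reference/current/install-elasticsearch.html

Download ES: Elasticsearch 6.2.1

https://www.elastic.co/downloads/past-releases

Unpack elasticsearch-6.2.1.zip.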

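Package layout

bin: script directory, containing the executable scripts for starting, stopping, and so on

config: configuration file directory

data: index directory, where the index files are stored

logs: log directory

modules: module directory, containing the functional modules of ES

plugins: plugin directory; ES supports a plugin mechanism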

Configuration files

(1) elasticsearch.yml: configures the Elasticsearch runtime parameters

cluster.name: plxc
node.name: node_1
network.host: 0.0.0.0
http.port: 9200  # HTTP port exposed to clients
transport.tcp.port: 9300  # port for communication between cluster nodes
node.master: true  # whether this node is eligible to be elected master
node.data: true  # whether this node stores index data
#discovery.zen.ping.unicast.hosts: ["0.0.0.0:9300", "0.0.0.0:9301", "0.0.0.0:9302"]  # initial list of master nodes in the cluster
discovery.zen.ping.timeout: 3s  # connection timeout for automatic node discovery
discovery.zen.minimum_master_nodes: 1  # minimum number of master nodes
bootstrap.memory_lock: false
node.max_local_storage_nodes: 1
path.data: D:\ElasticSearch\elasticsearch-6.2.1\data
path.logs: D:\ElasticSearch\elasticsearch-6.2.1\logs
http.cors.enabled: true  # enable CORS (cross-origin access) support
http.cors.allow-origin: /.*/

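(2) jvm.options: configures the Elasticsearch JVM settings

Set the minimum and maximum JVM heap size with -Xms and -Xmx in jvm.options:
1) Set the two values equal to each other.
2) Set Xmx to no more than half of the physical memory.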
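
The two rules above can be sketched as a small calculation. The class and method names (HeapSizing, jvmOptionLines) are hypothetical, and the sketch simply uses half of the physical memory as the heap size, which is the upper bound the rule allows; it is a starting point under those assumptions, not a tuned recommendation.

```java
import java.util.List;

// Hypothetical helper: derives the jvm.options heap lines from the two rules above.
final class HeapSizing {

    // Returns the -Xms and -Xmx lines for a machine with the given physical memory (in MB).
    static List<String> jvmOptionLines(long physicalMemoryMb) {
        if (physicalMemoryMb < 2) {
            throw new IllegalArgumentException("physical memory must be at least 2 MB");
        }
        long heapMb = physicalMemoryMb / 2;      // rule 2: at most half of physical memory
        return List.of("-Xms" + heapMb + "m",    // rule 1: minimum equals maximum
                       "-Xmx" + heapMb + "m");
    }

    public static void main(String[] args) {
        // Prints the two lines to place in jvm.options for a 16 GB machine.
        jvmOptionLines(16384).forEach(System.out::println);
    }
}
```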

(3) log4j2.properties: configures Elasticsearch logging

This file holds the log settings. ES logs through log4j2; pay attention to the configured log levels.

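System configuration

On Linux, depending on the available system resources, raise the maximum number of files each process is allowed to open.

sudo su
ulimit -n 65536
su elasticsearch  # switch to the elasticsearch user

To make the limit permanent, add the following line to /etc/security/limits.conf (editing that file requires root):

elasticsearch - nofile 65536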

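The head GUI plugin

head is a visual management plugin for ES. It monitors the state of ES and, through the head client, interacts with the ES service, for example to create mappings or indices. The project is hosted at https://github.com/mobz/elasticsearch-head

From ES 6.0 on, the head plugin can be run with node.js.

(1) Install node.js

(2) Download and run it; the default port is 9100

git clone https://github.com/mobz/elasticsearch-head.git
cd elasticsearch-head
npm install
npm run start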

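REST APIs

Index (creating a table)

An index stores documents of the same kind. It is roughly equivalent to a table in MySQL or a collection in MongoDB.

PUT http://localhost:9200/plxc
{
    "settings": {
        "index": {
            "number_of_shards": 1,
            "number_of_replicas": 0
        }
    }
}

number_of_shards: the number of shards; in a cluster this is usually set to more than one

number_of_replicas: the number of replicas; replicas are configured to improve the availability of ES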
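
As a rough illustration of the request above, the following sketch builds the same PUT with the JDK's built-in java.net.http API (Java 11 or later). The class name IndexCreationSketch and its methods are hypothetical; the host, index name and settings come from the example above. The Content-Type header is set explicitly because ES 6.x rejects request bodies without one.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical sketch: builds (but does not send) the create-index request.
final class IndexCreationSketch {

    // JSON body carrying the two index settings discussed above.
    static String settingsBody(int shards, int replicas) {
        return "{\"settings\":{\"index\":{"
                + "\"number_of_shards\":" + shards + ","
                + "\"number_of_replicas\":" + replicas + "}}}";
    }

    // PUT <baseUrl>/<index> with the settings body.
    static HttpRequest build(String baseUrl, String index, int shards, int replicas) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/" + index))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(settingsBody(shards, replicas)))
                .build();
    }
}
```

To actually send it against a running node, pass the result to HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()).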

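Mapping (creating table fields)

Note: versions before 6.0 had the concept of a type, roughly equivalent to a table in a relational database. From 6.0 on, a newly created index may contain only one type, and types were removed entirely in ES 8.0.

POST http://localhost:9200/plxc/doc/_mapping
{
    "properties": {
        "name": {
            "type": "text"
        },
        "description": {
            "type": "text"
        },
        "studymodel": {
            "type": "keyword"
        }
    }
}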

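Documents (records)

(1) Creating a record

If no id is given, ES generates one automatically (in that case use POST and leave the id off the URL).

PUT or POST http://localhost:9200/plxc/doc/1023e58161bcf7f40161bcf8b77c3123
{
    "name": "Bootstrap development framework",
    "description": "Bootstrap is a front-end page development framework released by Twitter and widely used in the industry.",
    "studymodel": "101011"
}

(2) Fetching a record by its primary key

GET http://localhost:9200/plxc/doc/1023e58161bcf7f40161bcf8b77c3123

(3) Querying all records

GET http://localhost:9200/plxc/doc/_search

(4) Querying records whose name contains the keyword bootstrap

GET http://localhost:9200/plxc/doc/_search?q=name:bootstrap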
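
The four operations above differ only in HTTP method and URL, which the following sketch makes explicit. DocumentUrls and its method names are hypothetical; the host, index and type are the ones used in this section. The query string is URL-encoded, so the colon in name:bootstrap appears as %3A, which ES decodes back on receipt.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: derives the request line for each document operation.
final class DocumentUrls {
    private final String base; // e.g. http://localhost:9200/plxc/doc

    DocumentUrls(String host, String index, String type) {
        this.base = host + "/" + index + "/" + type;
    }

    // With an id: PUT to /index/type/id. Without one: POST to /index/type and ES generates the id.
    String create(String idOrNull) {
        return idOrNull == null ? "POST " + base : "PUT " + base + "/" + idOrNull;
    }

    String getById(String id) {
        return "GET " + base + "/" + id;
    }

    String searchAll() {
        return "GET " + base + "/_search";
    }

    // URI search: q=<field>:<keyword>, URL-encoded.
    String searchByField(String field, String keyword) {
        String q = URLEncoder.encode(field + ":" + keyword, StandardCharsets.UTF_8);
        return "GET " + base + "/_search?q=" + q;
    }
}
```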

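IK analyzer

The IK analyzer has two tokenization modes: ik_max_word and ik_smart.

1. ik_max_word

Splits the text at the finest granularity. For example, "中華人民共和國人民大會堂" (the Great Hall of the People of the People's Republic of China) is split into 中華人民共和國, 中華人民, 中華, 華人, 人民共和國, 人民, 共和國, 大會堂, 大會, 會堂, and so on.

2. ik_smart

Splits the text at the coarsest granularity. For example, "中華人民共和國人民大會堂" is split into 中華人民共和國 and 人民大會堂.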

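Checking the analyzer

POST localhost:9200/_analyze
{
    "text": "測試分詞器"
}

The result splits the word 測試 ("test") apart into separate characters, which shows that no Chinese analyzer is in use yet.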

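Download and installation

GitHub: https://github.com/medcl/elasticsearch-analysis-ik

Download the release that matches your ES version, unzip it, and copy the extracted files into the ik directory under plugins in the ES installation directory.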

Testing

POST localhost:9200/_analyze
{
    "text": "測試分詞器",
    "analyzer": "ik_max_word"
}
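
The only difference between the check and the test request is the optional analyzer field. A minimal sketch of building either body follows; AnalyzeBody is a hypothetical name, and the hand-written escaping covers only backslashes and double quotes, so a JSON library would be the safer choice for arbitrary input.

```java
// Hypothetical sketch: builds the JSON body for POST /_analyze.
final class AnalyzeBody {

    // When analyzerOrNull is null the field is omitted and ES falls back to its default analyzer.
    static String build(String text, String analyzerOrNull) {
        StringBuilder json = new StringBuilder("{\"text\":\"").append(escape(text)).append('"');
        if (analyzerOrNull != null) {
            json.append(",\"analyzer\":\"").append(escape(analyzerOrNull)).append('"');
        }
        return json.append('}').toString();
    }

    // Minimal escaping: backslash and double quote only (control characters are not handled).
    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }
}
```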

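Custom dictionary

The IK analyzer ships with a file named main.dic, which is its dictionary.

In the directory that holds main.dic, create a file named my.dic (the encoding must be UTF-8; do not choose UTF-8 with BOM), with one custom word per line.

Edit the IKAnalyzer.cfg.xml configuration file, set ext_dict to my.dic, and restart ES.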
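
Editors on Windows often add a BOM silently, so one way to be sure about the encoding is to generate my.dic from code. The sketch below is an assumption about how one might do that; IkDictionaryWriter and its methods are hypothetical names, and the target directory is whatever directory holds main.dic in your installation.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical sketch: writes a custom IK dictionary as UTF-8 without a BOM.
final class IkDictionaryWriter {

    // One word per line. String.getBytes(UTF_8) never emits a byte order mark.
    static byte[] encode(List<String> words) {
        return (String.join("\n", words) + "\n").getBytes(StandardCharsets.UTF_8);
    }

    // Writes my.dic into the given directory and returns the path of the file.
    static Path write(Path ikConfigDir, List<String> words) throws IOException {
        return Files.write(ikConfigDir.resolve("my.dic"), encode(words));
    }
}
```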

Example

POST http://localhost:9200/plxc/doc/_mapping
{
    "properties": {
        "description": {
            "type": "text",
            "analyzer": "ik_max_word",
            "search_analyzer": "ik_smart"
        },
        "name": {
            "type": "text",
            "analyzer": "ik_max_word",
            "search_analyzer": "ik_smart"
        },
        "pic": {
            "type": "text",
            "index": false
        },
        "price": {
            "type": "float"
        },
        "studymodel": {
            "type": "keyword"
        },
        "timestamp": {
            "type": "date",
            "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
        }
    }
}

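String type: text

text:

The analyzer property sets the analyzer that is used for both indexing and searching.

The search_analyzer property sets a separate analyzer that is used only at search time.

The index property controls whether the field is indexed. A product image URL, for example, is only used to display the image and is never searched, so index can be set to false for it.

String type: keyword

keyword:

A keyword field is normally searched as a whole value, so it is not tokenized when it is indexed.

Examples are postal codes, mobile phone numbers, and ID card numbers. keyword fields are typically used for filtering, sorting, and aggregations.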
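
The difference between the two string types can be shown with a toy model. This is not how ES matches internally: the whitespace split below merely stands in for a real analyzer such as ik_max_word, and FieldMatchSketch and its methods are hypothetical names.

```java
import java.util.Arrays;
import java.util.Locale;

// Toy model of keyword versus text matching; a whitespace split stands in for a real analyzer.
final class FieldMatchSketch {

    // keyword: the whole stored value is a single term, so only an exact match hits.
    static boolean keywordMatches(String stored, String query) {
        return stored.equals(query);
    }

    // text: the value is broken into terms at index time, and a query term hits if it equals one of them.
    static boolean textMatches(String stored, String queryTerm) {
        return Arrays.asList(stored.toLowerCase(Locale.ROOT).split("\\s+"))
                .contains(queryTerm.toLowerCase(Locale.ROOT));
    }
}
```

Under this model the studymodel value 101011 is found only by the full string 101011, while a name such as "Bootstrap development framework" is found by the single term bootstrap.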

Adjusting a mapping

Adding a field and assigning values

(1) Adding the field

PUT http://localhost:9200/plxc/doc/_mapping
{
    "properties": {
        "TimeFormat": {
            "type": "date",
            "format": "yyyy-MM-dd HH:mm:ss"
        }
    }
}

(2) Example utility method that assigns the values

// Imports needed by the enclosing class:
import java.io.IOException;
import java.util.Map;
import org.elasticsearch.action.bulk.BulkRequestBuilder;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.update.UpdateRequest;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.search.SearchHit;
import static org.elasticsearch.common.xcontent.XContentFactory.jsonBuilder;

// Assumes the enclosing class provides a transport client field `client`, an index-name
// constant `Index`, and the helper getyear_month_day_hour.
// SearchType.SCAN does not exist in ES 6.x, so a plain scroll is used here;
// unlike SCAN, the first response already contains hits.
public static void updateHourByScroll(String type) throws IOException {
    System.out.println("Scroll mode started!");
    long begin = System.currentTimeMillis();
    SearchResponse scrollResponse = client.prepareSearch(Index).setTypes(type)
            .setSize(5000).setScroll(TimeValue.timeValueMinutes(1))
            .get();
    long count = scrollResponse.getHits().getTotalHits();
    long sum = 0;
    while (scrollResponse.getHits().getHits().length > 0) {
        sum += scrollResponse.getHits().getHits().length;

        // Collect the updates of one scroll page into a single bulk request. Calling
        // client.update(request).get() per document also works but is slow on large data sets.
        BulkRequestBuilder bulkRequest = client.prepareBulk();
        for (SearchHit hit : scrollResponse.getHits()) {
            Map<String, Object> source = hit.getSourceAsMap();
            if (source.containsKey("TimeFormat")) {
                // Important: if an earlier run failed midway, this skips documents that are already updated.
                System.out.println("TimeFormat already exists!");
                continue;
            }
            int year = Integer.parseInt(source.get("Year").toString());
            int month = Integer.parseInt(source.get("Mon").toString());
            int day = Integer.parseInt(source.get("Day").toString());
            // Hour may be missing; default to 0 in that case.
            int hour = source.containsKey("Hour") ? Integer.parseInt(source.get("Hour").toString()) : 0;

            // User-defined helper that produces the value of the new TimeFormat field; adapt as needed.
            String time = getyear_month_day_hour(year, month, day, hour);
            System.out.println(time);
            bulkRequest.add(new UpdateRequest(Index, type, hit.getId())
                    .doc(jsonBuilder().startObject().field("TimeFormat", time).endObject()));
        }

        // Execute the batch; an empty bulk request would be rejected, so skip it.
        if (bulkRequest.numberOfActions() > 0) {
            BulkResponse bulkResponse = bulkRequest.get();
            if (bulkResponse.hasFailures()) {
                System.out.println("Bulk errors: " + bulkResponse.buildFailureMessage());
            }
        }
        System.out.println("Total " + count + ", fetched so far " + sum);

        // Fetch the next page.
        scrollResponse = client.prepareSearchScroll(scrollResponse.getScrollId())
                .setScroll(TimeValue.timeValueMinutes(8))
                .get();
    }
    System.out.println("Elapsed (ms): " + (System.currentTimeMillis() - begin));
}
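
The helper getyear_month_day_hour is described above only as user-defined, and its body is not shown. One possible version, purely as an assumption, formats the four parts with java.time so that the result matches the yyyy-MM-dd HH:mm:ss format declared for TimeFormat in the mapping; the wrapper class TimeFormatHelper is a hypothetical name.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical implementation of the user-defined helper called by updateHourByScroll.
final class TimeFormatHelper {
    // Same pattern as the "format" of the TimeFormat field in the mapping.
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // Builds a timestamp at the start of the given hour; throws DateTimeException on invalid parts.
    static String getyear_month_day_hour(int year, int month, int day, int hour) {
        return LocalDateTime.of(year, month, day, hour, 0, 0).format(FORMAT);
    }
}
```

Rejecting invalid dates with an exception, rather than writing a malformed string, keeps bad values out of the date field, where ES would refuse them at index time anyway.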
