Elasticsearch configuration files

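The config folder of elasticsearch contains two configuration files, elasticsearch.yml and logging.yml. The first is the basic ES configuration file and the second is the logging configuration file. ES uses log4j for logging, so logging.yml can be written just like an ordinary log4j configuration file. The rest of this post explains what can be configured in elasticsearch.yml.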

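Sets the cluster name; the default is elasticsearch. ES automatically discovers other ES nodes on the same network segment, so if several clusters share a segment, use this property to tell them apart.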

cluster.name: elasticsearch

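Node name. By default a name is picked at random from a list stored in name.txt in the config folder inside the ES jar; the list contains many amusing names added by the authors.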

node.name: "franz kafka"

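Specifies whether this node is eligible to be elected master; the default is true. By default the first machine in the cluster becomes master, and if that machine goes down a new master is elected.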

node.master: true

Specifies whether this node stores index data; the default is true.

node.data: true

Sets the default number of shards per index; the default is 5.

index.number_of_shards: 5

Sets the default number of replicas per index; the default is 1.

index.number_of_replicas: 1
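A quick worked example of how these two defaults combine: every primary shard gets `number_of_replicas` copies, so an index holds `shards * (1 + replicas)` shard copies in total. The helper name below is hypothetical and only does the arithmetic.

```python
def total_shard_copies(number_of_shards: int, number_of_replicas: int) -> int:
    """Primary shards plus all of their replica copies (hypothetical helper)."""
    if number_of_shards < 1 or number_of_replicas < 0:
        raise ValueError("need at least one shard and a non-negative replica count")
    return number_of_shards * (1 + number_of_replicas)


if __name__ == "__main__":
    # Defaults above: 5 shards, 1 replica -> 5 primaries + 5 replicas.
    print(total_shard_copies(5, 1))  # 10
```

With the defaults, there are 10 shard copies to spread across the nodes of the cluster.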

Sets the path where configuration files are stored; the default is the config folder under the ES root directory.

path.conf: /path/to/conf

Sets the path where index data is stored; the default is the data folder under the ES root directory.

path.data: /path/to/data

Multiple storage paths can be given, separated by commas, for example:

path.data: /path/to/data1,/path/to/data2

Sets the path for temporary files; the default is the work folder under the ES root directory.

path.work: /path/to/work

Sets the path for log files; the default is the logs folder under the ES root directory.

path.logs: /path/to/logs

Sets the path where plugins are stored; the default is the plugins folder under the ES root directory.

path.plugins: /path/to/plugins

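Force all memory to be locked so that swapping does not hurt performance.

Set to true to lock the memory. ES becomes less efficient once the JVM starts swapping, so make sure it never swaps: set the ES_MIN_MEM and ES_MAX_MEM environment variables to the same value, and make sure the machine has enough memory to allocate to ES. The elasticsearch process must also be allowed to lock memory; on Linux this is done with the `ulimit -l unlimited` command.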

bootstrap.mlockall: true
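To confirm that the `ulimit -l unlimited` step took effect, a process can read its own memlock limit. The sketch below uses Python's standard `resource` module (Unix only); the function name is hypothetical, and it reports the limit of the Python process itself, so run it as the same user and from the same shell environment that starts ES.

```python
import resource


def memlock_is_unlimited() -> bool:
    """True if the current process may lock unlimited memory (ulimit -l unlimited)."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    return soft == resource.RLIM_INFINITY


if __name__ == "__main__":
    soft, _hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    # getrlimit reports bytes, while `ulimit -l` prints kilobytes.
    shown = "unlimited" if soft == resource.RLIM_INFINITY else f"{soft} bytes"
    print("memlock soft limit:", shown)
    print("ready for bootstrap.mlockall:", memlock_is_unlimited())
```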

Sets the IP address to bind to, either IPv4 or IPv6; the default is 0.0.0.0.

network.bind_host: 192.168.0.1

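Sets the IP address that other nodes use to communicate with this node. If it is not set, it is determined automatically; the value must be a real IP address.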

network.publish_host: 192.168.0.1

This parameter sets both of the parameters above, bind_host and publish_host, at once.

network.host: 192.168.0.1
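The precedence between these three settings can be modeled as below. This is a hypothetical helper written for illustration, not Elasticsearch code: explicit `network.bind_host` / `network.publish_host` values win, otherwise both fall back to `network.host`, and `None` for the publish host stands for "determined automatically".

```python
from typing import Optional, Tuple


def resolve_hosts(settings: dict) -> Tuple[str, Optional[str]]:
    """Return (bind_host, publish_host) from a flat dict of settings."""
    shared = settings.get("network.host")
    # bind_host: explicit value, else network.host, else the 0.0.0.0 default
    bind = settings.get("network.bind_host", shared or "0.0.0.0")
    # publish_host: explicit value, else network.host, else None (auto-detected)
    publish = settings.get("network.publish_host", shared)
    return bind, publish


if __name__ == "__main__":
    print(resolve_hosts({"network.host": "192.168.0.1"}))
    print(resolve_hosts({}))
```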

Sets the TCP port used for communication between nodes; the default is 9300.

transport.tcp.port: 9300

Sets whether data sent over TCP is compressed; the default is false (no compression).

transport.tcp.compress: true

Sets the HTTP port used to serve external clients; the default is 9200.

http.port: 9200

Sets the maximum size of HTTP request content; the default is 100mb.

http.max_content_length: 100mb
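Size values such as `100mb` are plain strings in the config file. The sketch below converts them to bytes and checks a request body against the limit; the helper names are hypothetical, and it assumes binary units (1kb = 1024 bytes) and only the b/kb/mb/gb suffixes.

```python
import re

# Assumed binary multipliers for the unit suffixes used in this file.
_UNITS = {"b": 1, "kb": 1024, "mb": 1024 ** 2, "gb": 1024 ** 3}


def parse_byte_size(value: str) -> int:
    """Convert a size string such as '100mb' into a number of bytes."""
    match = re.fullmatch(r"\s*(\d+)\s*(b|kb|mb|gb)\s*", value.lower())
    if match is None:
        raise ValueError(f"unrecognised size: {value!r}")
    number, unit = match.groups()
    return int(number) * _UNITS[unit]


def body_too_large(body: bytes, max_content_length: str = "100mb") -> bool:
    """True if a request body would exceed http.max_content_length."""
    return len(body) > parse_byte_size(max_content_length)
```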

Whether to serve clients over HTTP; the default is true (enabled).

http.enabled: false

Network settings

#network.tcp.keep_alive : true

#network.tcp.send_buffer_size : 8192

#network.tcp.receive_buffer_size : 8192

Discovery settings

#discovery.zen.fd.connect_on_network_disconnect : true

#discovery.zen.initial_ping_timeout : 10s

#discovery.zen.fd.ping_interval : 2s

#discovery.zen.fd.ping_retries : 10

The gateway snapshot interval (only applies to shared gateways).

#index.gateway.snapshot_interval : 1s

Interval at which shards are refreshed asynchronously; -1 disables refreshing.

#index.refresh_interval : -1

Set to a range (e.g. 0-all) to automatically expand the number of replicas based on the number of data nodes in the cluster, or to false to disable it.

index.auto_expand_replicas

Set to true to make the index read only; false to allow writes and metadata changes.

index.blocks.read_only

Set to true to disable read operations against the index.

index.blocks.read

Set to true to disable write operations against the index.

index.blocks.write

Set to true to disable metadata operations against the index.

index.blocks.metadata

Lucene index term interval; only applies to newly created docs.

index.term_index_interval

Lucene reader term index divisor.

index.term_index_divisor

When to flush based on the number of operations.

index.translog.flush_threshold_ops

When to flush based on the translog size (in bytes).

index.translog.flush_threshold_size

When to flush based on a period of time without flushing.

index.translog.flush_threshold_period

Disables flushing. Note: it should only be set for a short period and then re-enabled.

index.translog.disable_flush
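The three translog thresholds can be read as "flush as soon as any one of them is reached", with `disable_flush` overriding all of them. The sketch below models that reading; the class, the function and the numeric defaults are illustrative assumptions, not Elasticsearch's actual defaults.

```python
from dataclasses import dataclass


@dataclass
class TranslogState:
    ops_since_flush: int        # operations written since the last flush
    size_bytes: int             # current translog size in bytes
    seconds_since_flush: float  # time elapsed since the last flush


def should_flush(state: TranslogState,
                 threshold_ops: int = 5000,              # illustrative value
                 threshold_size: int = 200 * 1024 ** 2,  # illustrative value
                 threshold_period: float = 30 * 60,      # illustrative value
                 disable_flush: bool = False) -> bool:
    """Flush when any one threshold is reached, unless flushing is disabled."""
    if disable_flush:
        return False
    return (state.ops_since_flush >= threshold_ops
            or state.size_bytes >= threshold_size
            or state.seconds_since_flush >= threshold_period)
```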

The maximum size of the filter cache (per segment in a shard). Set to -1 to disable.

index.cache.filter.max_size

The expire-after-access time for the filter cache. Set to -1 to disable.

index.cache.filter.expire

Merge policy

All the settings for the currently configured merge policy. A different merge policy cannot be set.

A node matching any rule will be allowed to host shards from the index.

index.routing.allocation.include.*

A node matching any rule will not be allowed to host shards from the index.

index.routing.allocation.exclude.*

Only nodes matching all rules will be allowed to host shards from the index.

index.routing.allocation.require.*

Controls the total number of shards allowed to be allocated on a single node. Defaults to unbounded (-1).

index.routing.allocation.total_shards_per_node
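The include / exclude / require semantics above can be sketched as a single predicate. Rules are modeled here as attribute name to list of accepted values, which is an assumption (real settings take comma-separated strings), and this sketch only does exact matches; `can_host` is a hypothetical name.

```python
from typing import Dict, List, Optional

# attribute name -> accepted values, e.g. {"zone": ["a", "b"]}
Rules = Dict[str, List[str]]


def _matches(node_attrs: Dict[str, str], attr: str, values: List[str]) -> bool:
    """A rule matches when the node has the attribute with one of the accepted values."""
    return node_attrs.get(attr) in values


def can_host(node_attrs: Dict[str, str],
             include: Optional[Rules] = None,
             exclude: Optional[Rules] = None,
             require: Optional[Rules] = None) -> bool:
    include, exclude, require = include or {}, exclude or {}, require or {}
    # require.*: every rule must match
    if not all(_matches(node_attrs, a, v) for a, v in require.items()):
        return False
    # include.*: at least one rule must match, checked only when include rules exist
    if include and not any(_matches(node_attrs, a, v) for a, v in include.items()):
        return False
    # exclude.*: any single matching rule disqualifies the node
    return not any(_matches(node_attrs, a, v) for a, v in exclude.items())
```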

When using the local gateway, a particular shard is recovered only if a quorum of its copies can be allocated in the cluster. It can be set to quorum (default), quorum-1 (or half), full and full-1. Numeric values are also supported, e.g. 1.

index.recovery.initial_shards

Temporarily disables the purge of expired docs.

index.ttl.disable_purge

Default index merge factor and related settings

#index.merge.policy.merge_factor : 100

#index.merge.policy.min_merge_docs : 1000

#index.merge.policy.use_compound_file : true

#indices.memory.index_buffer_size : 5%

Gateway settings

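When the cluster cannot reach the expected number of nodes, it stays blocked and cannot index or search normally. This means some node in the cluster failed to start properly, and blocking is exactly the behavior we want, because it prevents the data from becoming inconsistent.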

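The gateway type. The default is local, meaning the local file system; it can also be set to a distributed file system, Hadoop HDFS, or Amazon S3. How to set up the other file systems will be covered in detail another time.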

gateway.type: local

Starts data recovery once n nodes in the cluster have started; the default is 1.

gateway.recover_after_nodes: 1

Sets the timeout for the initial data recovery process; the default is 5 minutes.

gateway.recover_after_time: 5m

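Sets the expected number of nodes in this cluster; the default is 2. As soon as these n nodes have started, data recovery begins immediately.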

gateway.expected_nodes: 2
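How `recover_after_nodes`, `expected_nodes` and `recover_after_time` interact can be modeled as a simple decision function: stay blocked until `recover_after_nodes` nodes have joined, recover immediately once `expected_nodes` are present, and otherwise wait out `recover_after_time`. This is a simplified sketch with a hypothetical function name, using the defaults listed above; it is not Elasticsearch's own code.

```python
def should_start_recovery(joined_nodes: int,
                          seconds_waited: float,
                          recover_after_nodes: int = 1,
                          expected_nodes: int = 2,
                          recover_after_time: float = 5 * 60) -> bool:
    """Simplified model of when the initial gateway recovery may begin."""
    if joined_nodes < recover_after_nodes:
        return False                              # cluster stays blocked
    if joined_nodes >= expected_nodes:
        return True                               # all expected nodes are here
    return seconds_waited >= recover_after_time   # otherwise wait for the timeout
```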

Number of concurrent recovery threads during the initial data recovery; the default is 4.

cluster.routing.allocation.node_initial_primaries_recoveries: 4

Number of concurrent recovery threads when nodes are added or removed or when rebalancing; the default is 2.

cluster.routing.allocation.node_concurrent_recoveries: 2

Sets a bandwidth limit for data recovery, e.g. 100mb; the default is 0, meaning unlimited.

indices.recovery.max_size_per_sec: 0

Limits the maximum number of concurrent streams opened when recovering data from other shards; the default is 5.

indices.recovery.concurrent_streams: 5

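Sets how many master-eligible nodes a node must be able to see before a master can be elected; the default is 1. For large clusters a larger value (2-4) can be used; the usual rule is a majority of the master-eligible nodes, (N / 2) + 1, which prevents a split brain.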

discovery.zen.minimum_master_nodes: 1
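A common way to pick this value is a majority quorum of the master-eligible nodes, `(N / 2) + 1` with integer division, so that two disjoint halves of the cluster can never both elect a master. The helper below (hypothetical name) just computes that number.

```python
def minimum_master_nodes(master_eligible_nodes: int) -> int:
    """Majority quorum of master-eligible nodes: (N // 2) + 1."""
    if master_eligible_nodes < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible_nodes // 2 + 1


if __name__ == "__main__":
    for n in (1, 2, 3, 5):
        print(n, "->", minimum_master_nodes(n))
```

For 3 master-eligible nodes this gives 2, and for 5 it gives 3. With only 2 master-eligible nodes the quorum is 2, so losing either node stops master election.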

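Sets the ping timeout used when discovering other nodes in the cluster; the default is 3 seconds. In poor network environments, use a higher value to prevent discovery errors.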

discovery.zen.ping.timeout: 3s

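Sets whether multicast is used to discover nodes; the default is true.

discovery.zen.ping.multicast.enabled: false

When multicast is disabled, the IPs of the cluster's nodes can be set manually.

Sets the initial list of master nodes in the cluster; these nodes are used to discover nodes that newly join the cluster.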

discovery.zen.ping.unicast.hosts: ["host1", "host2:port", "host3[portx-porty]"]
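The three entry forms in `discovery.zen.ping.unicast.hosts` can be expanded into concrete (host, port) pairs as sketched below. The function name is hypothetical; it assumes a bare host means the default transport port 9300 and does not handle IPv6 literals.

```python
import re
from typing import List, Tuple

# Matches the 'host[lo-hi]' port-range form.
_RANGE = re.compile(r"^(?P<host>[^\[\]:]+)\[(?P<lo>\d+)-(?P<hi>\d+)\]$")


def expand_unicast_hosts(entries: List[str], default_port: int = 9300) -> List[Tuple[str, int]]:
    """Expand 'host', 'host:port' and 'host[lo-hi]' entries into (host, port) pairs."""
    result: List[Tuple[str, int]] = []
    for entry in entries:
        entry = entry.strip()
        ranged = _RANGE.match(entry)
        if ranged:
            lo, hi = int(ranged["lo"]), int(ranged["hi"])
            result.extend((ranged["host"], p) for p in range(lo, hi + 1))
        elif ":" in entry:
            host, port = entry.rsplit(":", 1)
            result.append((host, int(port)))
        else:
            result.append((entry, default_port))  # assumed default transport port
    return result
```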

Below are some slow log settings for searches.

index.search.slowlog.level: trace

index.search.slowlog.threshold.query.warn: 10s

index.search.slowlog.threshold.query.info: 5s

index.search.slowlog.threshold.query.debug: 2s

index.search.slowlog.threshold.query.trace: 500ms

index.search.slowlog.threshold.fetch.warn: 1s

index.search.slowlog.threshold.fetch.info: 800ms

index.search.slowlog.threshold.fetch.debug: 500ms

index.search.slowlog.threshold.fetch.trace: 200ms
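The thresholds above map a query or fetch duration to the most severe log level whose threshold it exceeds, and `index.search.slowlog.level` sets the least severe level that is still written. A minimal sketch of that mapping, with hypothetical names and the example values above converted to milliseconds:

```python
from typing import Dict, Optional

# Query-phase thresholds from the example above, in milliseconds.
QUERY_THRESHOLDS_MS = {"warn": 10_000, "info": 5_000, "debug": 2_000, "trace": 500}
# Fetch-phase thresholds from the example above, in milliseconds.
FETCH_THRESHOLDS_MS = {"warn": 1_000, "info": 800, "debug": 500, "trace": 200}

_SEVERITY = ["warn", "info", "debug", "trace"]  # most to least severe


def slowlog_level(took_ms: float, thresholds: Dict[str, float],
                  min_level: str = "trace") -> Optional[str]:
    """Return the most severe level whose threshold took_ms exceeds, or None."""
    allowed = _SEVERITY[: _SEVERITY.index(min_level) + 1]  # levels still logged
    for level in allowed:
        if took_ms > thresholds[level]:
            return level
    return None
```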

1. Set the cache size and expiration time.

index.cache.field.max_size

index.cache.field.expire

For example:

# Maximum number of entries each segment in the index may hold

index.cache.field.max_size: 50000

# Expire after 10 minutes

index.cache.field.expire: 10m
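A bounded per-segment cache with expire-after-access, like the one these two settings describe, can be sketched as follows. It assumes least-recently-used eviction once `max_size` is exceeded; the class name is hypothetical and this is not Elasticsearch's implementation. The clock is injectable so that expiry can be tested.

```python
import time
from collections import OrderedDict
from typing import Any, Callable, Hashable, Optional


class BoundedExpiringCache:
    """Max entry count plus expire-after-access; LRU eviction is assumed."""

    def __init__(self, max_size: int = 50_000, expire_seconds: float = 600,
                 clock: Callable[[], float] = time.monotonic):
        self.max_size = max_size
        self.expire_seconds = expire_seconds
        self.clock = clock
        self._data: OrderedDict = OrderedDict()  # key -> (value, last_access)

    def get(self, key: Hashable) -> Optional[Any]:
        item = self._data.get(key)
        if item is None:
            return None
        value, last_access = item
        now = self.clock()
        if now - last_access >= self.expire_seconds:
            del self._data[key]          # not touched within the expire window
            return None
        self._data[key] = (value, now)   # refresh the access time
        self._data.move_to_end(key)      # mark as most recently used
        return value

    def put(self, key: Hashable, value: Any) -> None:
        self._data[key] = (value, self.clock())
        self._data.move_to_end(key)
        while len(self._data) > self.max_size:
            self._data.popitem(last=False)  # evict the least recently used entry
```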

2. Change the cache type.

index.cache.field.type: soft

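The default type is resident, which, as the name says, keeps entries resident: the cache keeps growing until memory runs out. With soft, when memory runs short the cached entries are cleared first and new data is then loaded into memory, so soft effectively sizes the cache relative to the available memory. resident is only safe when there is plenty of memory.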

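3. Process the data.

The original article suggests reducing the size of field values, for example by converting uppercase to lowercase.

In practice, you may also be able to refine the data itself. Fields used for facets can also be transformed and represented as ints instead.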

For converting strings to ints, see medcl's plugin: https://github.com/medcl/elasticsearch-analysis-string2int
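The string-to-int idea can also be applied in the application before indexing, using plain dictionary encoding. The sketch below is not medcl's plugin, which is an ES analysis plugin; the class name is hypothetical. Values are lowercased first, as suggested above, so different casings share one id.

```python
from typing import Dict, List


class StringToIntEncoder:
    """Assigns each distinct (lowercased) string a small, stable integer id."""

    def __init__(self) -> None:
        self._ids: Dict[str, int] = {}
        self._values: List[str] = []

    def encode(self, value: str) -> int:
        key = value.lower()  # normalise case so 'Beijing' and 'beijing' share an id
        if key not in self._ids:
            self._ids[key] = len(self._values)
            self._values.append(key)
        return self._ids[key]

    def decode(self, code: int) -> str:
        return self._values[code]
```

Index the returned ids in place of the strings, facet on the int field, and map the facet results back with `decode`.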
