Syncing MySQL to Elasticsearch with elasticsearch-jdbc: An In-Depth Guide

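1. How do you synchronize data between MySQL and Elasticsearch?

Converting rows to JSON one at a time is clearly impractical, so you need either a third-party tool or your own implementation. The core requirement: inserts, deletes, and updates must be synchronized, and the synchronized data must be queryable.

2. What methods exist for syncing MySQL with Elasticsearch, and how do they compare?

The better-known tools in this area are:

1) elasticsearch-jdbc. Strictly speaking it is no longer a plugin; it has become a standalone tool.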

https://github.com/jprante/elasticsearch-jdbc

2) The elasticsearch-river-mysql plugin

https://github.com/scharron/elasticsearch-river-mysql

3) go-mysql-elasticsearch (by siddontang, a developer based in China)

https://github.com/siddontang/go-mysql-elasticsearch

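Comparison of tools 1-3:

go-mysql-elasticsearch is still under development and not yet stable.

Why choose elasticsearch-jdbc rather than the elasticsearch-river-mysql plugin?

(Reference: http://stackoverflow.com/questions/23658534/using-elasticsearch-river-mysql-to-stream-data-from-mysql-database-to-elasticsea )

1) Generality: elasticsearch-jdbc is more general, since it reads from any database reachable through a JDBC driver rather than from MySQL only.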

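2) Release activity: elasticsearch-jdbc is very active on GitHub. The latest version, 2.3.3.0 (May 28, 2016), is compatible with Elasticsearch 2.3.3.

elasticsearch-river-mysql, by contrast, has not been updated since December 13, 2012.

In summary, elasticsearch-jdbc is the natural choice for syncing MySQL to Elasticsearch.

Drawbacks and limitations of elasticsearch-jdbc (as observed by others):

1) siddontang, the author of go-mysql-elasticsearch, notes on his blog:

elasticsearch-river-jdbc is very powerful, but it does not handle incremental data updates well. It requires the source table to be append-only, which is almost impossible to guarantee in a real project.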

http://www.jianshu.com/p/05cff717563c

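2) The blogger leotse90 points out another drawback of elasticsearch-jdbc in a post: delete operations (physical deletes) are not synchronized.

http://leotse90.com/2015/11/11/ElasticSearch與MySQL資料同步以及修改表結構/

As of June 16, 2016 I have not tested this myself, so I will not comment further.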
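One possible way to work around the physical-delete gap is to compare the primary keys on both sides and remove whatever no longer exists in MySQL. The sketch below is not part of elasticsearch-jdbc. stale_ids is a hypothetical helper, and it assumes that each document's _id equals the MySQL primary key and that both id lists have already been exported to text files, one id per line.

```shell
#!/bin/sh
# Hypothetical helper, not part of elasticsearch-jdbc.
# Prints ids that are present in the index but absent from MySQL.
#   $1: file with the ids currently in MySQL, one per line
#   $2: file with the document ids currently in the index, one per line
stale_ids() {
    src_sorted=$(mktemp) || return 1
    idx_sorted=$(mktemp) || { rm -f "$src_sorted"; return 1; }
    sort -u "$1" > "$src_sorted"
    sort -u "$2" > "$idx_sorted"
    # comm -13 suppresses lines unique to file 1 and lines common to both,
    # leaving only the lines unique to file 2 (the index).
    comm -13 "$src_sorted" "$idx_sorted"
    rm -f "$src_sorted" "$idx_sorted"
}
```

Each id printed could then be removed with a request such as curl -XDELETE 'http://10.8.5.101:9200/myindex/mytype/ID', where ID stands for the printed value and the host, index, and type are the ones used later in this article.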

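3. How is elasticsearch-jdbc used? Does it need to be installed?

3.1 Differences from earlier versions

elasticsearch-jdbc 2.3.2.0 does not need to be installed. The tests below also use Elasticsearch 2.3.2.

Operating system: CentOS release 6.6 (Final)

At this point you may wonder how the earlier versions differ. Considerably. From the material I collected, the differences are:

1) The early 1.x versions were plugins and had to be installed.

2) The configuration is also different.

3.2 Using elasticsearch-jdbc (sync method one)

Prerequisites:

1) Elasticsearch 2.3.2 is installed and tested.

2) MySQL is installed and supports insert, delete, update, and select.

The test database is test and the table is cc, with the following contents: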

mysql> select * from cc;
+----+------------+
| id | name       |
+----+------------+
|  1 | laoyang    |
|  2 | dluzhang   |
|  3 | dlulaoyang |
+----+------------+
3 rows in set (0.00 sec)

Step 1: Download the tool.

URL:

http://xbib.org/repository/org/xbib/elasticsearch/importer/elasticsearch-jdbc/2.3.2.0/elasticsearch-jdbc-2.3.2.0-dist.zip

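Step 2: Copy the archive to the CentOS machine. The path is up to you; I put it in the root directory and unpacked it with: unzip elasticsearch-jdbc-2.3.2.0-dist.zip

Step 3: Set the environment variable by adding the export line below to /etc/profile.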

[root@5b9dbaaa148a /]# vi /etc/profile 
export JDBC_IMPORTER_HOME=/elasticsearch-jdbc-2.3.2.0

Apply the environment variable:

[root@5b9dbaaa148a /]# source /etc/profile

Step 4: Configure and run.

1) Create the directory odbc_es under the root directory. Its contents end up as follows:

[root@5b9dbaaa148a /]# ll /odbc_es/ 
drwxr-xr-x 2 root root 4096 Jun 16 03:11 logs 
-rwxrwxrwx 1 root root 542 Jun 16 04:03 mysql_import_es.sh

2) Create the script mysql_import_es.sh with the following contents:

[root@5b9dbaaa148a odbc_es]# cat mysql_import_es.sh
#!/bin/sh
# JSON does not allow comments, so the settings are described here instead:
#   elasticsearch.cluster: cluster name, see /usr/local/elasticsearch/config/elasticsearch.yml
#   url, user, password:   MySQL JDBC address, user name, and password
#   index, type:           the new index and type to write to
bin=$JDBC_IMPORTER_HOME/bin
lib=$JDBC_IMPORTER_HOME/lib
echo '{
"type" : "jdbc",
"jdbc": {
"elasticsearch.autodiscover":true,
"elasticsearch.cluster":"my-application",
"url":"jdbc:mysql://10.8.5.101:3306/test",
"user":"root",
"password":"123456",
"sql":"select * from cc",
"elasticsearch" : {
  "host" : "10.8.5.101",
  "port" : 9300
},
"index" : "myindex",
"type" : "mytype"
}
}'| java \
  -cp "${lib}/*" \
  -Dlog4j.configurationFile=${bin}/log4j2.xml \
  org.xbib.tools.Runner \
  org.xbib.tools.JDBCImporter

3) Make mysql_import_es.sh executable.

[root@5b9dbaaa148a odbc_es]# chmod a+x mysql_import_es.sh

4) Run the script mysql_import_es.sh.

[root@5b9dbaaa148a odbc_es]# ./mysql_import_es.sh

Step 5: Verify that the data was synchronized.

Query Elasticsearch:

[root@5b9dbaaa148a odbc_es]# curl -XGET 'http://10.8.5.101:9200/myindex/mytype/_search?pretty'
{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 8,
    "successful" : 8,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "myindex",
      "_type" : "mytype",
      "_id" : "AVVXKgeEun6ksbtikOWH",
      "_score" : 1.0,
      "_source" : {
        "id" : 1,
        "name" : "laoyang"
      }
    }, {
      "_index" : "myindex",
      "_type" : "mytype",
      "_id" : "AVVXKgeEun6ksbtikOWI",
      "_score" : 1.0,
      "_source" : {
        "id" : 2,
        "name" : "dluzhang"
      }
    }, {
      "_index" : "myindex",
      "_type" : "mytype",
      "_id" : "AVVXKgeEun6ksbtikOWJ",
      "_score" : 1.0,
      "_source" : {
        "id" : 3,
        "name" : "dlulaoyang"
      }
    } ]
  }
}

If the response contains the MySQL fields as shown above, the synchronization succeeded.
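The _id values in the response above are auto-generated, so a second run of the importer would presumably index the same rows again under new ids. As I understand the elasticsearch-jdbc README, a result column named _id is treated as the document id, so aliasing the primary key in the SQL statement is one way to make repeated runs overwrite documents rather than duplicate them. The sketch below only prints such a definition. make_definition is a hypothetical helper, and the connection values are the ones used in this article.

```shell
#!/bin/sh
# Hypothetical helper: print an importer definition in which the primary key
# column "id" is aliased to "_id", so each row maps to one fixed document id.
#   $1: table name, $2: target index, $3: target type
make_definition() {
    cat <<EOF
{
  "type" : "jdbc",
  "jdbc" : {
    "elasticsearch.autodiscover" : true,
    "elasticsearch.cluster" : "my-application",
    "url" : "jdbc:mysql://10.8.5.101:3306/test",
    "user" : "root",
    "password" : "123456",
    "sql" : "select *, id as _id from $1",
    "elasticsearch" : {
      "host" : "10.8.5.101",
      "port" : 9300
    },
    "index" : "$2",
    "type" : "$3"
  }
}
EOF
}
```

The output of make_definition cc myindex mytype could be piped into the same java command that mysql_import_es.sh uses, in place of the echo statement.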

4. elasticsearch-jdbc sync method two

[root@5b9dbaaa148a odbc_es]# cat mysql_import_es_simple.sh
#!/bin/sh
bin=$JDBC_IMPORTER_HOME/bin
lib=$JDBC_IMPORTER_HOME/lib
  java \
  -cp "${lib}/*" \
  -Dlog4j.configurationFile=${bin}/log4j2.xml \
  org.xbib.tools.Runner \
  org.xbib.tools.JDBCImporter statefile.json

[root@5b9dbaaa148a odbc_es]# cat statefile.json
{
"type" : "jdbc",
"jdbc": {
"elasticsearch.autodiscover":true,
"elasticsearch.cluster":"my-application",
"url":"jdbc:mysql://10.8.5.101:3306/test",
"user":"root",
"password":"123456",
"sql":"select * from cc",
"elasticsearch" : {
  "host" : "10.8.5.101",
  "port" : 9300
},
"index" : "myindex_2",
"type" : "mytype_2"
}
}

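The script and the JSON file are kept separate. The script passes the JSON file name to the importer, which loads it at startup. Because the file name is relative, run the script from the directory that contains statefile.json.

To run it, execute ./mysql_import_es_simple.sh directly.

5. Equivalent queries in MySQL and Elasticsearch

Goal: look up the name for id=3 in table cc.

1) SQL query in MySQL: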

mysql> select * from cc where id=3;
+----+------------+
| id | name       |
+----+------------+
|  3 | dlulaoyang |
+----+------------+
1 row in set (0.00 sec)

2) Elasticsearch search:

[root@5b9dbaaa148a odbc_es]# curl 'http://10.8.5.101:9200/myindex/mytype/_search?pretty' -d '
{
"filter" : { "term" : { "id" : "3" } }
}'
{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 8,
    "successful" : 8,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "myindex",
      "_type" : "mytype",
      "_id" : "AVVXKgeEun6ksbtikOWJ",
      "_score" : 1.0,
      "_source" : {
        "id" : 3,
        "name" : "dlulaoyang"
      }
    } ]
  }
}
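The request above passes the filter at the top level of the search body. Elasticsearch 2.x also accepts the same condition written as a bool query with a filter clause. A minimal sketch follows. term_filter_body is a hypothetical helper that only builds the request body, and it assumes a numeric field value.

```shell
#!/bin/sh
# Hypothetical helper: build a search body that matches documents whose
# field $1 equals the numeric value $2, using a bool query with a filter clause.
term_filter_body() {
    printf '{"query":{"bool":{"filter":{"term":{"%s":%s}}}}}\n' "$1" "$2"
}

# Example call against the host used in this article (requires a running node):
# curl -XGET 'http://10.8.5.101:9200/myindex/mytype/_search?pretty' -d "$(term_filter_body id 3)"
```

A filter clause does not contribute to scoring, so the _score values in the response may differ from the 1.0 shown above even though the same document is returned.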

Common errors:

Error log location: /odbc_es/logs

Log contents:

[root@5b9dbaaa148a logs]# tail -f jdbc.log

04:03:39,570 org.xbib.elasticsearch.helper.client.BaseTransportClient after auto-discovery connected to [{5b9dbaaa148a}{aksn2ErNRlWjUECnp_8JmA}{10.8.5.101}{10.8.5.101:9300}{master=true}]

Bug 1: 02:46:23,894 importer.jdbc error while processing request: cluster state is RED and not YELLOW, from here on, everything will fail!

Cause:

You created an index with replicas but you had only one node in the cluster. One way to solve this problem is by allocating them on a second node. Another way is by turning replicas off.

Solutions:

Option 1: add a second node so that the replica shards can be allocated to it.

Option 2: turn replicas off (very practical), as follows:

curl -XPUT 'localhost:9200/_settings' -d '
{
  "index" : {
  "number_of_replicas" : 0
  }
}'
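Before and after changing the replica setting it can help to look at the cluster health, which Elasticsearch reports as green, yellow, or red through the _cluster/health endpoint. The sketch below extracts just the status word from that response. health_status is a hypothetical helper, and its sed pattern is a simplification that assumes the usual output of that endpoint; it is not a general JSON parser.

```shell
#!/bin/sh
# Hypothetical helper: read a _cluster/health response on stdin and print
# the value of its "status" field (green, yellow, or red).
health_status() {
    sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p' | head -n 1
}

# Example call (requires a running node):
# curl -s 'http://localhost:9200/_cluster/health' | health_status
```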

Bug 2: 13:00:37,137 importer.jdbc error while processing request: no cluster nodes available, check settings {autodiscover=false, client.transport.ignore_cluster_name=false, client.transport.nodes_sampler_interval=5s, client.transport.ping_timeout=5s, cluster.name=elasticsearch,

org.elasticsearch.client.transport.NoNodeAvailableException: no cluster nodes available, check

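Fix: add the following line to the definition in the script above. The log shows the importer connecting with cluster.name=elasticsearch, so the name must be set explicitly to match the cluster name in /usr/local/elasticsearch/config/elasticsearch.yml.

"elasticsearch.cluster":"my-application",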
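To compare the two names before running the importer, the configured value can be read straight from elasticsearch.yml. This is a minimal sketch. yml_cluster_name is a hypothetical helper, and it assumes the setting is written on a single uncommented line in the form cluster.name: value, without quotes.

```shell
#!/bin/sh
# Hypothetical helper: print the cluster.name value from an elasticsearch.yml file.
# Commented-out lines (starting with #) are ignored because the pattern is
# anchored to the start of the line.
yml_cluster_name() {
    sed -n 's/^[[:space:]]*cluster\.name[[:space:]]*:[[:space:]]*//p' "$1" | head -n 1
}

# Example call with the path used in this article:
# yml_cluster_name /usr/local/elasticsearch/config/elasticsearch.yml
```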

Reference:

http://stackoverflow.com/questions/11944915/getting-an-elasticsearch-cluster-to-green-cluster-setup-on-os-x

Author: 銘毅天下

Please credit the source when reposting. Original article:

http://blog.csdn.net/laoyang360/article/details/51694519
