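Solr configuration and MongoDB installation
Installing and configuring Solr is now very simple; see the official documentation: http://lucene.apache.org/solr/quickstart.html. The official quickstart uses the cloud example (selected with -e); I ended up using techproducts instead. The basic commands are as follows:
Note: if unzip is not installed, install it first: apt-get install unzip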
root@xxx:xxx# ls solr-*
solr-5.3.1.zip solr-5.3.1.zip
root@xxx:xxx# unzip -q solr-5.3.1.zip
root@xxx:xxx# cd solr-5.3.1/
root@xxx:/home/software/solr-5.3.1# bin/solr start -e techproducts -noprompt
Your current version of Java is too old to run this version of Solr
We found version 1.7.0_79, using command '/usr/java/jdk1.7.0_79/bin/java'
Please install latest version of Java 8 or set JAVA_HOME properly.
Debug information:
JAVA_HOME: /usr/java/jdk1.7.0_79
Active Path:
/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/usr/java/jdk1.7.0_79/bin:/usr/local/mongodb/bin
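If you see the message above, the installed JDK (1.7 here) is too old: install Java 8 (JDK 1.8) or later, as the message asks, or point JAVA_HOME at a newer JDK.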
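The version check behind that message can be sketched in a few lines of Python. This is only an illustration, not Solr's own check: the function names are hypothetical, and the minimum of 8 is taken from the error message shown above.

```python
def java_major_version(version: str) -> int:
    """Return the major Java version from a string such as '1.7.0_79' or '11.0.2'."""
    parts = version.split(".")
    first = int(parts[0])
    # Legacy scheme: '1.7.0_79' means Java 7. Modern scheme: '11.0.2' means Java 11.
    if first == 1 and len(parts) > 1:
        return int(parts[1])
    return first


def is_new_enough(version: str, minimum: int = 8) -> bool:
    """True if the given Java version meets the minimum (assumed to be 8 here)."""
    return java_major_version(version) >= minimum


if __name__ == "__main__":
    # The version reported in the transcript above.
    print(is_new_enough("1.7.0_79"))
```

Passing the string printed by java -version (for example 1.8.0_121) shows whether an upgrade is needed before starting Solr.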
Once it is installed, run:
root@xxx:/home/software/solr-5.3.1# bin/solr start -e techproducts -noprompt
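You can now open the Solr admin UI in a browser.
The techproducts entry shown there is a Solr core.
1. Installing, starting, and stopping Solr
According to the official documentation, if you want to shut Solr down when you are done and remove the data under this example, run: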
root@xxx:/home/software/solr-5.3.1# pwd
/home/software/solr-5.3.1
root@xxx:/home/software/solr-5.3.1# bin/solr stop -all
sending stop command to solr running on port 8983 ... waiting 5 seconds to allow jetty process 1816 to stop gracefully.
root@xxx:/home/software/solr-5.3.1# rm -rf example/techproducts/
Note: if, after stopping everything, you start Solr again with bin/solr start -all -noprompt, then open http://localhost:8983/solr/ and add a Solr core, Solr looks for the core under /home/software/solr-5.3.1/server/solr. If it is not there, copy /home/software/solr-5.3.1/example/techproducts/solr/techproducts to /home/software/solr-5.3.1/server/solr and rename techproducts to docdetection.
Then change the contents of core.properties in docdetection to:
#written by corepropertieslocator
#fri mar 31 18:36:50 utc 2017
name=docdetection
config=solrconfig.xml
schema=schema.xml
dataDir=data
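To create more cores, add more directories alongside docdetection, each with its own core.properties. For example, the core.properties of a second core could contain: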
name=test
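If you create several cores, generating core.properties from a small helper keeps them consistent. The following is a minimal sketch; the function name is hypothetical, and it assumes the same four keys as the docdetection example above (dataDir is the spelling Solr uses for the data directory key).

```python
def render_core_properties(name: str,
                           config: str = "solrconfig.xml",
                           schema: str = "schema.xml",
                           data_dir: str = "data") -> str:
    """Build the text of a core.properties file for one Solr core."""
    lines = [
        f"name={name}",
        f"config={config}",
        f"schema={schema}",
        f"dataDir={data_dir}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the file content for a core called "test".
    print(render_core_properties("test"), end="")
```

The returned text can be written to core.properties inside the new core directory; the directory still needs its own conf folder copied from an existing core.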
Go to /home/software/solr-5.3.1/server/solr-webapp/webapp/WEB-INF and set the following entry in web.xml:
<env-entry>
<env-entry-name>solr/home</env-entry-name>
<env-entry-value>/home/software/solr-5.3.1/server/solr</env-entry-value>
<env-entry-type>java.lang.String</env-entry-type>
</env-entry>
Then refresh http://localhost:8983/solr to see the result.
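2. Integrating Solr with MongoDB
Judging from the official Solr quickstart, Solr can index XML, JSON, CSV, and several other document types, but nothing there suggests it can be integrated with MongoDB. Still, resourceful people always find a way to bring things together; perhaps there really is an almighty god.
Reference: http://www.cnblogs.com/sysuys/p/3403670.html
Integrating Solr with MongoDB requires mongo-connector; see https://github.com/10gen-labs/mongo-connector/wiki/getting-started
1) Create a MongoDB replica set
Install python-pip and git: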
root@xxx:~# apt-get install python-pip
root@xxx:~# apt-get install git
reading package lists... done
building dependency tree
reading state information... done
the following extra packages will be installed:
root@izm5effj2tm01xy2qqmnlzz:~#
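Start mongod with a replica set name:
mongod --replSet docdetection
In this experiment I ran it in the background; see http://blog.csdn.net/tototuzuoquan/article/details/55805811
Stopping and starting MongoDB is simple. Started as above, it runs in the foreground, and Ctrl + C stops it. Started with a trailing &, it runs in the background, so you have to stop it with pkill or kill.
Then initialize the replica set from the mongo shell: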
root@xxx:/etc# mongo
mongodb shell version: 3.2.11
connecting to: test
server has startup warnings:
> rs.initiate();
{
"info2" : "no configuration specified. using a default configuration for the set",
"me" : "izm5effj2tm01xy2qqmnlzz:27017",
"ok" : 1
}
docdetection:other>
That is all MongoDB needs: the only change is adding a replica set.
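Calling rs.initiate() with no arguments lets MongoDB pick a default configuration, as the info2 message above says. If you prefer to spell the configuration out, the document has the shape sketched below. The helper name is hypothetical, and the host list is an assumption for a single-node setup.

```python
from typing import Dict, List


def replica_set_config(set_name: str, hosts: List[str]) -> Dict:
    """Build a replica set configuration document.

    The "_id" must match the name passed to mongod via --replSet.
    Each member gets a numeric "_id" and a "host" in host:port form.
    """
    members = [{"_id": index, "host": host} for index, host in enumerate(hosts)]
    return {"_id": set_name, "members": members}


if __name__ == "__main__":
    # Single-node replica set matching the setup in this post (host is an assumption).
    print(replica_set_config("docdetection", ["localhost:27017"]))
```

A document of this shape can be passed to rs.initiate() in the mongo shell instead of calling it with no arguments.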
2) Install mongo-connector
2.1) Installing mongo-connector with pip (recommended)
mongo-connector can also be set up to run as a background service. The steps are below.
Review config.json and edit it as needed:
root@xxx:/home/software/mongo-connector-master#pwd
/home/software/mongo-connector-master
root@xxx:/home/software/mongo-connector-master# pip install mongo_connector[solr]
To find where mongo-connector was installed:
root@xxx:/etc/init.d# find / -name mongo-connector
/home/software/mongo-connector-master/scripts/mongo-connector
/usr/local/bin/mongo-connector
root@xxx:/etc/init.d#
Next, install the elastic2-doc-manager doc manager:
root@xxx:/home/software/mongo-connector-master# pip install elastic2-doc-manager
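Note: if pip is reported missing, install python-pip with apt-get. Do not run mongo-connector yet, though: it reads Solr's configuration, so get the Solr side ready first. After that, running it is a single command.
Note: most guides mention installing with pip but say nothing about uninstalling. Check the pip help: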
root@xxx:/home/software/mongo-connector-master# pip --help
usage:
pip <command> [options]
It can be uninstalled like this:
root@xxx:/home/software/mongo-connector-master# pip uninstall mongo-connector
2.2) A second way to install mongo-connector:
cd mongo-connector
# Before installing, edit mongo_connector/constants.py and set DEFAULT_COMMIT_INTERVAL = 0
python setup.py install
root@xxx:/home/software# unzip mongo-connector-master.zip
root@xxx:/home/software# cd mongo-connector-master/
root@xxx:/home/software/mongo-connector-master# chmod +x setup.py
root@xxx:/home/software/mongo-connector-master# python setup.py install
root@xxx:/home/software/mongo-connector-master# python setup.py install_service
running install_service
creating /var/log/mongo-connector
copying ./config.json -> /etc/mongo-connector.json
copying ./scripts/mongo-connector -> /etc/init.d
root@xxx:/home/software/mongo-connector-master# chmod +x /etc/init.d/mongo-connector
Run the following command to make sure the system startup configuration is updated:
root@xxx:/home/software/mongo-connector-master# update-rc.d mongo-connector defaults
update-rc.d: warning: default start runlevel arguments (2 3 4 5) do not match mongo-connector default-start values (3 4 5)
adding system startup for /etc/init.d/mongo-connector ...
/etc/rc0.d/k20mongo-connector -> ../init.d/mongo-connector
/etc/rc1.d/k20mongo-connector -> ../init.d/mongo-connector
/etc/rc6.d/k20mongo-connector -> ../init.d/mongo-connector
/etc/rc2.d/s20mongo-connector -> ../init.d/mongo-connector
/etc/rc3.d/s20mongo-connector -> ../init.d/mongo-connector
/etc/rc4.d/s20mongo-connector -> ../init.d/mongo-connector
/etc/rc5.d/s20mongo-connector -> ../init.d/mongo-connector
root@izm5effj2tm01xy2qqmnlzz:/home/software/mongo-connector-master#
To remove the background service, run:
python setup.py uninstall_service
This removes /etc/init.d/mongo-connector and /etc/mongo-connector.json.
3) Configuration on the Solr side
Find schema.xml and edit it:
root@xxx:/home/software/solr-5.3.1# find ./ -name "schema.xml"
./example/example-dih/solr/rss/conf/schema.xml
./example/example-dih/solr/tika/conf/schema.xml
./example/example-dih/solr/solr/conf/schema.xml
./example/example-dih/solr/mail/conf/schema.xml
./example/example-dih/solr/db/conf/schema.xml
./example/techproducts/solr/techproducts/conf/schema.xml
./server/solr/configsets/sample_techproducts_configs/conf/schema.xml
./server/solr/configsets/basic_configs/conf/schema.xml
root@izm5effj2tm01xy2qqmnlzz:/home/software/solr-5.3.1#
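Open:
vi ./server/solr/configsets/sample_techproducts_configs/conf/schema.xml
Find the following line (in vi: Esc, then :/<uniqueKey>):
<uniqueKey>id</uniqueKey>
Change it to the id with a leading underscore:
<uniqueKey>_id</uniqueKey>
Then add these fields (in vi, Esc then Shift + G jumps to the last line):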
<field name="_id" type="string" indexed="true" stored="true" />
<field name="_ts" type="long" indexed="true" stored="true" />
<field name="ns" type="string" indexed="true" stored="true"/>
Comment out the original id field (in vi: Esc, then :/name="id"):
<!--
<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
-->
Otherwise every JSON or XML document added to Solr would have to contain the id field, because it is declared with required="true".
Those are all the changes to schema.xml.
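Before restarting Solr, you can check the edited schema. The sketch below parses a schema.xml string with Python's standard library and reports whether uniqueKey is _id and whether the three fields _id, _ts and ns are declared. The function name and the message strings are hypothetical; this is a convenience check, not part of Solr or mongo-connector.

```python
import xml.etree.ElementTree as ET

# Fields that this post adds to schema.xml for mongo-connector.
REQUIRED_FIELDS = ("_id", "_ts", "ns")


def check_schema(schema_xml: str) -> list:
    """Return a list of problems found in the schema text (empty if none)."""
    root = ET.fromstring(schema_xml)
    problems = []
    unique_key = root.findtext("uniqueKey")
    if unique_key is None or unique_key.strip() != "_id":
        problems.append("uniqueKey is not _id")
    # XML comments are skipped by the parser, so commented-out fields do not count.
    declared = {field.get("name") for field in root.iter("field")}
    for name in REQUIRED_FIELDS:
        if name not in declared:
            problems.append(f"missing field {name}")
    return problems


if __name__ == "__main__":
    sample = (
        '<schema name="example" version="1.5">'
        '<field name="_id" type="string" indexed="true" stored="true"/>'
        '<uniqueKey>_id</uniqueKey>'
        '</schema>'
    )
    print(check_schema(sample))
```

To check the real file, read it into a string and pass it to check_schema.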
Edit solrconfig.xml
Open:
vi ./server/solr/configsets/sample_techproducts_configs/conf/solrconfig.xml
Find the line below. If the class is configured incorrectly, data will not sync between Solr and MongoDB; see the official wiki: https://github.com/mongodb-labs/mongo-connector/wiki/usage%20with%20solr#make-sure-the-lukerequesthandler-is-enabled
<requestHandler name="/admin/luke" class="org.apache.solr.handler.admin.LukeRequestHandler" />
Uncomment it, or add it if it is missing. mongo-connector depends on it: it requests the schema defined in schema.xml above, and this handler serves that request, so it is essential.
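A commented-out handler is easy to miss. The sketch below parses a solrconfig.xml string and reports whether an active /admin/luke request handler is present. The function name is hypothetical. It relies on the XML parser skipping comments, so a handler that is still commented out counts as missing.

```python
import xml.etree.ElementTree as ET


def has_luke_handler(solrconfig_xml: str) -> bool:
    """True if an uncommented requestHandler named /admin/luke exists."""
    root = ET.fromstring(solrconfig_xml)
    for handler in root.iter("requestHandler"):
        if handler.get("name") == "/admin/luke":
            return True
    return False


if __name__ == "__main__":
    enabled = (
        '<config>'
        '<requestHandler name="/admin/luke" '
        'class="org.apache.solr.handler.admin.LukeRequestHandler" />'
        '</config>'
    )
    print(has_luke_handler(enabled))
```

Running this against the edited file before starting mongo-connector is a quick way to rule out one cause of a silent sync failure.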
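Finally:
Stop Solr as described earlier, delete the example/techproducts directory, and start Solr again. Restarting the techproducts example produces some errors because schema.xml now uses _id rather than id as the uniqueKey. These errors can be ignored; in fact, seeing no errors would mean the change did not take effect. Afterwards you will find that the two edited configuration files were copied into the example/techproducts example, as described above.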
root@xxx:/home/software/solr-5.3.1# cd /home/software/solr-5.3.1
root@xxx:/home/software/solr-5.3.1#
4) Connect Solr and MongoDB with mongo-connector
With everything set up, run the command for the core you want to sync (a mongo-connector reference: http://blog.csdn.net/hyman_yx/article/details/51684218):
mongo-connector -m localhost:27017 --auto-commit-interval=0 -t http://localhost:8983/solr/techproducts -d solr_doc_manager &
mongo-connector -m localhost:27017 --auto-commit-interval=0 -t http://localhost:8983/solr/docdetection -d solr_doc_manager &
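The two commands differ only in the core name at the end of the Solr URL. A small helper that assembles the argument list makes that explicit. This is a sketch: the function name and default values are hypothetical, and only the flags that appear in the commands above are used.

```python
from typing import List


def connector_command(core: str,
                      mongo: str = "localhost:27017",
                      solr_base: str = "http://localhost:8983/solr") -> List[str]:
    """Build the mongo-connector argument list for one Solr core."""
    return [
        "mongo-connector",
        "-m", mongo,                    # MongoDB address
        "--auto-commit-interval=0",     # same value as in the commands above
        "-t", f"{solr_base}/{core}",    # target Solr core URL
        "-d", "solr_doc_manager",       # doc manager for Solr
    ]


if __name__ == "__main__":
    # Print the command line for the docdetection core.
    print(" ".join(connector_command("docdetection")))
```

The list can be handed to subprocess.Popen to start the connector from a script instead of typing the command with a trailing &.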
Note:
If rebuilding the index does not seem to work, run the following commands (the existing index must also be deleted and rebuilt):
root@izm5effj2tm01xy2qqmnlzz:/home/software/solr-5.3.1# rm -rf mongo-connector.log
root@izm5effj2tm01xy2qqmnlzz:/home/software/solr-5.3.1# rm -rf oplog.timestamp
root@izm5effj2tm01xy2qqmnlzz:/home/software/solr-5.3.1/server/solr/docdetection/data# pwd
/home/software/solr-5.3.1/server/solr/docdetection/data
root@izm5effj2tm01xy2qqmnlzz:/home/software/solr-5.3.1/server/solr/docdetection/data# rm -rf *
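The same cleanup can be scripted so that nothing outside the two intended directories is touched. The sketch below is one possible version, with a hypothetical function name: it deletes oplog.timestamp and mongo-connector.log from the directory where mongo-connector was started, then empties the core's data directory. Both paths are passed in explicitly instead of relying on the current directory.

```python
import shutil
from pathlib import Path


def reset_sync_state(work_dir: Path, data_dir: Path) -> list:
    """Remove mongo-connector state files and empty the Solr data directory.

    Returns the list of paths that were removed.
    """
    removed = []
    for name in ("oplog.timestamp", "mongo-connector.log"):
        target = work_dir / name
        if target.exists():
            target.unlink()
            removed.append(target)
    if data_dir.is_dir():
        for child in data_dir.iterdir():
            if child.is_dir() and not child.is_symlink():
                shutil.rmtree(child)   # index subdirectories
            else:
                child.unlink()         # plain files and symlinks
            removed.append(child)
    return removed
```

Stop Solr and mongo-connector before calling it, for example with reset_sync_state(Path("/home/software/solr-5.3.1"), Path("/home/software/solr-5.3.1/server/solr/docdetection/data")).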
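After these steps, the documents imported by mongo-connector are finally visible in Solr and match the content in MongoDB. At this point the configuration has succeeded.
If Solr sometimes does not pick up new data automatically, it is because the default configuration does not make changes visible on its own: openSearcher is false for hard commits and soft auto-commit is disabled. Adjust these switches in solrconfig.xml as follows.
Go to the Solr directory (mine is /home/software/solr-5.3.1):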
cd /home/software/solr-5.3.1
Find solrconfig.xml:
find ./ -name solrconfig.xml
The result:
./example/files/conf/solrconfig.xml
./example/example-dih/solr/rss/conf/solrconfig.xml
./example/example-dih/solr/tika/conf/solrconfig.xml
./example/example-dih/solr/solr/conf/solrconfig.xml
./example/example-dih/solr/mail/conf/solrconfig.xml
./example/example-dih/solr/db/conf/solrconfig.xml
./example/techproducts/solr/techproducts/conf/solrconfig.xml
./server/solr/configsets/sample_techproducts_configs/conf/solrconfig.xml
./server/solr/configsets/basic_configs/conf/solrconfig.xml
./server/solr/configsets/data_driven_schema_configs/conf/solrconfig.xml
In the relevant files from the list above (they are named in the numbered changes below), edit the following section:
<autoCommit>
<maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
<openSearcher>false</openSearcher>
</autoCommit>
<!-- softAutoCommit is like autoCommit except it causes a
'soft' commit which only ensures that changes are visible
but does not ensure that data is synced to disk. This is
faster and more near-realtime friendly than a hard commit.
-->
<autoSoftCommit>
<maxTime>${solr.autoSoftCommit.maxTime:-1}</maxTime>
</autoSoftCommit>
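The ${name:default} notation in these values means: use the system property called name if it is set, otherwise fall back to the default after the colon. So soft auto-commit resolves to -1, which disables it, unless the property is set when Solr starts. The sketch below imitates that substitution rule in Python to make the behaviour concrete. It is an illustration with a hypothetical function name, not Solr's implementation.

```python
import re

# Matches ${name} or ${name:default}.
_PLACEHOLDER = re.compile(r"\$\{([^:}]+)(?::([^}]*))?\}")


def resolve(value: str, props: dict) -> str:
    """Replace ${name:default} placeholders using props, else the default."""
    def substitute(match):
        name, default = match.group(1), match.group(2)
        if name in props:
            return str(props[name])
        if default is not None:
            return default
        raise KeyError(name)  # no property and no default given
    return _PLACEHOLDER.sub(substitute, value)


if __name__ == "__main__":
    # No property set: the default after the colon is used.
    print(resolve("${solr.autoSoftCommit.maxTime:-1}", {}))
    # Property set: it overrides the default.
    print(resolve("${solr.autoCommit.maxTime:15000}", {"solr.autoCommit.maxTime": 60000}))
```

Replacing the placeholder with a literal number in solrconfig.xml, as the changes in this post do, removes the property lookup entirely and fixes the value.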
Change 1: vim example/techproducts/solr/techproducts/conf/solrconfig.xml
Original values:
<maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
<openSearcher>false</openSearcher>
<maxTime>${solr.autoSoftCommit.maxTime:-1}</maxTime>
Change them to:
<autoCommit>
<maxTime>300000</maxTime>
<maxDocs>10000</maxDocs>
<openSearcher>true</openSearcher>
</autoCommit>
<autoSoftCommit>
<maxDocs>1000</maxDocs>
<maxTime>60000</maxTime>
</autoSoftCommit>
Changes 2 to 4: make the same edit in each of these files:
vim server/solr/configsets/sample_techproducts_configs/conf/solrconfig.xml
vim server/solr/configsets/basic_configs/conf/solrconfig.xml
vim server/solr/configsets/data_driven_schema_configs/conf/solrconfig.xml
Note that to index additional MongoDB keys in Solr, you need to edit schema.xml in the Solr core. Here is an example:
<?xml version="1.0" encoding="UTF-8" ?>
<schema name="example" version="1.5">
<types>
<fieldType name="string" class="solr.StrField" sortMissingLast="true" />
<fieldType name="boolean" class="solr.BoolField" sortMissingLast="true"/>
<fieldType name="int" class="solr.TrieIntField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="float" class="solr.TrieFloatField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="long" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="double" class="solr.TrieDoubleField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="date" class="solr.TrieDateField" precisionStep="0" positionIncrementGap="0"/>
<!--
<fieldType name="text_ik" class="solr.TextField">
<analyzer class="org.wltea.analyzer.lucene.IKAnalyzer"/>
</fieldType>
-->
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
<!-- in this example, we will only use synonyms at query time
<filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
-->
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
</types>
<fields>
<field name="_version_" type="long" indexed="true" stored="true"/>
<field name="_id" type="string" indexed="true" stored="true" />
<field name="_ts" type="long" indexed="true" stored="true" />
<field name="ns" type="string" indexed="true" stored="true"/>
<field name="doclibrayid" type="string" indexed="true" stored="true"/>
<field name="originaldocpath" type="string" indexed="true" stored="true"/>
<field name="htmldocpath" type="string" indexed="true" stored="true" />
<field name="originalfilename" type="string" indexed="true" stored="true"/>
<field name="majorid" type="string" indexed="true" stored="true"/>
<field name="majorname" type="string" indexed="true" stored="true"/>
<field name="propertyid" type="string" indexed="true" stored="true"/>
<field name="propertyname" type="string" indexed="true" stored="true"/>
<field name="wordnum" type="int" indexed="true" stored="true"/>
<field name="paragnum" type="int" indexed="true" stored="true"/>
<field name="sentencenum" type="int" indexed="true" stored="true"/>
<field name="content" type="text_general" indexed="false" stored="true" multiValued="true"/>
</fields>
<uniqueKey>_id</uniqueKey>
<defaultSearchField>majorname</defaultSearchField>
<solrQueryParser defaultOperator="OR"/>
</schema>
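To see which keys of a MongoDB document still need a field in schema.xml, you can compare the document's keys with the declared field names. The sketch below does this for top-level keys only. The function name and the sample key author are hypothetical, and it does not consider dynamicField patterns or nested documents, which is an assumption made to keep it short.

```python
import xml.etree.ElementTree as ET


def undeclared_keys(document: dict, schema_xml: str) -> list:
    """Return the document's top-level keys that have no <field> in the schema."""
    root = ET.fromstring(schema_xml)
    declared = {field.get("name") for field in root.iter("field")}
    return sorted(key for key in document if key not in declared)


if __name__ == "__main__":
    schema = (
        '<schema name="example" version="1.5"><fields>'
        '<field name="_id" type="string" indexed="true" stored="true"/>'
        '<field name="majorname" type="string" indexed="true" stored="true"/>'
        '</fields></schema>'
    )
    # "author" is a made-up key that the schema above does not declare.
    doc = {"_id": "1", "majorname": "physics", "author": "someone"}
    print(undeclared_keys(doc, schema))
```

Each key it reports needs its own field line in schema.xml, in the same style as the entries above, before Solr will index it.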
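Go back to the directory where the mongo-connector command was run.
Find oplog.timestamp and delete it. You can delete mongo-connector.log as well.
Go to the index directory:
cd /home/software/solr-5.3.1/server/solr/docdetection/data
Delete all generated index data with rm -rf * (make sure you are in /home/software/solr-5.3.1/server/solr/docdetection/data first).
Then restart Solr with the commands given earlier in this post and run mongo-connector again.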