The solrconfig.xml file contains many parameters related to Solr's own configuration. A sample solrconfig.xml can be found in the directory where you unpacked the Solr distribution.

Open solrconfig.xml with a text editor and you will see the following content:
<?xml version="1.0" encoding="utf-8" ?>
<!--
licensed to the apache software foundation (asf) under one or more
contributor license agreements. see the notice file distributed with
this work for additional information regarding copyright ownership.
the asf licenses this file to you under the apache license, version 2.0
(the "license"); you may not use this file except in compliance with
the license. you may obtain a copy of the license at
http://www.apache.org/licenses/license-2.0
unless required by applicable law or agreed to in writing, software
distributed under the license is distributed on an "as is" basis,
without warranties or conditions of any kind, either express or implied.
see the license for the specific language governing permissions and
limitations under the license.
-->
<!--
for more details about configurations options that may appear in
this file, see http://wiki.apache.org/solr/solrconfigxml. -->
<config>
<!-- in all configuration below, a prefix of "solr." for class names
is an alias that causes solr to search appropriate packages,
including org.apache.solr.(search|update|request|core|analysis)
you may also specify a fully qualified java classname if you
have your own custom plugins.
-->
<!-- controls what version of lucene various components of solr
adhere to. generally, you want to use the latest version to
get all bug fixes and improvements. it is highly recommended
that you fully re-index after changing this setting as it can
affect both how text is indexed and queried.
-->
<lucenematchversion>5.1.0</lucenematchversion>
<!-- data directory
used to specify an alternate directory to hold all index data
other than the default ./data under the solr home. if
replication is in use, this should match the replication
configuration.
-->
<datadir>${solr.data.dir:}</datadir>
<!-- <datadir>c:\solr_home\core1\data</datadir> -->
<!-- the directoryfactory to use for indexes.
solr.standarddirectoryfactory is filesystem
based and tries to pick the best implementation for the current
jvm and platform. solr.nrtcachingdirectoryfactory, the default,
wraps solr.standarddirectoryfactory and caches small files in memory
for better nrt performance.
one can force a particular implementation via solr.mmapdirectoryfactory,
solr.niofsdirectoryfactory, or solr.simplefsdirectoryfactory.
solr.ramdirectoryfactory is memory based, not
persistent, and doesn't work with replication. -->
<directoryfactory name="directoryfactory"
class="${solr.directoryfactory:solr.nrtcachingdirectoryfactory}">
</directoryfactory>
<!-- the codecfactory for defining the format of the inverted index.
the default implementation is schemacodecfactory, which is the official lucene
index format, but hooks into the schema to provide per-field customization of
the postings lists and per-document values in the fieldtype element
(postingsformat/docvaluesformat). note that most of the alternative implementations
are experimental, so if you choose to customize the index format, it's a good
idea to convert back to the official format e.g. via indexwriter.addindexes(indexreader)
before upgrading to a newer version to avoid unnecessary reindexing. -->
<codecfactory class="solr.schemacodecfactory"/>
<schemafactory class="classicindexschemafactory"/>
<!-- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
index config - these settings control low-level behavior of indexing
most example settings here show the default value, but are commented
out, to more easily see where customizations have been made.
note: this replaces <indexdefaults> and <mainindex> from older versions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -->
<indexconfig>
<!-- lockfactory
this option specifies which lucene lockfactory implementation
to use.
single = singleinstancelockfactory - suggested for a
read-only index or when there is no possibility of
another process trying to modify the index.
native = nativefslockfactory - uses os native file locking.
do not use when multiple solr webapps in the same
jvm are attempting to share a single index.
simple = simplefslockfactory - uses a plain file for locking
defaults: 'native' is default for solr3.6 and later, otherwise
'simple' is the default
more details on the nuances of each lockfactory...
http://wiki.apache.org/lucene-java/availablelockfactories -->
<locktype>${solr.lock.type:native}</locktype>
<!-- lucene infostream
to aid in advanced debugging, lucene provides an "infostream"
of detailed information when indexing.
setting the value to true will instruct the underlying lucene
indexwriter to write its info stream to solr's log. by default,
this is enabled here, and controlled through log4j.properties.
-->
<infostream>true</infostream>
</indexconfig>
<!-- jmx
this example enables jmx if and only if an existing mbeanserver
is found, use this if you want to configure jmx through jvm
parameters. remove this to disable exposing solr configuration
and statistics to jmx.
for more details see http://wiki.apache.org/solr/solrjmx -->
<jmx />
<!-- if you want to connect to a particular server, specify the
agentid -->
<!-- <jmx agentid="myagent" /> -->
<!-- if you want to start a new mbeanserver, specify the serviceurl -->
<!-- <jmx serviceurl="service:jmx:rmi:///jndi/rmi://localhost:9999/solr"/> -->
<!-- the default high-performance update handler -->
<updatehandler class="solr.directupdatehandler2">
<!-- enables a transaction log, used for real-time get, durability, and
and solr cloud replica recovery. the log can grow as big as
uncommitted changes to the index, so use of a hard autocommit
is recommended (see below).
"dir" - the target directory for transaction logs, defaults to the
solr data directory. -->
<updatelog>
<str name="dir">${solr.ulog.dir:}</str>
</updatelog>
<!-- autocommit
perform a hard commit automatically under certain conditions.
instead of enabling autocommit, consider using "commitwithin"
when adding documents.
http://wiki.apache.org/solr/updatexmlmessages
maxdocs - maximum number of documents to add since the last
commit before automatically triggering a new commit.
maxtime - maximum amount of time in ms that is allowed to pass
since a document was added before automatically
triggering a new commit.
opensearcher - if false, the commit causes recent index changes
to be flushed to stable storage, but does not cause a new
searcher to be opened to make those changes visible.
if the updatelog is enabled, then it's highly recommended to
have some sort of hard autocommit to limit the log size. -->
<autocommit>
<maxtime>${solr.autocommit.maxtime:15000}</maxtime>
<opensearcher>false</opensearcher>
</autocommit>
<!-- softautocommit is like autocommit except it causes a
'soft' commit which only ensures that changes are visible
but does not ensure that data is synced to disk. this is
faster and more near-realtime friendly than a hard commit. -->
<autosoftcommit>
<maxtime>${solr.autosoftcommit.maxtime:-1}</maxtime>
</autosoftcommit>
</updatehandler>
<!-- query section - these settings control query time things like caches -->
<query>
<!-- max boolean clauses
maximum number of clauses in each booleanquery, an exception
is thrown if exceeded.
** warning **
this option actually modifies a global lucene property that
will affect all solrcores. if multiple solrconfig.xml files
disagree on this property, the value at any given moment will
be based on the last solrcore to be initialized. -->
<maxbooleanclauses>1024</maxbooleanclauses>
<!-- solr internal query caches
there are two implementations of cache available for solr,
lrucache, based on a synchronized linkedhashmap, and
fastlrucache, based on a concurrenthashmap.
fastlrucache has faster gets and slower puts in single
threaded operation and thus is generally faster than lrucache
when the hit ratio of the cache is high (> 75%), and may be
faster under other scenarios on multi-cpu systems. -->
<!-- filter cache
cache used by solrindexsearcher for filters (docsets),
unordered sets of *all* documents that match a query. when a
new searcher is opened, its caches may be prepopulated or
"autowarmed" using data from caches in the old searcher.
autowarmcount is the number of items to prepopulate. for
lrucache, the autowarmed items will be the most recently
accessed items.
parameters:
class - the solrcache implementation
(lrucache or fastlrucache)
size - the maximum number of entries in the cache
initialsize - the initial capacity (number of entries) of
the cache. (see java.util.hashmap)
autowarmcount - the number of entries to prepopulate from
an old cache. -->
<filtercache class="solr.fastlrucache"
size="512"
initialsize="512"
autowarmcount="0"/>
<!-- query result cache
caches results of searches - ordered lists of document ids
(doclist) based on a query, a sort, and the range of documents requested. -->
<queryresultcache class="solr.lrucache"
size="512"
initialsize="512"
autowarmcount="0"/>
<!-- document cache
caches lucene document objects (the stored fields for each
document). since lucene internal document ids are transient,
this cache will not be autowarmed. -->
<documentcache class="solr.lrucache"
size="512"
initialsize="512"
autowarmcount="0"/>
<!-- custom cache currently used by block join -->
<cache name="persegfilter"
class="solr.search.lrucache"
size="10"
initialsize="0"
autowarmcount="10"
regenerator="solr.noopregenerator" />
<!-- lazy field loading
if true, stored fields that are not requested will be loaded
lazily. this can result in a significant speed improvement
if the usual case is to not load all stored fields,
especially if the skipped fields are large compressed text
fields.
<enablelazyfieldloading>true</enablelazyfieldloading>
<!-- result window size
an optimization for use with the queryresultcache. when a search
is requested, a superset of the requested number of document ids
are collected. for example, if a search for a particular query
requests matching documents 10 through 19, and querywindowsize is 50,
then documents 0 through 49 will be collected and cached. any further
requests in that range can be satisfied via the cache.
-->
<queryresultwindowsize>20</queryresultwindowsize>
<!-- maximum number of documents to cache for any entry in the
queryresultcache. -->
<queryresultmaxdocscached>200</queryresultmaxdocscached>
<!-- use cold searcher
if a search request comes in and there is no current
registered searcher, then immediately register the still
warming searcher and use it. if "false" then all requests
will block until the first searcher is done warming. -->
<usecoldsearcher>false</usecoldsearcher>
<!-- max warming searchers
maximum number of searchers that may be warming in the
background concurrently. an error is returned if this limit
is exceeded.
recommend values of 1-2 for read-only slaves, higher for
masters w/o cache warming. -->
<maxwarmingsearchers>2</maxwarmingsearchers>
</query>
<!-- request dispatcher
this section contains instructions for how the solrdispatchfilter
should behave when processing requests for this solrcore.
handleselect is a legacy option that affects the behavior of requests
such as /select?qt=xxx
handleselect="true" will cause the solrdispatchfilter to process
the request and dispatch the query to a handler specified by the
"qt" param, assuming "/select" isn't already registered.
handleselect="false" will cause the solrdispatchfilter to
ignore "/select" requests, resulting in a 404 unless a handler
is explicitly registered with the name "/select"
handleselect="true" is not recommended for new users, but is the default
for backwards compatibility -->
<requestdispatcher handleselect="false" >
<!-- request parsing
these settings indicate how solr requests may be parsed, and
what restrictions may be placed on the contentstreams from
those requests
enableremotestreaming - enables use of the stream.file
and stream.url parameters for specifying remote streams.
multipartuploadlimitinkb - specifies the max size (in kib) of
multipart file uploads that solr will allow in a request.
formdatauploadlimitinkb - specifies the max size (in kib) of
form data (application/x-www-form-urlencoded) sent via
post. you can use post to pass request parameters not
fitting into the url.
addhttprequesttocontext - if set to true, it will instruct
the requestparsers to include the original httpservletrequest
object in the context map of the solrqueryrequest under the
key "httprequest". it will not be used by any of the existing
solr components, but may be useful when developing custom
plugins.
*** warning ***
the settings below authorize solr to fetch remote files, you
should make sure your system has some authentication before
using enableremotestreaming="true"
-->
<requestparsers enableremotestreaming="true"
multipartuploadlimitinkb="2048000"
formdatauploadlimitinkb="2048"
addhttprequesttocontext="false"/>
<!-- http caching
set http caching related parameters (for proxy caches and clients).
the options below instruct solr not to output any http caching
related headers -->
<httpcaching never304="true" />
</requestdispatcher>
<!-- request handlers
http://wiki.apache.org/solr/solrrequesthandler
incoming queries will be dispatched to a specific handler by name
based on the path specified in the request.
legacy behavior: if the request path uses "/select" but no request
handler has that name, and if handleselect="true" has been specified in
the requestdispatcher, then the request handler is dispatched based on
the qt parameter. handlers without a leading '/' are accessed this way
like so: http://host/app/[core/]select?qt=name if no qt is
given, then the requesthandler that declares default="true" will be
used or the one named "standard".
if a request handler is declared with startup="lazy", then it will
not be initialized until the first request that uses it. -->
<!-- searchhandler
http://wiki.apache.org/solr/searchhandler
for processing search queries, the primary request handler
provided with solr is "searchhandler". it delegates to a sequence
of searchcomponents (see below) and supports distributed
queries across multiple shards -->
<!--
<requesthandler name="/dataimport" class="solr.dataimporthandler">
<lst name="defaults">
<str name="config">solr-data-config.xml</str>
</lst>
</requesthandler>
-->
<requesthandler name="/select" class="solr.searchhandler">
<!-- default values for query parameters can be specified, these
will be overridden by parameters in the request -->
<lst name="defaults">
<str name="echoparams">explicit</str>
<int name="rows">10</int>
</lst>
</requesthandler>
<!-- a request handler that returns indented json by default -->
<requesthandler name="/query" class="solr.searchhandler">
<lst name="defaults">
<str name="wt">json</str>
<str name="indent">true</str>
<str name="df">text</str>
</lst>
</requesthandler>
<!-- the export request handler is used to export full sorted result sets.
do not change these defaults. -->
<requesthandler name="/export" class="solr.searchhandler">
<lst name="invariants">
<str name="rq">{!xport}</str>
<str name="wt">xsort</str>
<str name="distrib">false</str>
</lst>
<arr name="components">
<str>query</str>
</arr>
</requesthandler>
<initparams path="/update/**,/query,/select,/tvrh,/elevate,/spell">
<str name="df">text</str>
</initparams>
<!-- field analysis request handler
requesthandler that provides much the same functionality as
analysis.jsp. provides the ability to specify multiple field
types and field names in the same request and outputs
index-time and query-time analysis for each of them.
request parameters are:
analysis.fieldname - field name whose analyzers are to be used
analysis.fieldtype - field type whose analyzers are to be used
analysis.fieldvalue - text for index-time analysis
q (or analysis.q) - text for query time analysis
analysis.showmatch (true|false) - when set to true and when
query analysis is performed, the produced tokens of the
field value analysis will be marked as "matched" for every
token that is produces by the query analysis
-->
<requesthandler name="/analysis/field"
startup="lazy"
class="solr.fieldanalysisrequesthandler" />
<!-- document analysis handler
http://wiki.apache.org/solr/analysisrequesthandler
an analysis handler that provides a breakdown of the analysis
process of provided documents. this handler expects a (single)
content stream with the following format:
<docs>
<doc>
<field name="id">1</field>
<field name="name">the name</field>
<field name="text">the text value</field>
</doc>
<doc>...</doc>
...
</docs>
note: each document must contain a field which serves as the
unique key. this key is used in the returned response to associate
an analysis breakdown to the analyzed document.
like the fieldanalysisrequesthandler, this handler also supports
query analysis by sending either an "analysis.query" or "q"
request parameter that holds the query text to be analyzed. it
also supports the "analysis.showmatch" parameter which when set to
true, all field tokens that match the query tokens will be marked
as a "match". -->
<requesthandler name="/analysis/document"
class="solr.documentanalysisrequesthandler"
startup="lazy" />
<!-- echo the request contents back to the client -->
<requesthandler name="/debug/dump" class="solr.dumprequesthandler" >
<lst name="defaults">
<str name="echoparams">explicit</str>
<str name="echohandler">true</str>
</lst>
</requesthandler>
<!-- search components
search components are registered to solrcore and used by
instances of searchhandler (which can access them by name)
by default, the following components are available: -->
<searchcomponent name="query" class="solr.querycomponent" />
<searchcomponent name="facet" class="solr.facetcomponent" />
<searchcomponent name="mlt" class="solr.morelikethiscomponent" />
<searchcomponent name="highlight" class="solr.highlightcomponent" />
<searchcomponent name="stats" class="solr.statscomponent" />
<searchcomponent name="debug" class="solr.debugcomponent" />
<!-- terms component
http://wiki.apache.org/solr/termscomponent
a component to return terms and document frequency of those
terms -->
<searchcomponent name="terms" class="solr.termscomponent"/>
<!-- a request handler for demonstrating the terms component -->
<requesthandler name="/terms" class="solr.searchhandler" startup="lazy">
<lst name="defaults">
<bool name="terms">true</bool>
<bool name="distrib">false</bool>
</lst>
<arr name="components">
<str>terms</str>
</arr>
</requesthandler>
<!-- legacy config for the admin interface -->
<admin>
<defaultquery>*:*</defaultquery>
</admin>
</config>
Below I will explain the key parts of this file.
lib
The <lib> directive tells Solr how to load the jar files that Solr plugins depend on. The comments in solrconfig.xml include configuration examples, for instance:
<lib dir="./lib" regex="lucene-\w+\.jar"/>
Here dir is a jar directory path, resolved relative to the current core's root directory, and regex is a regular expression used to filter file names; jar files whose names match it will be loaded.
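As an illustration, several <lib> directives together might look like the sketch below; the paths here are hypothetical and depend on your installation layout:

```xml
<!-- hypothetical paths: adjust to your own installation layout -->
<lib dir="../../lib" />
<lib dir="${solr.install.dir:../..}/contrib/dataimporthandler/lib" regex=".*\.jar" />
<lib dir="${solr.install.dir:../..}/dist/" regex="solr-dataimporthandler-.*\.jar" />
```

A <lib> whose dir does not exist is simply skipped with a warning, so shared layouts can list optional directories safely.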
datadir parameter
用來指定一個solr的索引資料目錄,solr建立的索引會存放在data\index目錄下,預設datadir是相對于目前core目錄(如果solr_home下存在core的話),如果solr_home下不存在core的話,那datadir預設就是相對于solr_home啦,不過一般datadir都在core.properties下配置。
<datadir>/var/data/solr</datadir>
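Since datadir is usually set in core.properties instead, a minimal sketch of that file might look like this (the core name and path are illustrative):

```properties
# core.properties at the root of the core directory
name=core1
dataDir=data
```

Values in core.properties take effect per core, which is why per-core data directories are normally configured there rather than in solrconfig.xml.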
codecfactory sets the codec factory class for the lucene inverted index; the default implementation is the official schemacodecfactory class.
The <indexconfig> element of solrconfig.xml contains many comments explaining its settings, for example:
<!-- maxfieldlength was removed in 4.0. to get similar behavior, include a
limittokencountfilterfactory in your fieldtype definition. e.g.
<filter class="solr.limittokencountfilterfactory" maxtokencount="10000"/> -->
This tells us that the maxfieldlength option was removed in version 4.0; a similar effect can be achieved by configuring a filter. maxtokencount means that when a field is tokenized, at most the first 10000 tokens are kept and the rest of the field value is discarded. By contrast, a maxfieldlength of 1000 meant that only characters 0-1000 of the field value were tokenized and indexed.
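A minimal sketch of a field type using this filter in schema.xml (the type name and analyzer chain are illustrative; note that element and class names are case-sensitive in the real schema):

```xml
<fieldType name="text_limited" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- keep only the first 10000 tokens of each field value -->
    <filter class="solr.LimitTokenCountFilterFactory" maxTokenCount="10000"/>
  </analyzer>
</fieldType>
```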
<writelocktimeout>1000</writelocktimeout>
writelocktimeout is the maximum time, in milliseconds, that an indexwriter instance waits to acquire the write lock; if the lock has still not been acquired when the timeout expires, the index-writing operation throws an exception.
<maxindexingthreads>8</maxindexingthreads>
The maximum number of threads used to build the index; the default is 8.
<usecompoundfile>false</usecompoundfile>
Whether to enable the compound file format. With compound files, fewer index files are created and fewer file descriptors are consumed, at some cost in performance. In lucene it is enabled by default; in solr it has been disabled by default since version 3.6.
<rambuffersizemb>100</rambuffersizemb>
The size of the in-memory buffer used during indexing, in MB; the default maximum is 100 MB.
<maxbuffereddocs>1000</maxbuffereddocs>
The maximum number of documents buffered before they are written to disk; exceeding this limit triggers an index flush.
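Put together, the two buffering settings above might be tuned like this inside <indexconfig> (the values are illustrative; whichever limit is hit first triggers the flush):

```xml
<indexConfig>
  <!-- flush when 100 MB of RAM or 1000 buffered docs is reached, whichever comes first -->
  <ramBufferSizeMB>100</ramBufferSizeMB>
  <maxBufferedDocs>1000</maxBufferedDocs>
</indexConfig>
```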
<mergepolicy class="org.apache.lucene.index.tieredmergepolicy">
<int name="maxmergeatonce">10</int>
<int name="segmentspertier">10</int>
</mergepolicy>
Configures lucene's segment merge policy. It takes two parameters:
maxmergeatonce: the maximum number of segments merged in a single merge.
segmentspertier: the allowed number of segments per tier; from tier to tier, the segment size grows by a factor of maxmergeatonce, as the source shows:
// compute max allowed segs in the index
long levelSize = minSegmentBytes;
long bytesLeft = totIndexBytes;
double allowedSegCount = 0;
while (true) {
    final double segCountLevel = bytesLeft / (double) levelSize;
    if (segCountLevel < segsPerTier) {
        allowedSegCount += Math.ceil(segCountLevel);
        break;
    }
    allowedSegCount += segsPerTier;
    bytesLeft -= segsPerTier * levelSize;
    levelSize *= maxMergeAtOnce;
}
int allowedSegCountInt = (int) allowedSegCount;
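To see what this loop does, here is a small self-contained sketch (not the actual Lucene class, just the quoted budget loop extracted into a method) that computes the allowed segment count for a sample index size:

```java
public class SegmentBudget {
    /** Max segments allowed for an index of totalBytes, given a floor segment size. */
    public static int allowedSegments(long minSegmentBytes, long totalBytes,
                                      int segsPerTier, int maxMergeAtOnce) {
        long levelSize = minSegmentBytes;   // segment size at the current tier
        long bytesLeft = totalBytes;        // index bytes not yet assigned to a tier
        double allowed = 0;
        while (true) {
            double segCountLevel = bytesLeft / (double) levelSize;
            if (segCountLevel < segsPerTier) {
                allowed += Math.ceil(segCountLevel);
                break;
            }
            allowed += segsPerTier;                  // a full tier of segments
            bytesLeft -= (long) segsPerTier * levelSize;
            levelSize *= maxMergeAtOnce;             // next tier holds bigger segments
        }
        return (int) allowed;
    }

    public static void main(String[] args) {
        // 1 GiB index, 2 MiB floor segments, defaults segsPerTier=10, maxMergeAtOnce=10
        long mib = 1024L * 1024L;
        System.out.println(allowedSegments(2 * mib, 1024 * mib, 10, 10)); // prints 25
    }
}
```

With the defaults, a 1 GiB index of 2 MiB floor segments is allowed roughly 25 segments: two full tiers of 10 plus a partial top tier.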
<mergefactor>10</mergefactor>
To understand what the mergefactor does, first look at the explanation given in Lucene in Action:
IndexWriter's mergeFactor lets you control how many documents to store in memory
before writing them to the disk, as well as how often to merge multiple index
segments together. (Index segments are covered in Appendix B.) With the default
value of 10, Lucene stores 10 documents in memory before writing them to a single
segment on the disk. The mergeFactor value of 10 also means that once the
number of segments on the disk has reached the power of 10, Lucene merges
these segments into a single segment.
For instance, if you set mergeFactor to 10, a new segment is created on the disk
for every 10 documents added to the index. When the tenth segment of size 10 is
added, all 10 are merged into a single segment of size 100. When 10 such segments
of size 100 have been added, they're merged into a single segment containing
1,000 documents, and so on. Therefore, at any time, there are no more than 9
segments in the index, and the size of each merged segment is the power of 10.
There is a small exception to this rule that has to do with maxMergeDocs,
another IndexWriter instance variable: while merging segments, Lucene ensures
that no segment with more than maxMergeDocs documents is created. For instance,
suppose you set maxMergeDocs to 1,000. When you add the ten-thousandth document,
instead of merging multiple segments into a single segment of size 10,000,
Lucene creates the tenth segment of size 1,000 and keeps adding new segments
of size 1,000 for every 1,000 documents added.
In short: mergefactor controls both how many documents are buffered in memory before a segment is written to disk, and how many equal-sized segments may accumulate before they are merged into one larger segment; maxmergedocs caps how many documents a merged segment may contain, so once segments reach that size no merge produces anything larger. Several other parameters also influence segment merging:
mergefactor: merging starts once the number of roughly equal-sized segments reaches this value.
minmergesize: all segments smaller than this value are considered roughly equal in size and participate in merging together.
maxmergesize: once a segment grows larger than this value, it no longer participates in merging.
maxmergedocs: once a segment contains more documents than this value, it no longer participates in merging.
Segment merging happens in two steps:
1. First, the segments to merge are selected; this is decided by the mergepolicy class.
2. Then the actual merge is performed; this is handled by the mergescheduler, which mainly does two things:
a. merges stored fields, term vectors, and norms (normalization factors);
b. merges the inverted-index data.
But I digress; back to the solrconfig.xml parameters that affect index building:
<mergescheduler class="org.apache.lucene.index.concurrentmergescheduler"/>
The mergescheduler, mentioned above, is the class that carries out segment merges. The default implementation is lucene's own concurrentmergescheduler.
<locktype>${solr.lock.type:native}</locktype>
This specifies the lucene lockfactory implementation. The options are:
single: singleinstancelockfactory; a read-only lock, for when no other thread or process will modify the index.
native: lucene's nativefslockfactory, which uses os-native file locking.
simple: lucene's simplefslockfactory, which locks by creating a write.lock file on disk.
The default is native since solr 3.6, and simple in earlier versions; in other words, which lock implementation you get by default depends on the solr version you are running.
<unlockonstartup>false</unlockonstartup>
If set to true, the locks held by indexwriters and commit operations are released when solr starts. This circumvents lucene's locking mechanism, so use it with caution. If your locktype is set to single, this setting has no effect either way.
<deletionpolicy class="solr.solrdeletionpolicy">
Configures the index deletion policy; the default is solr's solrdeletionpolicy implementation. To customize the deletion policy, implement lucene's org.apache.lucene.index.indexdeletionpolicy interface.
<jmx />
This enables jmx in solr. For details, see the official solr wiki:
http://wiki.apache.org/solr/solrjmx
<updatehandler class="solr.directupdatehandler2">
Specifies the class that handles index update operations. directupdatehandler2 is a high-performance update handler that supports soft commits.
<updatelog>
<str name="dir">${solr.ulog.dir:}</str>
</updatelog>
<updatelog> specifies where the above updatehandler stores its transaction log; the default is solr's data directory, i.e. the directory set by datadir.
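The pairing the config comments recommend (a transaction log bounded by a hard autocommit) can be sketched like this, using the default values shown earlier in the listing:

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <updateLog>
    <str name="dir">${solr.ulog.dir:}</str>
  </updateLog>
  <!-- hard commit every 15 s; flush to disk without opening a new searcher -->
  <autoCommit>
    <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
</updateHandler>
```

Because openSearcher is false, the hard commit only bounds the transaction log; visibility of new documents is left to soft commits or explicit commits.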
The <query> element contains the configuration related to index querying.
<maxbooleanclauses>1024</maxbooleanclauses>
The maximum number of sub-queries a booleanquery may chain together. If this value differs across the solrconfig.xml files of different cores, the value from the last core to be initialized wins.
<filtercache class="solr.fastlrucache"
size="512"
initialsize="512"
autowarmcount="0"/>
Configures the cache used for filters.
<queryresultcache class="solr.lrucache"
size="512"
initialsize="512"
autowarmcount="0"/>
Configures the cache for the query result sets (topdocs) returned by a query.
<documentcache class="solr.lrucache"
size="512"
initialsize="512"
autowarmcount="0"/>
Configures the cache for documents' stored fields. Loading stored-field values from disk is an expensive operation, so they are cached here; "stored fields" means fields stored with Store.YES.
<fieldvaluecache class="solr.fastlrucache"
size="512"
autowarmcount="128"
showitems="32" />
This cache is keyed by document id and speeds up access to per-document values. It is enabled by default and does not need to be configured explicitly.
<cache name="myusercache"
class="solr.lrucache"
size="4096"
initialsize="1024"
autowarmcount="1024"
regenerator="com.mycompany.myregenerator"
/>
This declares a custom cache of your own; your regenerator must implement solr's cacheregenerator interface.
<enablelazyfieldloading>true</enablelazyfieldloading>
Enables lazy loading of stored fields that the query did not explicitly ask to be returned.
<usefilterforsortedquery>true</usefilterforsortedquery>
Controls whether a filter is used in place of the query when the query does not sort by score.
<listener event="newsearcher" class="solr.querysenderlistener">
<arr name="queries">
<!--
<lst><str name="q">solr</str><str name="sort">price asc</str></lst>
<lst><str name="q">rocks</str><str name="sort">weight asc</str></lst>
-->
</arr>
</listener>
querysenderlistener listens for searcher events and sends warming queries; as in the commented example above, each warming query can specify a query string and a sort rule.
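With the comment markers removed, a listener that fires two warming queries looks like this; the query strings and sort fields are only examples:

```xml
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst><str name="q">solr</str><str name="sort">price asc</str></lst>
    <lst><str name="q">rocks</str><str name="sort">weight asc</str></lst>
  </arr>
</listener>
```

These queries run against the new searcher before it is registered, so its caches are already warm when real traffic arrives.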
<requestdispatcher handleselect="false" >
handleselect is kept for compatibility with older versions; the /select?qt=... request style is no longer recommended.
<httpcaching never304="true" />
Tells the solr server never to return 304. What does HTTP status 304 mean? It is the server telling the client that the requested resource has not been modified since it was cached, so the cached copy may be used. never304 tells the server to always return the full, fresh response, bypassing HTTP caching, whether or not the resource has changed. This is part of the HTTP protocol; consult an HTTP reference for details.
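If you do want proxies and clients to cache responses, the stock solrconfig.xml offers an alternative form of this element; a sketch:

```xml
<!-- derive validators from the searcher's open time and allow 30 s of caching -->
<httpCaching lastModifiedFrom="openTime" etagSeed="Solr">
  <cacheControl>max-age=30, public</cacheControl>
</httpCaching>
```

Here last-modified and etag headers are generated from the index state, so a cached response is automatically invalidated when a new searcher opens.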
<requesthandler name="/query" class="solr.searchhandler">
<str name="echoparams">explicit</str>
<str name="wt">json</str>
<str name="indent">true</str>
I will skip the descriptions of the other requesthandler entries; they are all much the same: a mapping from a request URL to a handler class, analogous to the mapping from URL to controller class in Spring MVC.
<searchcomponent name="spellcheck" class="solr.spellcheckcomponent">
Configures query components, such as spellcheckcomponent for spell checking; detailed spellcheck configuration will be covered in a later article.
<searchcomponent name="terms" class="solr.termscomponent"/>
Returns terms and the document frequency of each term.
<searchcomponent class="solr.highlightcomponent" name="highlight">
Configures keyword highlighting. I will skip the detailed highlighting options for now; this article only aims to give a rough idea of what each setting means, and we will study the details later.
I will not go through the other searchcomponent settings one by one; there are too many. Read the inline English comments for the rest.
<queryresponsewriter name="json" class="solr.jsonresponsewriter">
<!-- for the purposes of the tutorial, json responses are written as
plain text so that they are easy to read in *any* browser.
if you expect a mime type of "application/json" just remove this override. -->
<str name="content-type">text/plain; charset=utf-8</str>
</queryresponsewriter>
This configures a converter for solr's response data: jsonresponsewriter renders the HTTP response as JSON. content-type sets the content-type header of the response, telling the client that the returned MIME type is text/plain with charset utf-8.
Built-in response writers also include velocity, xslt, and others. To write your own converter, say one based on freemarker, implement solr's queryresponsewriter interface, model your class on the existing implementations, and register it in solrconfig.xml with a similar <queryresponsewriter> element.
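Registering such a custom writer might look like this; both the name and the class are hypothetical (com.example.FreemarkerResponseWriter is a made-up class for illustration):

```xml
<!-- hypothetical custom writer; selected on requests with wt=ftl -->
<queryResponseWriter name="ftl" class="com.example.FreemarkerResponseWriter"/>
```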
Finally, solrconfig.xml contains many elements such as <arr>, <lst>, <str>, and <int>; here is a unified explanation (this summary is based on the book Solr in Action):
arr: short for array; name is the variable name of the array parameter.
lst: short for list, but note that it holds key-value pairs.
bool: a boolean variable; name is the variable name.
Likewise there are int, long, float, str, and so on.
str: short for string. The one thing to note is that <str> children of <arr> have no name attribute, while <str> children of <lst> do.
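The distinction can be seen side by side in a fragment like this (the values are illustrative):

```xml
<lst name="defaults">            <!-- named children: key-value pairs -->
  <str name="echoParams">explicit</str>
  <int name="rows">10</int>
  <bool name="terms">true</bool>
</lst>
<arr name="components">          <!-- unnamed children: a plain array -->
  <str>query</str>
  <str>terms</str>
</arr>
```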
To summarize, the settings in solrconfig.xml fall into these groups:
1. The lucene version dependency, which determines the structure of the index you build; note that index formats are not fully compatible across lucene versions.
2. Index-building configuration, such as the index directory and the indexwriterconfig-related settings that determine indexing performance.
3. Load paths for the external jars that solrconfig.xml depends on.
4. jmx configuration.
5. Cache configuration, including the filter cache, query result cache, document cache, custom caches, and so on.
6. updatehandler configuration, i.e. settings for index update operations.
7. requesthandler configuration, i.e. the classes that handle client HTTP requests.
8. Query component configuration, such as highlight, spellchecker, and so on.
9. responsewriter configuration, i.e. the converters that determine the format in which responses are returned to the client.
10. Custom valuesourceparser configuration, used to influence document boosts, scoring, and sorting.
That is it for solrconfig.xml; understanding these settings clears obstacles for further solr study. Settings I skipped or glossed over are left for you to read and understand on your own; with more than a thousand lines of configuration, explaining every line would take too long, and many settings are similar.
Reprinted from: http://iamyida.iteye.com/blog/2211728