Lucene IndexOptions can be set anywhere from DOCS to DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS, and ES has an equivalent setting

The values of org.apache.lucene.index.IndexOptions:

DOCS_AND_FREQS: Only documents and term frequencies are indexed; positions are omitted.

DOCS_AND_FREQS_AND_POSITIONS: Indexes documents, frequencies and positions.

DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS: Indexes documents, frequencies, positions and offsets.

DOCS: Only documents are indexed; term frequencies and positions are omitted.
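
The four levels are cumulative: each one records everything the previous level does plus one more kind of data. The sketch below models that with a toy enum. ToyIndexOptions and its helper methods are hypothetical names made up for illustration; this is not the real org.apache.lucene.index.IndexOptions class, only a stand-in that reuses its constant names.

```java
class IndexOptionsDemo {
    // Toy stand-in for Lucene's IndexOptions enum, NOT the real class.
    // Constants are declared from least to most detail, so declaration order
    // answers questions such as "does this level include positions?".
    enum ToyIndexOptions {
        DOCS,
        DOCS_AND_FREQS,
        DOCS_AND_FREQS_AND_POSITIONS,
        DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS;

        boolean hasFreqs()     { return compareTo(DOCS_AND_FREQS) >= 0; }
        boolean hasPositions() { return compareTo(DOCS_AND_FREQS_AND_POSITIONS) >= 0; }
        boolean hasOffsets()   { return this == DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS; }
    }

    public static void main(String[] args) {
        // Print what each level records.
        for (ToyIndexOptions o : ToyIndexOptions.values()) {
            System.out.println(o + ": freqs=" + o.hasFreqs()
                    + ", positions=" + o.hasPositions()
                    + ", offsets=" + o.hasOffsets());
        }
    }
}
```

Comparing enum constants by declaration order keeps each check to a single line, which is why the toy enum lists the levels from least to most detailed.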

In ES:

First of all, index_options and term_vectors are two totally different things. index_options are "options" for the index you are searching on, a data structure that maps "terms" to lists of documents (posting lists). Term vectors are a data structure that gives you the "terms" for a given document and, in addition, their positions in the document as well as their start and end character offsets. Now the index (each field has such an index) holds a sorted list of terms, and each term points to a posting list. These posting lists are lists of the documents that contain the term. On the posting list you can also store information like frequencies (how often did term Y occur in document X, which is useful for scoring) as well as "positions" (at which position did term Y occur in document X, which is required for phrase and span queries).
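
To make the posting-list description concrete, here is a minimal sketch of a toy inverted index in plain Java. It assumes whitespace tokenization and in-memory maps; PostingsDemo, build and phrase are hypothetical names, and nothing here reflects how Lucene actually encodes postings. It only shows why a phrase query needs positions while a plain term lookup does not.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class PostingsDemo {
    // term -> (docId -> positions of the term in that doc).
    // The outer TreeMap keeps the terms sorted, like the term list in the text.
    static Map<String, Map<Integer, List<Integer>>> build(String[] docs) {
        Map<String, Map<Integer, List<Integer>>> index = new TreeMap<>();
        for (int docId = 0; docId < docs.length; docId++) {
            String[] tokens = docs[docId].toLowerCase().split("\\s+");
            for (int pos = 0; pos < tokens.length; pos++) {
                index.computeIfAbsent(tokens[pos], t -> new TreeMap<>())
                     .computeIfAbsent(docId, d -> new ArrayList<>())
                     .add(pos);
            }
        }
        return index;
    }

    // Two-word phrase query: doc ids alone cannot answer this, because we
    // must check that `second` occurs exactly one position after `first`.
    static List<Integer> phrase(Map<String, Map<Integer, List<Integer>>> index,
                                String first, String second) {
        List<Integer> hits = new ArrayList<>();
        Map<Integer, List<Integer>> a = index.getOrDefault(first, Map.of());
        Map<Integer, List<Integer>> b = index.getOrDefault(second, Map.of());
        for (Map.Entry<Integer, List<Integer>> e : a.entrySet()) {
            List<Integer> bPositions = b.get(e.getKey());
            if (bPositions == null) continue;  // doc lacks the second term
            for (int p : e.getValue()) {
                if (bPositions.contains(p + 1)) {
                    hits.add(e.getKey());
                    break;
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String[] docs = {"quick brown fox", "brown quick fox", "quick fox"};
        Map<String, Map<Integer, List<Integer>>> index = build(docs);
        // Docs-only view: which documents contain "quick"?
        System.out.println("docs with 'quick': " + index.get("quick").keySet());
        // Frequency view: how often does "quick" occur in doc 0?
        System.out.println("freq in doc 0: " + index.get("quick").get(0).size());
        // Position view: which documents contain the phrase "quick fox"?
        System.out.println("phrase 'quick fox': " + phrase(index, "quick", "fox"));
    }
}
```

In this model the key set of a posting map is the docs-only view, the length of each position list is the frequency, and the position lists themselves are what a phrase query consumes.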

If you have, for instance, a field that you only use for filtering, you don't need freqs and positions, so documents only will do the job. In an index, the position information is usually the biggest piece of data aside from stored fields. If you don't do phrase queries or spans you don't need positions at all, so save the disk space and improve performance by only using docs and freqs. In previous versions it wasn't possible to have only freqs but no positions (index_options supersedes omit_term_frequencies_and_positions), so this is an improvement overall, since the most common use case might only need freqs but no positions.
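
A rough way to see why positions dominate is to count how many integers a naive posting format would have to store at each level. The sketch below does that for a couple of strings. PostingSizeDemo and countEntries are hypothetical names, and the model ignores the compression real Lucene postings use, so the numbers only illustrate relative growth, not actual disk usage.

```java
import java.util.HashMap;
import java.util.Map;

class PostingSizeDemo {
    // Returns {docsOnly, withFreqs, withPositions}: the number of integers a
    // naive, uncompressed posting format would store at each level.
    static int[] countEntries(String[] docs) {
        int termDocPairs = 0;  // one doc id per (term, doc) pair
        int occurrences = 0;   // one position per term occurrence
        for (String doc : docs) {
            Map<String, Integer> freqs = new HashMap<>();
            for (String token : doc.toLowerCase().split("\\s+")) {
                freqs.merge(token, 1, Integer::sum);
                occurrences++;
            }
            termDocPairs += freqs.size();
        }
        int docsOnly = termDocPairs;
        int withFreqs = docsOnly + termDocPairs;      // plus one freq per pair
        int withPositions = withFreqs + occurrences;  // plus one position per occurrence
        return new int[] {docsOnly, withFreqs, withPositions};
    }

    public static void main(String[] args) {
        int[] sizes = countEntries(new String[] {"a a a b", "a b b b b"});
        System.out.println("docs only:      " + sizes[0]);
        System.out.println("with freqs:     " + sizes[1]);
        System.out.println("with positions: " + sizes[2]);
    }
}
```

Doc ids and freqs grow with the number of distinct (term, document) pairs, while positions grow with the total number of tokens, so repeated terms inflate only the positions part.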

Some of the options are listed below:

1: term_vector

TermVector.YES: Store only the number of occurrences of each term.

TermVector.WITH_POSITIONS: Store the number of occurrences and the positions of terms, but no offsets.

TermVector.WITH_OFFSETS: Store the number of occurrences and the offsets of terms, but no positions.

TermVector.WITH_POSITIONS_OFFSETS: Store the number of occurrences, positions and offsets of terms.

TermVector.NO: Don't store any term vector information.
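
The term_vector levels can be pictured as a per-document map from each term to its occurrences. The sketch below builds such a map for one string, assuming whitespace tokenization. TermVectorDemo, Occurrence and termVector are hypothetical names used for illustration, not Lucene or ES APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TermVectorDemo {
    // One occurrence of a term: its token position plus character offsets
    // [startOffset, endOffset) in the original text.
    static final class Occurrence {
        final int position;
        final int startOffset;
        final int endOffset;

        Occurrence(int position, int startOffset, int endOffset) {
            this.position = position;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }
    }

    // Builds a toy term vector for a single document: term -> its occurrences.
    static Map<String, List<Occurrence>> termVector(String text) {
        Map<String, List<Occurrence>> vector = new TreeMap<>();
        Matcher m = Pattern.compile("\\S+").matcher(text);
        int position = 0;
        while (m.find()) {
            vector.computeIfAbsent(m.group().toLowerCase(), t -> new ArrayList<>())
                  .add(new Occurrence(position++, m.start(), m.end()));
        }
        return vector;
    }

    public static void main(String[] args) {
        Map<String, List<Occurrence>> tv = termVector("to be or not to be");
        for (Map.Entry<String, List<Occurrence>> e : tv.entrySet()) {
            StringBuilder sb = new StringBuilder(e.getKey() + " x" + e.getValue().size());
            for (Occurrence o : e.getValue()) {
                sb.append(" [pos=").append(o.position)
                  .append(", offsets=").append(o.startOffset).append("-").append(o.endOffset).append("]");
            }
            System.out.println(sb);
        }
    }
}
```

In this model, YES corresponds to keeping only the size of each occurrence list, WITH_POSITIONS adds the position field, WITH_OFFSETS adds the two offset fields instead, and WITH_POSITIONS_OFFSETS keeps all of them.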

2: index_options

Sets the indexing options. Possible values are docs (only doc numbers are indexed), freqs (doc numbers and term frequencies), and positions (doc numbers, term frequencies and positions). Defaults to positions for analyzed fields and to docs for not_analyzed fields. It can also be set to offsets (doc numbers, term frequencies, positions and offsets).
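
The ES values line up one-to-one with the Lucene constants described earlier. The sketch below records that correspondence and the defaults stated above as a small lookup. EsIndexOptionsDemo, ES_TO_LUCENE and defaultFor are hypothetical names; the code is a plain-Java illustration and does not call any ES or Lucene API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class EsIndexOptionsDemo {
    // ES index_options value -> name of the corresponding Lucene IndexOptions
    // constant. Plain strings are used so the sketch has no dependencies.
    static final Map<String, String> ES_TO_LUCENE = new LinkedHashMap<>();
    static {
        ES_TO_LUCENE.put("docs", "DOCS");
        ES_TO_LUCENE.put("freqs", "DOCS_AND_FREQS");
        ES_TO_LUCENE.put("positions", "DOCS_AND_FREQS_AND_POSITIONS");
        ES_TO_LUCENE.put("offsets", "DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS");
    }

    // Default described in the text: positions for analyzed fields,
    // docs for not_analyzed fields.
    static String defaultFor(boolean analyzed) {
        return analyzed ? "positions" : "docs";
    }

    public static void main(String[] args) {
        for (Map.Entry<String, String> e : ES_TO_LUCENE.entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
        System.out.println("analyzed default:     " + defaultFor(true));
        System.out.println("not_analyzed default: " + defaultFor(false));
    }
}
```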

References: https://lucene.apache.org/core/4_1_0/core/org/apache/lucene/index/FieldInfo.IndexOptions.html

http://elasticsearch.cn/question/119

This article is reposted from 張昺華-sky's blog on cnblogs. Original link: http://www.cnblogs.com/bonelee/p/6397455.html. To reprint it, please contact the original author.