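PhraseQuery: a phrase query checks whether a document contains the specified term or terms. When several terms are given, the slop parameter sets how far apart they are allowed to be. The original post illustrated this with a screenshot of the official API documentation.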

Example code:
package com.yida.framework.lucene5.query;

import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.IndexWriterConfig.OpenMode;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.PhraseQuery;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

public class PhraseQueryTest {
    public static void main(String[] args) throws IOException {
        Directory dir = new RAMDirectory();
        Analyzer analyzer = new StandardAnalyzer();
        IndexWriterConfig iwc = new IndexWriterConfig(analyzer);
        iwc.setOpenMode(OpenMode.CREATE);
        IndexWriter writer = new IndexWriter(dir, iwc);

        // Document 1: does not contain "dog"
        Document doc = new Document();
        doc.add(new TextField("text", "quick brown fox", Field.Store.YES));
        writer.addDocument(doc);

        // Document 2: "jumps" at position 0, "dog" at position 4
        doc = new Document();
        doc.add(new TextField("text", "jumps over lazy broun dog", Field.Store.YES));
        writer.addDocument(doc);

        // Document 3: "jumps" at position 0, "dog" at position 6
        doc = new Document();
        doc.add(new TextField("text", "jumps over extremely very lazy broxn dog", Field.Store.YES));
        writer.addDocument(doc);
        writer.close();

        IndexReader reader = DirectoryReader.open(dir);
        IndexSearcher searcher = new IndexSearcher(reader);
        String term1 = "dog";
        String term2 = "jumps";
        // Query phrase "dog jumps": each add() appends the term at the next position
        PhraseQuery phraseQuery = new PhraseQuery();
        phraseQuery.add(new Term("text", term1));
        phraseQuery.add(new Term("text", term2));
        phraseQuery.setSlop(5);
        TopDocs results = searcher.search(phraseQuery, 100);
        ScoreDoc[] scoreDocs = results.scoreDocs;
        for (int i = 0; i < scoreDocs.length; ++i) {
            //System.out.println(searcher.explain(phraseQuery, scoreDocs[i].doc));
            int docID = scoreDocs[i].doc;
            Document document = searcher.doc(docID);
            String text = document.get("text");
            System.out.println("text:" + text);
        }
        reader.close();
        dir.close();
    }
}
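PhraseQuery.add(Term) always appends the term at the end of the phrase; you can also call add(Term, position) to state explicitly which position the term goes to. The example code adds two terms, so our query phrase is dog jumps, with no gap between the two terms, and then we set slop to 5.
In the 2nd indexed document, moving the word jumps 5 positions to the right produces exactly our query phrase dog jumps, so that document qualifies and is returned. The 1st indexed document does not contain the word dog at all, so it does not qualify. The 3rd indexed document would need 7 moves to produce dog jumps, which exceeds the slop of 5, so in the end only the 2nd indexed document is returned.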
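The move counts above can be reproduced with a small stand-alone model. This is only a sketch, not Lucene code: it assumes plain whitespace tokenization, that each query term occurs at most once in the document, and that query terms sit at consecutive positions 0, 1, 2, and so on. The class name SlopSketch and the method requiredSlop are made up for illustration. For every term it subtracts the term's position in the query from its position in the document; the spread between the largest and smallest result is the slop the phrase needs.

```java
import java.util.Arrays;
import java.util.List;

public class SlopSketch {
    // Hypothetical helper, not a Lucene API: returns the smallest slop at which
    // the phrase would match, or -1 if a query term is absent from the document.
    public static int requiredSlop(String docText, String... queryTerms) {
        if (queryTerms.length == 0) {
            return 0; // an empty phrase needs no moves
        }
        List<String> tokens = Arrays.asList(docText.toLowerCase().split("\\s+"));
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        for (int queryPos = 0; queryPos < queryTerms.length; queryPos++) {
            int docPos = tokens.indexOf(queryTerms[queryPos]);
            if (docPos < 0) {
                return -1; // term missing, no slop value can help
            }
            int shifted = docPos - queryPos; // align the term with the phrase start
            min = Math.min(min, shifted);
            max = Math.max(max, shifted);
        }
        return max - min;
    }

    public static void main(String[] args) {
        // Document 1: "dog" is missing
        System.out.println(requiredSlop("quick brown fox", "dog", "jumps")); // -1
        // Document 2: dog at 4, jumps at 0 -> shifted 4 and -1
        System.out.println(requiredSlop("jumps over lazy broun dog", "dog", "jumps")); // 5
        // Document 3: dog at 6, jumps at 0 -> shifted 6 and -1
        System.out.println(requiredSlop("jumps over extremely very lazy broxn dog", "dog", "jumps")); // 7
    }
}
```

A document matches when the value returned is non-negative and no larger than the slop passed to setSlop, which is why a slop of 5 admits the 2nd document and rejects the 3rd.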
Now suppose I change the code to this:
String term1 = "dog";
String term2 = "jumps";
PhraseQuery phraseQuery = new PhraseQuery();
// Explicit positions: "dog" at 0, "jumps" at 2, leaving position 1 open
phraseQuery.add(new Term("text", term1), 0);
phraseQuery.add(new Term("text", term2), 2);
phraseQuery.setSlop(6);
TopDocs results = searcher.search(phraseQuery, 100);
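Now our query phrase is dog xxx jumps: we are looking for documents that contain both dog and jumps, with a gap of one term position between them (stop words not counted). The slop therefore has to grow by 1, because one more move is needed, so this time the slop value should be 6.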
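To see why the explicit positions raise the required slop from 5 to 6, the arithmetic can be modelled outside Lucene. The sketch below is an illustration only and makes several assumptions: whitespace tokenization, each term occurring at most once per document, and a simplified distance rule (largest minus smallest value of document position minus query position). PositionedSlopSketch, requiredSlop and matches are hypothetical names, not Lucene APIs.

```java
import java.util.Arrays;
import java.util.List;

public class PositionedSlopSketch {
    // Hypothetical helper: smallest slop that would let the positioned phrase
    // match, or -1 if any term is missing from the document.
    public static int requiredSlop(String docText, String[] terms, int[] queryPositions) {
        if (terms.length == 0) {
            return 0;
        }
        List<String> tokens = Arrays.asList(docText.toLowerCase().split("\\s+"));
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        for (int i = 0; i < terms.length; i++) {
            int docPos = tokens.indexOf(terms[i]);
            if (docPos < 0) {
                return -1;
            }
            int shifted = docPos - queryPositions[i]; // subtract the explicit query position
            min = Math.min(min, shifted);
            max = Math.max(max, shifted);
        }
        return max - min;
    }

    // Hypothetical helper: true when the document satisfies the given slop.
    public static boolean matches(String docText, String[] terms, int[] queryPositions, int slop) {
        int needed = requiredSlop(docText, terms, queryPositions);
        return needed >= 0 && needed <= slop;
    }

    public static void main(String[] args) {
        String[] terms = {"dog", "jumps"};
        String doc2 = "jumps over lazy broun dog";
        String doc3 = "jumps over extremely very lazy broxn dog";

        System.out.println(requiredSlop(doc2, terms, new int[] {0, 1})); // 5
        System.out.println(requiredSlop(doc2, terms, new int[] {0, 2})); // 6
        System.out.println(matches(doc2, terms, new int[] {0, 2}, 6));   // true
        System.out.println(matches(doc3, terms, new int[] {0, 2}, 6));   // false
    }
}
```

Moving jumps from query position 1 to query position 2 shifts its aligned value from -1 to -2, which widens the spread by exactly one. Under this model the 3rd document would need a slop of 8, so with a slop of 6 still only the 2nd document is returned.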
Reposted from: http://iamyida.iteye.com/blog/2195838