Learning Lucene 5: PhraseQuery (Phrase Queries)

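PhraseQuery is a phrase query: it checks whether a document contains the specified term or terms. When there are several terms, you can specify the allowed gap between them through the slop parameter. See the PhraseQuery entry in the official API documentation for the full description.

Example code: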

package com.yida.framework.lucene5.query;

import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.IndexWriterConfig.OpenMode;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.PhraseQuery;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

public class PhraseQueryTest {
    public static void main(String[] args) throws IOException {
        // Build a small in-memory index
        Directory dir = new RAMDirectory();
        Analyzer analyzer = new StandardAnalyzer();
        IndexWriterConfig iwc = new IndexWriterConfig(analyzer);
        iwc.setOpenMode(OpenMode.CREATE);
        IndexWriter writer = new IndexWriter(dir, iwc);

        // Document 1: does not contain "dog"
        Document doc = new Document();
        doc.add(new TextField("text", "quick brown fox", Field.Store.YES));
        writer.addDocument(doc);

        // Document 2: "jumps" and "dog" are 4 positions apart
        doc = new Document();
        doc.add(new TextField("text", "jumps over lazy broun dog", Field.Store.YES));
        writer.addDocument(doc);

        // Document 3: "jumps" and "dog" are 6 positions apart
        doc = new Document();
        doc.add(new TextField("text", "jumps over extremely very lazy broxn dog", Field.Store.YES));
        writer.addDocument(doc);
        writer.close();

        IndexReader reader = DirectoryReader.open(dir);
        IndexSearcher searcher = new IndexSearcher(reader);

        // Query phrase "dog jumps" with a slop of 5 (the value discussed in the text below)
        String term1 = "dog";
        String term2 = "jumps";
        PhraseQuery phraseQuery = new PhraseQuery();
        phraseQuery.add(new Term("text", term1));
        phraseQuery.add(new Term("text", term2));
        phraseQuery.setSlop(5);

        TopDocs results = searcher.search(phraseQuery, null, 100);
        ScoreDoc[] scoreDocs = results.scoreDocs;
        for (int i = 0; i < scoreDocs.length; ++i) {
            // System.out.println(searcher.explain(phraseQuery, scoreDocs[i].doc));
            int docID = scoreDocs[i].doc;
            Document document = searcher.doc(docID);
            String text = document.get("text");
            System.out.println("text:" + text);
        }
        reader.close();
    }
}

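PhraseQuery.add(Term) appends each term at the end of the phrase; you can also use add(Term, position) to state explicitly which position a term goes to. The sample code adds two terms, so the query phrase is "dog jumps" with a gap of 0 between them, and the slop is then set to 5.

In the 2nd indexed document, moving the word jumps 5 positions to the right yields exactly the query phrase "dog jumps", so that document qualifies and is returned. The 1st document does not contain the word dog at all, so it does not match. The 3rd document would need 7 moves to form "dog jumps", so in the end only the 2nd document is returned.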
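The move counts above can be checked with a small plain-Java sketch. It does not call Lucene. It assumes a simplified model: tokens are split on whitespace, each query term occurs at most once in the document, and the required slop is the spread (maximum minus minimum) of "document position minus query position" over all query terms. The names `SlopSketch` and `requiredSlop` are hypothetical and exist only for this illustration.

```java
import java.util.Arrays;
import java.util.List;

// Simplified model of sloppy phrase matching; not Lucene's actual implementation.
class SlopSketch {

    // Returns the smallest slop under which the phrase would match the text,
    // or -1 if a query term is missing from the text.
    // Assumes query terms sit at consecutive positions 0, 1, 2, ...
    static int requiredSlop(String text, String... phrase) {
        List<String> tokens = Arrays.asList(text.toLowerCase().split("\\s+"));
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        for (int queryPos = 0; queryPos < phrase.length; queryPos++) {
            int docPos = tokens.indexOf(phrase[queryPos]);
            if (docPos < 0) {
                return -1; // term not present, no slop value can help
            }
            int shifted = docPos - queryPos;
            min = Math.min(min, shifted);
            max = Math.max(max, shifted);
        }
        return max - min;
    }

    public static void main(String[] args) {
        String[] docs = {
            "quick brown fox",
            "jumps over lazy broun dog",
            "jumps over extremely very lazy broxn dog"
        };
        for (String doc : docs) {
            System.out.println(doc + " -> " + requiredSlop(doc, "dog", "jumps"));
        }
    }
}
```

Under these assumptions the three sample documents give -1 (no match possible), 5 and 7, which agrees with the reasoning above for a slop of 5.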

If we change the code slightly, like this:

String term1 = "dog";
String term2 = "jumps";
PhraseQuery phraseQuery = new PhraseQuery();
phraseQuery.add(new Term("text", term1), 0);
phraseQuery.add(new Term("text", term2), 2);
phraseQuery.setSlop(6);
TopDocs results = searcher.search(phraseQuery, null, 100);

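The query phrase is now "dog xxx jumps": we are looking for documents that contain both dog and jumps, with one term between them (not counting stop words). Because of that extra position, the slop has to grow by 1, i.e. one more move is needed, so this time the slop value should be 6.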
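To see why the explicit positions raise the threshold from 5 to 6, here is a minimal plain-Java sketch of a match check with per-term positions. It is not Lucene code and relies on the same simplifying assumptions as before: whitespace tokenization, each term occurring at most once, and a match whenever the spread of "document position minus query position" does not exceed the slop. `PositionedPhraseSketch` and `matches` are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;

// Simplified model of a phrase query with explicit term positions; not Lucene code.
class PositionedPhraseSketch {

    // terms[i] is expected at relative position positions[i] in the phrase.
    // Returns true if every term is present and the spread of
    // (document position - query position) is at most slop.
    static boolean matches(String text, String[] terms, int[] positions, int slop) {
        List<String> tokens = Arrays.asList(text.toLowerCase().split("\\s+"));
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        for (int i = 0; i < terms.length; i++) {
            int docPos = tokens.indexOf(terms[i]);
            if (docPos < 0) {
                return false; // a missing term can never match
            }
            int shifted = docPos - positions[i];
            min = Math.min(min, shifted);
            max = Math.max(max, shifted);
        }
        return max - min <= slop;
    }

    public static void main(String[] args) {
        String doc = "jumps over lazy broun dog";
        String[] terms = {"dog", "jumps"};
        int[] positions = {0, 2}; // models the phrase "dog xxx jumps"
        System.out.println(matches(doc, terms, positions, 5)); // false
        System.out.println(matches(doc, terms, positions, 6)); // true
    }
}
```

In this model the 2nd document needs a spread of 6 (4 for dog, -2 for jumps), so a slop of 5 is no longer enough while 6 is.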


Reposted from: http://iamyida.iteye.com/blog/2195838
