Learning Lucene 5: PhraseQuery (Phrase Search)

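PhraseQuery is a phrase search: it checks whether a document contains one or more specified terms. When there are several terms, you can specify the allowed gap between them with the slop parameter. The official API explanation is shown in the figure:

[Figure: screenshot of the official PhraseQuery API documentation]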

Example code:

package com.yida.framework.lucene5.query;

import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.IndexWriterConfig.OpenMode;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.PhraseQuery;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

public class PhraseQueryTest {
    public static void main(String[] args) throws IOException {
        // Build a small in-memory index.
        Directory dir = new RAMDirectory();
        Analyzer analyzer = new StandardAnalyzer();
        IndexWriterConfig iwc = new IndexWriterConfig(analyzer);
        iwc.setOpenMode(OpenMode.CREATE);
        IndexWriter writer = new IndexWriter(dir, iwc);

        // Document 1: does not contain "dog".
        Document doc = new Document();
        doc.add(new TextField("text", "quick brown fox", Field.Store.YES));
        writer.addDocument(doc);

        // Document 2: "jumps" and "dog" are 4 positions apart.
        doc = new Document();
        doc.add(new TextField("text", "jumps over lazy broun dog", Field.Store.YES));
        writer.addDocument(doc);

        // Document 3: "jumps" and "dog" are 6 positions apart.
        doc = new Document();
        doc.add(new TextField("text", "jumps over extremely very lazy broxn dog", Field.Store.YES));
        writer.addDocument(doc);
        writer.close();

        IndexReader reader = DirectoryReader.open(dir);
        IndexSearcher searcher = new IndexSearcher(reader);

        // Query phrase "dog jumps" with a slop of 5.
        String term1 = "dog";
        String term2 = "jumps";
        PhraseQuery phraseQuery = new PhraseQuery();
        phraseQuery.add(new Term("text", term1));
        phraseQuery.add(new Term("text", term2));
        phraseQuery.setSlop(5);

        TopDocs results = searcher.search(phraseQuery, 100);
        ScoreDoc[] scoreDocs = results.scoreDocs;
        for (int i = 0; i < scoreDocs.length; ++i) {
            //System.out.println(searcher.explain(phraseQuery, scoreDocs[i].doc));
            int docID = scoreDocs[i].doc;
            Document document = searcher.doc(docID);
            String text = document.get("text");
            System.out.println("text:" + text);
        }
        reader.close();
    }
}

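PhraseQuery.add(Term) always appends the term at the end of the phrase. You can also use add(Term, position) to state explicitly which position the term occupies. The example code adds two terms, so our query phrase is "dog jumps" with a gap of 0 between the terms, and then we set the slop to 5.

In the second indexed document, moving the word jumps to the right 5 times yields exactly our query phrase "dog jumps", so that document matches and is returned. The first indexed document does not contain the word dog at all, so it does not match. The third indexed document would need 7 moves to produce "dog jumps", which exceeds the slop, so in the end only the second indexed document is returned.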
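The move counts above can be reproduced with a small sketch that does not use Lucene at all. It assumes a simplified model of sloppy matching: each term's position in the document is shifted left by its position in the query, and the slop needed is the spread (maximum minus minimum) of those shifted values. The class SlopSketch and the method requiredSlop are hypothetical names made up for this illustration; the text is split on whitespace rather than analyzed, and only the first occurrence of each term is considered.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Lucene: a simplified model of sloppy phrase matching.
final class SlopSketch {

    /**
     * Returns the smallest slop under which the query terms (placed at the given
     * query positions) would match the text, or -1 if a term is missing.
     */
    static int requiredSlop(String text, String[] terms, int[] queryPositions) {
        List<String> tokens = Arrays.asList(text.toLowerCase().split("\\s+"));
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        for (int i = 0; i < terms.length; i++) {
            int docPos = tokens.indexOf(terms[i]); // assumption: first occurrence only
            if (docPos < 0) {
                return -1; // the document does not contain this term
            }
            int shifted = docPos - queryPositions[i];
            min = Math.min(min, shifted);
            max = Math.max(max, shifted);
        }
        return max - min;
    }

    public static void main(String[] args) {
        String[] terms = {"dog", "jumps"};
        int[] positions = {0, 1}; // add(term) appends: dog at 0, jumps at 1
        String[] docs = {
            "quick brown fox",
            "jumps over lazy broun dog",
            "jumps over extremely very lazy broxn dog"
        };
        for (String d : docs) {
            System.out.println(d + " -> " + requiredSlop(d, terms, positions));
        }
    }
}
```

Under this model the three documents come out as -1 (no match possible), 5 and 7, which agrees with the counts in the text: with a slop of 5 only the second document qualifies.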

If I change the code to this:

String term1 = "dog";
String term2 = "jumps";
PhraseQuery phraseQuery = new PhraseQuery();
// Explicit positions: dog at 0, jumps at 2, leaving position 1 open.
phraseQuery.add(new Term("text", term1), 0);
phraseQuery.add(new Term("text", term2), 2);
phraseQuery.setSlop(6);
TopDocs results = searcher.search(phraseQuery, 100);

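Now our query phrase is "dog xxx jumps": we are looking for documents that contain dog and jumps with a gap of one term position between them (not counting stop words). Because jumps now sits one position further to the right in the query, the second document needs one more move than before, so this time the slop has to be 6 rather than 5.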
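To see why 5 is no longer enough once explicit positions are used, here is a minimal sketch under the same simplified assumption as before: a document matches when the spread of (document position minus query position) across all terms is at most the slop. PositionedPhraseSketch and its methods are hypothetical and only mimic the shape of add(term, position) and setSlop; they are not Lucene classes. The text is split on whitespace, only the first occurrence of a term is used, and a term can appear in the phrase only once.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Lucene: mimics add(term, position) and setSlop.
final class PositionedPhraseSketch {

    private final Map<String, Integer> termPositions = new LinkedHashMap<>();
    private int slop = 0;

    PositionedPhraseSketch add(String term, int position) {
        termPositions.put(term, position); // assumption: each term appears once in the phrase
        return this;
    }

    PositionedPhraseSketch setSlop(int slop) {
        this.slop = slop;
        return this;
    }

    /** True if every term is present and the positional spread fits within the slop. */
    boolean matches(String text) {
        if (termPositions.isEmpty()) {
            return false;
        }
        List<String> tokens = Arrays.asList(text.toLowerCase().split("\\s+"));
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        for (Map.Entry<String, Integer> e : termPositions.entrySet()) {
            int docPos = tokens.indexOf(e.getKey()); // assumption: first occurrence only
            if (docPos < 0) {
                return false;
            }
            int shifted = docPos - e.getValue();
            min = Math.min(min, shifted);
            max = Math.max(max, shifted);
        }
        return max - min <= slop;
    }

    public static void main(String[] args) {
        String doc2 = "jumps over lazy broun dog";
        PositionedPhraseSketch query =
                new PositionedPhraseSketch().add("dog", 0).add("jumps", 2);
        System.out.println("slop 5: " + query.setSlop(5).matches(doc2));
        System.out.println("slop 6: " + query.setSlop(6).matches(doc2));
    }
}
```

In this sketch the second document is rejected with a slop of 5 and accepted with a slop of 6, matching the reasoning in the text.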


Reposted from: http://iamyida.iteye.com/blog/2195838
