Lucene 5 Study Notes: Using NumericRangeQuery

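When NumericRangeQuery comes up, you probably think it is trivial: isn't it just a numeric range query? The user supplies a lower bound and an upper bound, and the underlying API simply checks > min and < max. Is that really all there is to it? In fact, Lucene indexes terms, which traditionally means strings, so how does a number become a string? You might assume that calling toString() is enough. But then how do you handle the string "3" sorting after "26"? You could pad the numbers with leading zeros: "03" < "26" is indeed correct, but there is no way to know in advance how many zeros to add. Too many wastes disk space; too few limits how many digits the index can support. Even with the width problem solved, a range query in Lucene is at bottom a set of matching terms joined together like the clauses of a BooleanQuery, and with too many terms you can still hit the TooManyClauses ("too many boolean clauses") exception. What Lucene actually does is convert numbers (int, long, float, double) into a sortable integer bit pattern and index that pattern as prefix-encoded terms at several precisions, rather than as decimal or hexadecimal text. For the details of the conversion, see the source of the NumericUtils utility class: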
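The string-ordering problem described above is easy to reproduce with plain Java strings. The following is only a minimal sketch; `StringOrderDemo` and its `pad` helper are hypothetical names made up for this illustration and have nothing to do with Lucene's own encoding.

```java
class StringOrderDemo {

    // Hypothetical helper: left-pads a non-negative number with zeros to a fixed width.
    static String pad(int value, int width) {
        return String.format("%0" + width + "d", value);
    }

    public static void main(String[] args) {
        // Plain toString(): lexicographic order disagrees with numeric order.
        System.out.println("3".compareTo("26") > 0);               // "3" sorts after "26"

        // Zero-padding to a fixed width restores the expected order...
        System.out.println(pad(3, 2).compareTo(pad(26, 2)) < 0);   // "03" sorts before "26"

        // ...but only while every value fits in the chosen width.
        System.out.println(pad(26, 2).compareTo(pad(100, 2)) > 0); // "26" sorts after "100"
    }
}
```

All three lines print true: the last one shows that a width of 2 stops working as soon as a three-digit value appears, which is exactly the "how many zeros?" dilemma.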

/**
 * Converts a <code>float</code> value to a sortable signed <code>int</code>.
 * The value is converted by getting their IEEE 754 floating-point "float format"
 * bit layout and then some bits are swapped, to be able to compare the result as int.
 * By this the precision is not reduced, but the value can easily used as an int.
 * @see #sortableIntToFloat
 */
public static int floatToSortableInt(float val) {
  int f = Float.floatToRawIntBits(val);
  // for negative values, flip every bit except the sign bit
  if (f<0) f ^= 0x7fffffff;
  return f;
}

The code above is what turns a float into a sortable int. It is all bit manipulation and can make your head spin; fully understanding it is not easy.
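To see what the bit flip buys, here is a small stand-alone sketch that restates the same trick without depending on Lucene. The class name `SortableFloatDemo` is hypothetical, and `Float.floatToIntBits` is used in place of the raw variant, which only differs in how NaN values are treated.

```java
class SortableFloatDemo {

    // Same idea as the snippet above, restated so it runs without Lucene.
    static int floatToSortableInt(float val) {
        int bits = Float.floatToIntBits(val);
        // Negative floats have the sign bit set, so they are already negative ints,
        // but their magnitudes sort backwards. Flipping the lower 31 bits fixes that.
        if (bits < 0) bits ^= 0x7fffffff;
        return bits;
    }

    // Inverse mapping: applying the same XOR again restores the original bits.
    static float sortableIntToFloat(int val) {
        if (val < 0) val ^= 0x7fffffff;
        return Float.intBitsToFloat(val);
    }

    public static void main(String[] args) {
        float[] values = {-10.5f, -0.03f, 0.0f, 0.03f, 0.10f, 42f};
        for (float v : values) {
            int s = floatToSortableInt(v);
            System.out.printf("%8.2f -> 0x%08x -> %s%n", v, s, sortableIntToFloat(s));
        }
    }
}
```

The printed integers increase in the same order as the floats, so an ordinary integer comparison can stand in for a float comparison, and the round trip shows that no precision is lost.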

/** This helper does the splitting for both 32 and 64 bit. */
private static void splitRange(
  final Object builder, final int valSize,
  final int precisionStep, long minBound, long maxBound
) {
  if (precisionStep < 1)
    throw new IllegalArgumentException("precisionStep must be >=1");
  if (minBound > maxBound) return;
  for (int shift=0; ; shift += precisionStep) {
    // calculate new bounds for inner precision
    final long diff = 1L << (shift+precisionStep),
      mask = ((1L<<precisionStep) - 1L) << shift;
    final boolean
      hasLower = (minBound & mask) != 0L,
      hasUpper = (maxBound & mask) != mask;
    final long
      nextMinBound = (hasLower ? (minBound + diff) : minBound) & ~mask,
      nextMaxBound = (hasUpper ? (maxBound - diff) : maxBound) & ~mask;
    final boolean
      lowerWrapped = nextMinBound < minBound,
      upperWrapped = nextMaxBound > maxBound;

    if (shift+precisionStep>=valSize || nextMinBound>nextMaxBound || lowerWrapped || upperWrapped) {
      // We are in the lowest precision or the next precision is not available.
      addRange(builder, valSize, minBound, maxBound, shift);
      // exit the split recursion loop
      break;
    }

    if (hasLower)
      addRange(builder, valSize, minBound, minBound | mask, shift);
    if (hasUpper)
      addRange(builder, valSize, maxBound & ~mask, maxBound, shift);

    // recurse to next precision
    minBound = nextMinBound;
    maxBound = nextMaxBound;
  }
}

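To be honest, I have not fully worked through this source code yet. I'll leave this tough bone to chew on later, when I have time to study the algorithm.

That was a lot of rambling about the low-level design of numeric range queries, and it only sketches the outline. I have not yet grasped the algorithms and principles behind the actual implementation, for which I apologize. If you know this area well, please let me know. Thanks!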
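One way to get a feel for splitRange is to run the same loop on a small range and print what it produces. The sketch below copies the loop into a stand-alone class; `SplitRangeDemo`, `split` and `add` are hypothetical names, and the way `add` fills in the low bits of the upper bound is my assumption about what addRange does, so treat it as an illustration rather than Lucene's exact behavior.

```java
import java.util.ArrayList;
import java.util.List;

class SplitRangeDemo {

    // Splits [minBound, maxBound] into sub-ranges; each entry is {shift, min, max}.
    static List<long[]> split(int valSize, int precisionStep, long minBound, long maxBound) {
        List<long[]> out = new ArrayList<>();
        if (precisionStep < 1) throw new IllegalArgumentException("precisionStep must be >=1");
        if (minBound > maxBound) return out;
        for (int shift = 0; ; shift += precisionStep) {
            final long diff = 1L << (shift + precisionStep);
            final long mask = ((1L << precisionStep) - 1L) << shift;
            // Do the bounds fail to line up with a block at the next, coarser precision?
            final boolean hasLower = (minBound & mask) != 0L;
            final boolean hasUpper = (maxBound & mask) != mask;
            final long nextMinBound = (hasLower ? (minBound + diff) : minBound) & ~mask;
            final long nextMaxBound = (hasUpper ? (maxBound - diff) : maxBound) & ~mask;
            final boolean lowerWrapped = nextMinBound < minBound;
            final boolean upperWrapped = nextMaxBound > maxBound;
            if (shift + precisionStep >= valSize || nextMinBound > nextMaxBound
                    || lowerWrapped || upperWrapped) {
                // Nothing left for a coarser level: emit the remainder and stop.
                add(out, minBound, maxBound, shift);
                break;
            }
            // Ragged edges are covered at the current (finer) precision.
            if (hasLower) add(out, minBound, minBound | mask, shift);
            if (hasUpper) add(out, maxBound & ~mask, maxBound, shift);
            minBound = nextMinBound;
            maxBound = nextMaxBound;
        }
        return out;
    }

    // Assumption: set the low bits that this precision ignores, so max is inclusive.
    static void add(List<long[]> out, long min, long max, int shift) {
        out.add(new long[] {shift, min, max | ((1L << shift) - 1L)});
    }

    public static void main(String[] args) {
        for (long[] r : split(32, 4, 3L, 300L)) {
            System.out.println("shift=" + r[0] + "  [" + r[1] + " .. " + r[2] + "]");
        }
    }
}
```

For the range 3 to 300 with a precision step of 4, this yields three pieces: 3..15 and 288..300 at shift 0, and 16..287 at shift 4. The two edges are matched value by value, while the large middle part is matched with coarse terms that each stand for 16 consecutive values, so far fewer terms are needed than one per number.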

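The principle behind NumericRangeQuery is hard to understand, but using it is very simple:

Query q = NumericRangeQuery.newFloatRange("weight", 0.03f, 0.10f, true, true);

The last two boolean arguments control whether the lower and upper bounds themselves are included.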

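Note, however, that NumericRangeQuery only works on numeric fields such as IntField, LongField, FloatField and DoubleField. NumericRangeQuery also has one fairly important setting, the precision step. What is the precision step? The name alone is not very intuitive, is it? Put simply, it is the number of bits by which a value is cut into terms. Once a number has been converted to its sortable bit pattern, that pattern can be long, so it is indexed as several terms, each one dropping a further precisionStep bits. Take "1111101111111011": with a precision step of 16 (the default differs by data type, and the defaults are all defined in the NumericUtils utility class) you end up with just 1 term; with a precision step of 8, the index ends up with 2 terms. This is why the official API documentation says that a smaller precisionStep takes more disk space but makes searches faster: more terms naturally take more disk space.

That's all for NumericRangeQuery. Thanks, all.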
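As a footnote to the precision step discussion, the term count for the 16-bit pattern in the text can be worked out with a few lines of plain Java. This is a simplified sketch: `PrecisionStepDemo` and `terms` are hypothetical names, and the real index stores prefix-encoded bytes rather than the binary strings printed here.

```java
import java.util.ArrayList;
import java.util.List;

class PrecisionStepDemo {

    // Lists the prefixes indexed for one value: one term per shift 0, step, 2*step, ...
    static List<String> terms(long value, int valSize, int precisionStep) {
        List<String> out = new ArrayList<>();
        for (int shift = 0; shift < valSize; shift += precisionStep) {
            // Dropping the lowest "shift" bits leaves a coarser prefix of the value.
            out.add("shift=" + shift + " prefix=" + Long.toBinaryString(value >>> shift));
        }
        return out;
    }

    public static void main(String[] args) {
        long value = 0b1111101111111011L; // the 16-bit pattern from the text
        System.out.println(terms(value, 16, 16)); // 1 term: the full value
        System.out.println(terms(value, 16, 8));  // 2 terms: full value plus its top 8 bits
        System.out.println(terms(value, 16, 4).size()); // 4 terms
    }
}
```

Halving the precision step doubles the number of terms stored per value, which is where the extra disk space goes. In exchange, a range query has more coarse prefixes to choose from and needs to visit fewer terms.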

If you still have questions, you can reach me on QQ (7-3-6-0-3-1-3-0-5) so we can discuss and learn together!

Reposted from: http://iamyida.iteye.com/blog/2194799
