3. Lucene 3.x API Analysis: Directory (the Index Directory), Document, Analyzer

1 Lucene package structure

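| Package | Purpose |
|---|---|
| org.apache.lucene.analysis | Analysis; provides the various built-in Analyzers |
| org.apache.lucene.collation | Contains CollationKeyFilter and CollationKeyAnalyzer, two classes that do the same job: every token is converted into a CollationKey, encoded with IndexableBinaryStringTools, and stored as a term |
| org.apache.lucene.document | Document-related data structures, such as the Document and Field classes |
| org.apache.lucene.index | Reading and writing the index. The most used classes are IndexWriter, which writes, merges and optimizes the segments of the index files, and IndexReader, which reads from the index and deletes documents |
| org.apache.lucene.queryParser | Classes for parsing query strings, chiefly QueryParser |
| org.apache.lucene.search | Searching: runs a query against the index and returns the results. Contains the Query classes (TermQuery, BooleanQuery, etc.) and result classes such as TopDocs (the older Hits class was removed in Lucene 3.0) |
| org.apache.lucene.store | Index storage. Directory defines how index files are stored; FSDirectory keeps the index in the file system (on disk), RAMDirectory keeps it in memory |
| org.apache.lucene.util | Shared utility classes, e.g. Version (used below as Version.LUCENE_36) |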

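2 Directory

The directory that index operations work on:

FSDirectory: a path on disk; the index is created as files on disk.

RAMDirectory: an in-memory location; the index is created in memory.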

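```java
import java.io.File;
import java.io.IOException;

import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.RAMDirectory;

public class LuceneTest {

    public static void main(String[] args) throws IOException {
        // "index" folder of the current project (relative path, resolved against the working directory)
        Directory relative = FSDirectory.open(new File("index"));

        // Absolute path
        Directory absolute = FSDirectory.open(new File("d:\\index"));

        // Under the classpath root
        Directory onClasspath = FSDirectory.open(
                new File(LuceneTest.class.getResource("/").getFile()));

        // In-memory directory
        Directory ram = new RAMDirectory();

        relative.close();
        absolute.close();
        onClasspath.close();
        ram.close();
    }
}
```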
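Since FSDirectory and RAMDirectory differ mainly in where the bytes live, a small stdlib-only model may help. This is a sketch under that assumption, not Lucene's API: `ToyDirectories`, `ToyDirectory`, `RamToyDirectory` and `FsToyDirectory` are hypothetical names, and the real Directory class also handles locking, buffered I/O and more.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical toy types, NOT the Lucene API: they only mirror the idea that a
// Lucene Directory is a flat set of named files stored somewhere.
class ToyDirectories {

    interface ToyDirectory {
        void writeFile(String name, byte[] data);
        byte[] readFile(String name);
        List<String> listAll();
    }

    // RAMDirectory-like: file contents live only in the JVM heap and vanish on exit.
    static class RamToyDirectory implements ToyDirectory {
        private final Map<String, byte[]> files = new TreeMap<>();

        public synchronized void writeFile(String name, byte[] data) {
            files.put(name, data.clone());
        }

        public synchronized byte[] readFile(String name) {
            byte[] data = files.get(name);
            if (data == null) {
                throw new UncheckedIOException(new FileNotFoundException(name));
            }
            return data.clone();
        }

        public synchronized List<String> listAll() {
            return new ArrayList<>(files.keySet()); // TreeMap keeps names sorted
        }
    }

    // FSDirectory-like: every file is a real file inside one folder on disk.
    static class FsToyDirectory implements ToyDirectory {
        private final Path root;

        FsToyDirectory(Path root) {
            this.root = root;
            try {
                Files.createDirectories(root);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }

        public void writeFile(String name, byte[] data) {
            try {
                Files.write(root.resolve(name), data);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }

        public byte[] readFile(String name) {
            try {
                return Files.readAllBytes(root.resolve(name));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }

        public List<String> listAll() {
            try (Stream<Path> paths = Files.list(root)) {
                return paths.map(p -> p.getFileName().toString())
                            .sorted()
                            .collect(Collectors.toList());
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }
}
```

Code that only talks to the `ToyDirectory` interface works the same with either implementation, which is the same reason Lucene code can swap `new RAMDirectory()` for `FSDirectory.open(...)`, for example in unit tests.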

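Analyzer (a field that must be matched exactly should not be analyzed; for example, a book number is not tokenized when you search by it)

Analyzer:

```java
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.util.Version;

// Standard analyzer; Chinese text is split into single characters
Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_36);
```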
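To see roughly what "Chinese is split into single characters" means, here is a stdlib-only toy tokenizer. `ToyStandardTokenizer` is hypothetical and only imitates two visible effects of StandardAnalyzer (lowercased Latin words, one token per Han character); the real analyzer follows the Unicode word-break rules and also removes English stop words, so its output differs in detail.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical toy, NOT Lucene's StandardAnalyzer: it only imitates two of its
// visible effects (lowercased word tokens, one token per Han character).
class ToyStandardTokenizer {

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN) {
                flush(word, tokens);                           // end any pending Latin word
                tokens.add(new String(Character.toChars(cp))); // each Han char is its own token
            } else if (Character.isLetterOrDigit(cp)) {
                word.appendCodePoint(cp);                      // extend the current word
            } else {
                flush(word, tokens);                           // spaces, punctuation, '-' split words
            }
        }
        flush(word, tokens);
        return tokens;
    }

    // Emits the pending word (lowercased) and clears the buffer.
    private static void flush(StringBuilder word, List<String> tokens) {
        if (word.length() > 0) {
            tokens.add(word.toString().toLowerCase(Locale.ROOT));
            word.setLength(0);
        }
    }
}
```

`tokenize("Lucene全文檢索")` yields `[lucene, 全, 文, 檢, 索]`, and `tokenize("ISBN 978-7-111")` yields `[isbn, 978, 7, 111]`. Once a book number has been split like this, a TermQuery for the whole number finds nothing, which is why such fields should be indexed without analysis.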

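Document is the object stored in the index; Field holds a piece of data inside a document.

Each data object corresponds to one Document object.

Each of its properties corresponds to one Field object.

`new Field(fieldName, value, store, index);`

This adds the data to the index as a Field: store decides whether the original value is stored, and index decides whether the field is indexed and whether it is analyzed.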

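| Option (Field.Store / Field.Index) | Meaning |
|---|---|
| Store.YES | store the value |
| Store.NO | do not store the value |
| Index.NO | do not index |
| Index.ANALYZED | analyze and index; norms (weighting information) are kept |
| Index.NOT_ANALYZED | index the whole value as a single term, without analysis |
| Index.ANALYZED_NO_NORMS | analyze and index, without norms |
| Index.NOT_ANALYZED_NO_NORMS | index without analysis, without norms |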
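In Lucene 3.x each Field.Index constant answers three questions through `isIndexed()`, `isAnalyzed()` and `omitNorms()`. The stdlib-only enum below is a hypothetical mirror of the options listed above (named `IndexOption` so it is not mistaken for the real class), with a hypothetical `choose` helper for picking an option from requirements.

```java
// Hypothetical mirror of org.apache.lucene.document.Field.Index (Lucene 3.x);
// the flag values follow that enum, but this type is NOT part of Lucene.
enum IndexOption {
    NO(false, false, true),
    ANALYZED(true, true, false),
    NOT_ANALYZED(true, false, false),
    ANALYZED_NO_NORMS(true, true, true),
    NOT_ANALYZED_NO_NORMS(true, false, true);

    final boolean indexed;   // can the field be searched at all?
    final boolean analyzed;  // is the value run through the Analyzer first?
    final boolean omitNorms; // is the per-field norm (length weighting) dropped?

    IndexOption(boolean indexed, boolean analyzed, boolean omitNorms) {
        this.indexed = indexed;
        this.analyzed = analyzed;
        this.omitNorms = omitNorms;
    }

    // Picks the option for the given requirements; an unindexed field is always NO.
    static IndexOption choose(boolean indexed, boolean analyzed, boolean omitNorms) {
        if (!indexed) {
            return NO;
        }
        for (IndexOption o : values()) {
            if (o.indexed && o.analyzed == analyzed && o.omitNorms == omitNorms) {
                return o;
            }
        }
        throw new AssertionError("unreachable: every indexed combination is listed");
    }
}
```

For the article example: an id that must match exactly gives `choose(true, false, false)`, i.e. NOT_ANALYZED, while title and content give ANALYZED.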

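```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.Field.Index;
import org.apache.lucene.document.Field.Store;

// ArticleIndexer is a hypothetical holder class; Article is the author's entity
// class with getId(), getTitle() and getContent(), not shown in the post.
public class ArticleIndexer {

    public static Document toDocument(Article article) {
        Document document = new Document();
        // ids are usually indexed without analysis so they match exactly
        document.add(new Field("id", String.valueOf(article.getId()), Store.YES, Index.NOT_ANALYZED));
        document.add(new Field("title", article.getTitle(), Store.YES, Index.ANALYZED));
        document.add(new Field("content", article.getContent(), Store.YES, Index.ANALYZED));
        return document;
    }
}
```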
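What Store and Index mean at search time can be seen in a small stdlib-only model of an inverted index. Everything here (`ToyIndex`, `ToyField`, the whitespace-splitting "analyzer") is hypothetical and far simpler than Lucene: indexed fields feed term-to-doc-id postings, only stored fields can be read back, and an unanalyzed field is indexed as one whole term.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeSet;

class ToyIndex {

    // Hypothetical field descriptor standing in for new Field(name, value, store, index).
    static final class ToyField {
        final String name, value;
        final boolean stored, indexed, analyzed;

        ToyField(String name, String value, boolean stored, boolean indexed, boolean analyzed) {
            this.name = name;
            this.value = value;
            this.stored = stored;
            this.indexed = indexed;
            this.analyzed = analyzed;
        }
    }

    // "field:term" -> ids of documents containing that term (the inverted index)
    private final Map<String, TreeSet<Integer>> postings = new HashMap<>();
    // doc id -> stored field values (roughly what IndexSearcher.doc(id) gives back)
    private final List<Map<String, String>> storedFields = new ArrayList<>();

    int addDocument(List<ToyField> fields) {
        int docId = storedFields.size();
        Map<String, String> stored = new LinkedHashMap<>();
        for (ToyField f : fields) {
            if (f.stored) {
                stored.put(f.name, f.value);
            }
            if (f.indexed) {
                // Analyzed: lowercase + split on whitespace (toy analyzer).
                // Not analyzed: the whole value is one term, matched exactly.
                String[] terms = f.analyzed
                        ? f.value.toLowerCase(Locale.ROOT).split("\\s+")
                        : new String[] {f.value};
                for (String t : terms) {
                    if (!t.isEmpty()) {
                        postings.computeIfAbsent(f.name + ":" + t, k -> new TreeSet<>()).add(docId);
                    }
                }
            }
        }
        storedFields.add(stored);
        return docId;
    }

    // Exact term lookup, like a TermQuery: the term must equal an indexed term.
    List<Integer> search(String field, String term) {
        TreeSet<Integer> ids = postings.get(field + ":" + term);
        return ids == null ? Collections.<Integer>emptyList() : new ArrayList<>(ids);
    }

    // Returns null when the field was not stored (Store.NO), even if it is searchable.
    String get(int docId, String field) {
        return storedFields.get(docId).get(field);
    }
}
```

A field added as stored but not indexed could be read back yet never found by `search`; one indexed but not stored is the reverse, which is the trade-off the Store/Index flags control.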

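```java
import java.io.File;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;
import org.junit.Test;

public class LuceneTest {

    // Folder the index was written to (the path in the original listing is cut off)
    private File indexDir;

    // Query the index to observe the effect of norms
    @Test
    public void testQuery() throws Exception {
        // Build the Query object; the parser searches the "content" field
        String queryString = "lucene";
        Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_36);
        // Arguments: Lucene version, default field, analyzer
        QueryParser queryParser = new QueryParser(Version.LUCENE_36, "content", analyzer);
        Query query = queryParser.parse(queryString);

        // Search with the query: open the index directory
        Directory directory = FSDirectory.open(indexDir);
        IndexReader indexReader = IndexReader.open(directory);
        IndexSearcher indexSearcher = new IndexSearcher(indexReader);

        // Fetch the top 100 matching documents
        TopDocs topDocs = indexSearcher.search(query, 100);
        System.out.println("Total matching records: " + topDocs.totalHits);

        // Read the results
        for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
            // Get the document's internal id first, then load its stored fields
            int docId = scoreDoc.doc;
            Document document = indexSearcher.doc(docId);
            System.out.println("score: " + scoreDoc.score);
            System.out.println("id: " + document.get("id"));
            System.out.println("title: " + document.get("title"));
            System.out.println("content: " + document.get("content"));
        }

        // In 3.x, closing an IndexSearcher built from an IndexReader does not close that reader
        indexSearcher.close();
        indexReader.close();
        directory.close();
    }
}
```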
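`indexSearcher.search(query, 100)` returns a TopDocs holding the total hit count and only the best 100 ScoreDocs, highest score first. A hypothetical stdlib-only sketch of that idea keeps a bounded min-heap of size n while scanning all scored hits; `ToyTopDocs` and `Hit` are made-up names, and Lucene's own collector is more optimized, though like this sketch it ranks equal scores by lower doc id first.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

class ToyTopDocs {

    // Hypothetical stand-in for org.apache.lucene.search.ScoreDoc.
    static final class Hit {
        final int doc;
        final float score;

        Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    final int totalHits;       // every matching doc is counted...
    final List<Hit> scoreDocs; // ...but only the top n are kept

    private ToyTopDocs(int totalHits, List<Hit> scoreDocs) {
        this.totalHits = totalHits;
        this.scoreDocs = scoreDocs;
    }

    // Ranking order: higher score first; on equal scores the lower doc id wins.
    private static int compareRank(Hit a, Hit b) {
        int c = Float.compare(b.score, a.score);
        return c != 0 ? c : Integer.compare(a.doc, b.doc);
    }

    static ToyTopDocs search(float[] scoreByDoc, int n) {
        // Min-heap on rank: the head is the weakest hit currently kept.
        PriorityQueue<Hit> heap = new PriorityQueue<>(Math.max(1, n), (a, b) -> compareRank(b, a));
        int total = 0;
        for (int doc = 0; doc < scoreByDoc.length; doc++) {
            if (scoreByDoc[doc] <= 0f) {
                continue; // in this toy, a score of 0 means "does not match"
            }
            total++;
            heap.offer(new Hit(doc, scoreByDoc[doc]));
            if (heap.size() > n) {
                heap.poll(); // drop the weakest so at most n hits remain
            }
        }
        List<Hit> top = new ArrayList<>(heap);
        top.sort(ToyTopDocs::compareRank);
        return new ToyTopDocs(total, top);
    }
}
```

This is why `totalHits` can be larger than `scoreDocs.length`: all matches are counted, but only n of them are materialized.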

Run result:

[Figure: console output of testQuery]

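Whether to analyze a field depends on the search conditions the business needs.

Whether to store a field depends on whether the business needs that data returned with the results.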

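Question: what is the difference between Index.ANALYZED and Index.ANALYZED_NO_NORMS?

Index.ANALYZED stores norms (weighting information).

Index.ANALYZED_NO_NORMS does not store norms.

Norms affect the score, and the score determines the ranking: search results must always be sorted, and they are sorted by score.

If no norm is stored, a value of 1.0 is used instead.

* The norm is computed from the number of terms in the field (the field's length), not from term frequency, and with the default boost its value is <= 1.

Index.ANALYZED_NO_NORMS is somewhat more efficient: no norm has to be computed or stored, whereas norms take one byte per document for each field that keeps them.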
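For Lucene 3.x's DefaultSimilarity, the norm written at index time is the field boost times the length norm 1/sqrt(numTerms), where numTerms is the field length in tokens; Lucene then squeezes it into a single byte. The sketch below recomputes only the float part in plain Java (`ToyNorms` is a hypothetical name and the lossy one-byte encoding is deliberately left out), to show why a shorter field gets a larger norm.

```java
// Hypothetical helper (not a Lucene class) that mirrors the float value
// DefaultSimilarity computes as a field's norm in Lucene 3.x, before the
// lossy one-byte encoding Lucene applies when it stores the norm.
final class ToyNorms {
    private ToyNorms() {}

    // lengthNorm = 1 / sqrt(numTerms): 1.0 for one token, smaller for longer fields.
    static float lengthNorm(int numTerms) {
        if (numTerms <= 0) {
            throw new IllegalArgumentException("numTerms must be positive");
        }
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // norm = boost * lengthNorm; with the default boost of 1.0 it is <= 1.
    static float norm(float boost, int numTerms) {
        return boost * lengthNorm(numTerms);
    }

    // A field indexed with *_NO_NORMS behaves as if its norm were always 1.0.
    static float normOrDefault(boolean omitNorms, float boost, int numTerms) {
        return omitNorms ? 1.0f : norm(boost, numTerms);
    }
}
```

For example, a 4-token title gets 0.5 while a 100-token content field gets 0.1, so, other factors being equal, the same single match counts for more in the shorter field. With norms omitted both would use 1.0 and that length preference disappears.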

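Index creation process: it needs an analyzer (Analyzer) and a directory (Directory).

Writing to the index requires an IndexWriter, and initializing an IndexWriter locks the target index.

Trying to create more than one IndexWriter on the same index throws an exception: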

`org.apache.lucene.util.SetOnce$AlreadySetException: The object cannot be set twice!`

* Cause: the same IndexWriterConfig was used twice, for two IndexWriters.
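The SetOnce$AlreadySetException comes from org.apache.lucene.util.SetOnce, a holder whose value may be assigned only once; in 3.6, state reached through an IndexWriterConfig is bound to the first writer via such a holder, so handing the same config to a second writer fails. The stdlib-only class below is a hypothetical re-creation of the pattern (`ToySetOnce` is not Lucene's class) that reproduces the behavior.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical re-creation of the set-once pattern; Lucene's own class is
// org.apache.lucene.util.SetOnce, which reports the same message.
final class ToySetOnce<T> {

    static final class AlreadySetException extends IllegalStateException {
        AlreadySetException() {
            super("The object cannot be set twice!");
        }
    }

    private final AtomicReference<T> ref = new AtomicReference<>();

    // Succeeds only for the first caller; later calls fail even with the same value.
    void set(T value) {
        if (value == null) {
            throw new NullPointerException("value");
        }
        if (!ref.compareAndSet(null, value)) {
            throw new AlreadySetException();
        }
    }

    T get() {
        return ref.get();
    }
}
```

The practical rule follows directly: build a fresh IndexWriterConfig for every IndexWriter you create.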

`org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: NativeFSLock@d:\work\javaee20130408\lucene3_day1\index\write.lock`

* Cause: a second IndexWriter was created while the first one had not been closed yet, so the lock file was still there.
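The LockObtainFailedException comes from the write.lock file: while one IndexWriter is open it holds the lock, so a second writer waits, times out and fails. The stdlib-only sketch below models the simplest version of this, an atomically created lock file similar in spirit to Lucene's SimpleFSLockFactory (3.6's FSDirectory.open uses NativeFSLock by default, an OS-level lock). `ToyWriteLock` is a hypothetical name, and unlike Lucene it does not retry before giving up.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical toy lock, NOT Lucene's implementation: whoever manages to create
// <indexDir>/write.lock owns the index for writing until it deletes the file.
final class ToyWriteLock implements AutoCloseable {
    private final Path lockFile;

    private ToyWriteLock(Path lockFile) {
        this.lockFile = lockFile;
    }

    // Returns the lock, or null if another "writer" already holds it (no retry, no timeout).
    static ToyWriteLock tryObtain(Path indexDir) {
        Path lockFile = indexDir.resolve("write.lock");
        try {
            Files.createDirectories(indexDir);
            Files.createFile(lockFile); // atomic: fails if the file already exists
            return new ToyWriteLock(lockFile);
        } catch (FileAlreadyExistsException e) {
            return null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Deleting the file releases the lock, as IndexWriter.close() does for the real one.
    @Override
    public void close() {
        try {
            Files.deleteIfExists(lockFile);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

If a process dies without releasing such a lock, the file stays behind and blocks every later writer, which is one reason Lucene prefers native OS locks that disappear with the process.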

Question: what if two threads operate on the same index at the same time? Solution: they must share a single IndexWriter object.
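IndexWriter is thread-safe, so the usual pattern is to open one writer, let every thread call addDocument on it, and close it once at the end. The sketch below shows that shape with plain java.util.concurrent; `SharedWriter` is a hypothetical stand-in for IndexWriter (it just collects documents in a synchronized list), so real code would pass the actual writer around instead.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a single shared IndexWriter; the point is the
// sharing pattern, not the storage.
final class SharedWriter {
    private final List<String> docs = Collections.synchronizedList(new ArrayList<String>());

    void addDocument(String doc) {
        docs.add(doc);
    }

    int numDocs() {
        return docs.size();
    }

    // Every task writes through the same instance; no second writer is ever created.
    static SharedWriter indexConcurrently(int threads, int docsPerThread) {
        SharedWriter writer = new SharedWriter();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            final int id = t;
            pool.execute(() -> {
                for (int i = 0; i < docsPerThread; i++) {
                    writer.addDocument("thread-" + id + "-doc-" + i);
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("indexing did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return writer; // with a real IndexWriter: close it here, exactly once
    }
}
```

Closing the writer only after all tasks have finished matters: closing it early would release the lock while other threads still expect to write.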
