
Lucene tokenization analyzers (Analyzer)

This post compares four of Lucene's built-in analyzers:

SimpleAnalyzer - splits on whitespace and punctuation

StandardAnalyzer - grammar-based "mixed" splitting that also removes stop words and handles Chinese text

WhitespaceAnalyzer - splits on whitespace only

StopAnalyzer - like SimpleAnalyzer, plus stop-word removal

Test code:

import java.io.Reader;
import java.io.StringReader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.SimpleAnalyzer;
import org.apache.lucene.analysis.StopAnalyzer;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;

public class TestAnalyzer {

    private static String testString1 = "The quick brown fox jumped over the lazy dogs";
    private static String testString2 = "xy&z mail is - xyz@sohu.com";

    // WhitespaceAnalyzer: splits on whitespace only
    public static void testWhitespace(String testString) throws Exception {
        Analyzer analyzer = new WhitespaceAnalyzer();
        Reader r = new StringReader(testString);
        TokenStream ts = analyzer.tokenStream("", r);
        System.err.println("=====Whitespace analyzer====");
        System.err.println("Analysis: split on whitespace");
        Token t;
        while ((t = ts.next()) != null) {
            System.out.println(t.termText());
        }
    }

    // SimpleAnalyzer: splits on whitespace and punctuation, lower-cases tokens
    public static void testSimple(String testString) throws Exception {
        Analyzer analyzer = new SimpleAnalyzer();
        Reader r = new StringReader(testString);
        TokenStream ts = analyzer.tokenStream("", r);
        System.err.println("=====Simple analyzer====");
        System.err.println("Analysis: split on whitespace and punctuation");
        Token t;
        while ((t = ts.next()) != null) {
            System.out.println(t.termText());
        }
    }

    // StopAnalyzer: like SimpleAnalyzer, but also removes stop words
    public static void testStop(String testString) throws Exception {
        Analyzer analyzer = new StopAnalyzer();
        Reader r = new StringReader(testString);
        TokenStream ts = analyzer.tokenStream("", r);
        System.err.println("=====stop analyzer====");
        System.err.println("Analysis: split on whitespace and punctuation, and remove stop words such as is, are, in, on, the");
        Token t;
        while ((t = ts.next()) != null) {
            System.out.println(t.termText());
        }
    }

    // StandardAnalyzer: grammar-based tokenization that removes stop words,
    // keeps constructs such as e-mail addresses intact, and handles Chinese text
    public static void testStandard(String testString) throws Exception {
        Analyzer analyzer = new StandardAnalyzer();
        Reader r = new StringReader(testString);
        TokenStream ts = analyzer.tokenStream("", r);
        System.err.println("=====standard analyzer====");
        System.err.println("Analysis: mixed splitting, removes stop words, supports Chinese");
        Token t;
        while ((t = ts.next()) != null) {
            System.out.println(t.termText());
        }
    }

    public static void main(String[] args) throws Exception {
//        String testString = testString1;
        String testString = testString2;
        System.out.println(testString);
        testWhitespace(testString);
        testSimple(testString);
        testStop(testString);
        testStandard(testString);
    }
}

Output:

xy&z mail is - xyz@sohu.com
=====Whitespace analyzer====
Analysis: split on whitespace
xy&z
mail
is
-
xyz@sohu.com
=====Simple analyzer====
Analysis: split on whitespace and punctuation
xy
z
mail
is
xyz
sohu
com
=====stop analyzer====
Analysis: split on whitespace and punctuation, and remove stop words such as is, are, in, on, the
xy
z
mail
xyz
sohu
com
=====standard analyzer====
Analysis: mixed splitting, removes stop words, supports Chinese
xy&z
mail
xyz@sohu.com
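The Token.next()/termText() calls used above belong to early Lucene releases and were removed in later versions. As a rough sketch of how the same token dump looks with the attribute-based API of newer Lucene (roughly 4.x onward; the ModernAnalyzerDemo class name is made up for illustration, and constructor and package details vary between releases, so treat this as an assumption rather than the exact API of any one version). Note that the newer StandardAnalyzer follows UAX#29 word-break rules, so its tokens for the e-mail string may differ from the result shown above.

import java.io.IOException;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class ModernAnalyzerDemo {

    public static void main(String[] args) throws IOException {
        // Assumes a recent release; some older 4.x versions need a Version argument here.
        Analyzer analyzer = new StandardAnalyzer();
        try (TokenStream ts = analyzer.tokenStream("", "xy&z mail is - xyz@sohu.com")) {
            // Token data is read through attributes instead of Token objects
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            ts.reset();                    // required before the first incrementToken()
            while (ts.incrementToken()) {  // replaces the old while ((t = ts.next()) != null) loop
                System.out.println(term.toString());
            }
            ts.end();
        }
        analyzer.close();
    }
}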