
A problem using mmseg4j 1.9 with Lucene 4.9, solved by patching the mmseg4j source

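Today, while writing a Lucene 4.9 demo, I used the mmseg4j 1.9 tokenizer directly, but the program threw the following exception when adding a document to the index.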

java.lang.IllegalStateException: TokenStream contract violation: reset()/close() call missing, reset() called multiple times, or subclass does not call super.reset(). Please see Javadocs of TokenStream class for more information about the correct consuming workflow.
	at org.apache.lucene.analysis.Tokenizer$1.read(Tokenizer.java:111)
	at java.io.BufferedReader.fill(BufferedReader.java:161)
	at java.io.BufferedReader.read(BufferedReader.java:182)
	at java.io.FilterReader.read(FilterReader.java:65)
	at java.io.PushbackReader.read(PushbackReader.java:90)
	at com.chenlb.mmseg4j.MMSeg.readNext(MMSeg.java:42)
	at com.chenlb.mmseg4j.MMSeg.next(MMSeg.java:64)
	at com.chenlb.mmseg4j.analysis.MMSegTokenizer.incrementToken(MMSegTokenizer.java:64)
	at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:604)
	at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:342)
	at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:301)
	at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:222)
	at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:450)
	at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1507)
	at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1222)
	at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1203)
	at LuceneDemo.createIndex(LuceneDemo.java:37)
	at LuceneTest.test1(LuceneTest.java:8)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:483)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:73)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:46)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:180)
	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:41)
	at org.junit.runners.ParentRunner$1.evaluate(ParentRunner.java:173)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:220)
	at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:46)
	at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)

           

After searching online, I found that the mmseg4j source code needs to be modified.

The change is as follows.


Modify the reset method of the MMSegTokenizer class (it really just means adding one line):

public void reset() throws IOException {
		// Lucene 4.0
		// org.apache.lucene.analysis.Tokenizer.setReader(Reader)
		// setReader is called automatically, and input is set automatically.
		super.reset();   // add this line
		mmSeg.reset(input);
	}
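To see why that one line matters, here is a minimal, self-contained sketch. It does not use Lucene. It assumes, based on the `Tokenizer$1.read` frame in the stack trace, that the base tokenizer keeps a placeholder reader that throws until the base `reset()` swaps in the reader passed to `setReader`. The class names `ModelTokenizer`, `BrokenTokenizer`, `FixedTokenizer` and `ResetContractDemo` are hypothetical and exist only for this illustration.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Simplified stand-in for the base tokenizer (an assumption, not Lucene's code).
abstract class ModelTokenizer {
    // Placeholder reader that fails on every read until reset() installs the real one.
    private static final Reader ILLEGAL_STATE_READER = new Reader() {
        @Override public int read(char[] buf, int off, int len) {
            throw new IllegalStateException("contract violation: reset() call missing");
        }
        @Override public void close() { }
    };

    protected Reader input = ILLEGAL_STATE_READER;
    private Reader inputPending = ILLEGAL_STATE_READER;

    // The new reader is only remembered here; it is not yet visible as input.
    final void setReader(Reader reader) { inputPending = reader; }

    // Only the base reset() makes the pending reader the current input.
    void reset() throws IOException {
        input = inputPending;
        inputPending = ILLEGAL_STATE_READER;
    }

    abstract int readChar() throws IOException;
}

// Mirrors the unpatched method: reset() never calls super.reset().
class BrokenTokenizer extends ModelTokenizer {
    private Reader source; // plays the role of the reader handed to mmSeg
    @Override void reset() throws IOException { source = input; }
    @Override int readChar() throws IOException { return source.read(); }
}

// Mirrors the patched method: super.reset() runs before input is used.
class FixedTokenizer extends ModelTokenizer {
    private Reader source;
    @Override void reset() throws IOException { super.reset(); source = input; }
    @Override int readChar() throws IOException { return source.read(); }
}

class ResetContractDemo {
    // Runs setReader, reset, then one read, and reports what happened.
    static String tryRead(ModelTokenizer t, String text) {
        try {
            t.setReader(new StringReader(text));
            t.reset();
            return "read '" + (char) t.readChar() + "'";
        } catch (IllegalStateException e) {
            return "IllegalStateException";
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("without super.reset(): " + tryRead(new BrokenTokenizer(), "abc"));
        System.out.println("with super.reset():    " + tryRead(new FixedTokenizer(), "abc"));
    }
}
```

In this model the tokenizer without `super.reset()` hands the placeholder reader on to its consumer, so the first read fails, which is the same pattern as the `MMSeg.readNext` frame in the trace above. The real Lucene classes may differ in detail, so treat this only as an illustration of the contract.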
           

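After making the change, compile it to a class file, replace the original class file inside the jar with the new one, and then repackage the jar. I am a newcomer and this was my first time modifying library source code. Even though all I did was copy one line from the internet, I am still quite happy about it: I learned how to patch source code and how to rebuild a jar. Experts, please go easy on me.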
