Warning: Parameter not found: enable_new_segsearch
Warning: Parameter not found: save_raw_choices
D:\devtools\Tesseract-OCR\tessdata
result:--->650 3428

I. Test image: 777.png

Basic tess4j usage: a summary of doOCR errors

II. Test code:

package com.gazgeek.helloworld.tess4jTest;


import java.io.File;
import net.sourceforge.tess4j.Tesseract;
import net.sourceforge.tess4j.TesseractException;

public class Testtess {

    public static void main(String[] args) {

        File imageFile = new File("F:\\imgall\\777.png");
        Tesseract tessInst = new Tesseract();
        tessInst.setDatapath("D:\\devtools\\Tesseract-OCR\\tessdata");
        tessInst.setLanguage("eng"); // eng.traineddata must be in the tessdata directory

        try {
            String result= tessInst.doOCR(imageFile);
            System.out.println("D:\\devtools\\Tesseract-OCR\\tessdata");
            System.out.println("result:--->"  + result );
        } catch (TesseractException e) {
            System.err.println(e.getMessage());
        }

    }

}

III. Test result (the console output is shown at the top of this post):

IV. FAQ

1. ERROR net.sourceforge.tess4j.Tesseract - Not a JPEG file: starts with 0x89 0x50

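Solution: the file is not actually a JPEG. The bytes 0x89 0x50 are the start of the PNG signature, so the file is most likely a PNG with a .jpg extension. Use a real JPEG file, or give the file its correct extension.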
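To check a file before handing it to doOCR, its leading bytes can be compared against the well-known PNG and JPEG signatures. The following is a minimal sketch; ImageTypeSniffer is a hypothetical helper written for this post, not part of tess4j, and it assumes Java 9 or later for readNBytes.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

/** Hypothetical helper (not part of tess4j): guesses an image format from its leading bytes. */
final class ImageTypeSniffer {

    /** Returns "png", "jpeg", or "unknown" based on the file signature. */
    static String detect(byte[] header) {
        // PNG files start with 0x89 'P' 'N' 'G'
        if (header.length >= 4
                && (header[0] & 0xFF) == 0x89 && header[1] == 0x50
                && header[2] == 0x4E && header[3] == 0x47) {
            return "png";
        }
        // JPEG files start with 0xFF 0xD8 0xFF
        if (header.length >= 3
                && (header[0] & 0xFF) == 0xFF && (header[1] & 0xFF) == 0xD8
                && (header[2] & 0xFF) == 0xFF) {
            return "jpeg";
        }
        return "unknown";
    }

    /** Reads up to four bytes from the start of the file and classifies them. */
    static String detect(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] header = new byte[4];
            int n = in.readNBytes(header, 0, header.length); // Java 9+
            return detect(Arrays.copyOf(header, n));
        }
    }
}
```

If detect reports "png" for a file named .jpg, renaming it to .png is usually enough to get rid of the "Not a JPEG file" error.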

2. WARN Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.

Solution: Option A: set the path in code with tessInst.setDatapath(System.getProperty("user.dir") + "/tessdata");

Option B: set the TESSDATA_PREFIX environment variable, which Tesseract uses as the default location of tessdata. If it is not set, Tesseract tries to open ./*.traineddata in the current directory.
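The two options can be combined: use TESSDATA_PREFIX when it is set, and fall back to a tessdata folder under the working directory otherwise. This is only a sketch; TessdataLocator is a hypothetical helper, and it assumes TESSDATA_PREFIX points directly at the tessdata directory, as the warning message says.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper (not part of tess4j): decides which tessdata directory to use. */
final class TessdataLocator {

    /**
     * Returns the TESSDATA_PREFIX value if it is set and non-blank,
     * otherwise <workingDir>/tessdata.
     */
    static Path resolve(String tessdataPrefix, String workingDir) {
        if (tessdataPrefix != null && !tessdataPrefix.trim().isEmpty()) {
            return Paths.get(tessdataPrefix);
        }
        return Paths.get(workingDir, "tessdata");
    }

    /** Same decision, using the real environment variable and working directory. */
    static Path resolveFromEnvironment() {
        return resolve(System.getenv("TESSDATA_PREFIX"), System.getProperty("user.dir"));
    }
}
```

The result can then be passed on with tessInst.setDatapath(TessdataLocator.resolveFromEnvironment().toString());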

3. "Warning: Parameter not found: enable_new_segsearch"

Solution: the warning usually means the traineddata file was built for an older Tesseract release than the engine in use. It goes away with this eng.traineddata: https://github.com/tesseract-ocr/tessdata_fast/blob/master/eng.traineddata

Note: for the best accuracy, prefer the language data files from tessdata_best. To recognize Chinese, download chi_sim.traineddata and move it into your tessdata directory.

Java's print API basically works on the assumption that everything is done at 72 dpi. This means you can use 72 dpi as the basis for converting to and from other measurements.
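As a rough illustration of that conversion, the sketch below treats one unit of the print API as 1/72 inch and converts to and from pixels at another resolution. PrintUnits and its method names are made up for this example.

```java
/** Hypothetical helper: converts between 72-per-inch print units and pixels at a given dpi. */
final class PrintUnits {

    /** The print API's assumed resolution: 72 units per inch. */
    static final double UNITS_PER_INCH = 72.0;

    /** Converts a length in 72-per-inch units to pixels at the target dpi. */
    static double unitsToPixels(double units, double dpi) {
        return units * dpi / UNITS_PER_INCH;
    }

    /** Converts a length in pixels at the given dpi back to 72-per-inch units. */
    static double pixelsToUnits(double pixels, double dpi) {
        return pixels * UNITS_PER_INCH / dpi;
    }

    /** Converts inches to 72-per-inch units. */
    static double inchesToUnits(double inches) {
        return inches * UNITS_PER_INCH;
    }
}
```

For example, one inch is 72 units, which corresponds to 300 pixels in an image prepared for 300 dpi.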

References:

1. http://www.jbrandsma.com/news/2015/12/07/ocr-with-java-and-tesseract/

2. https://sourceforge.net/projects/tess4j/

3. https://github.com/tesseract-ocr/tessdata_best

4. https://www.b4x.com/android/forum/threads/solved-tesseract-api-a-120-opotunity.101482/

5. https://www.learnopencv.com/deep-learning-based-text-recognition-ocr-using-tesseract-and-opencv/

6. https://stackoverflow.com/questions/18975595/how-to-design-an-image-in-java-to-be-printed-on-a-300-dpi-printer

V. "Installing Tesseract on Mac, step by step, with fixes for the errors and exceptions along the way. Open-source OCR for Java"

macOS

After installing tesseract-ocr with

sudo port install tesseract

running the following command:

sudo tesseract 1563327696899809.jpg 1563327696899809.txt -l chi_sim+eng

prints:

Warning: Parameter not found: enable_new_segsearch

Tesseract Open Source OCR Engine v4.0.0 with Leptonica

Warning: Invalid resolution 0 dpi. Using 70 instead.

Estimating resolution as 194

This happens because none of the language packs named in the -l option are installed. Install them with:

sudo port install tesseract-chi-sim

sudo port install tesseract-eng

Error summary:

①Warning: Parameter not found: enable_new_segsearch

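When this appears on Mac, the cause is that the directory configured in your Java code does not contain the Simplified Chinese language pack. Copy the language data files into that directory.

ITesseract iTesseract = new Tesseract();
iTesseract.setDatapath("absolute path to your language data directory");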
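Before calling doOCR, it can help to confirm that every language named in a spec such as chi_sim+eng has a matching .traineddata file in the configured directory. The sketch below does only that file check; LanguagePackCheck is a hypothetical helper, not a tess4j class.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of tess4j): finds language packs missing from a tessdata directory. */
final class LanguagePackCheck {

    /**
     * Splits a "chi_sim+eng"-style language spec and returns the languages
     * whose <lang>.traineddata file is not present in tessdataDir.
     */
    static List<String> missing(Path tessdataDir, String languageSpec) {
        List<String> missing = new ArrayList<>();
        for (String lang : languageSpec.split("\\+")) {
            String name = lang.trim();
            if (name.isEmpty()) {
                continue; // ignore empty segments such as a trailing '+'
            }
            if (!Files.isRegularFile(tessdataDir.resolve(name + ".traineddata"))) {
                missing.add(name);
            }
        }
        return missing;
    }
}
```

An empty result means all requested packs are in place; any names returned are the files to copy or download into the directory.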

②Warning: Invalid resolution 0 dpi. Using 70 instead.

ITesseract iTesseract = new Tesseract();
iTesseract.setDatapath("absolute path to your language data directory");
iTesseract.setTessVariable("user_defined_dpi", "300");

Setting the DPI explicitly removes this warning; 300 is generally a good default.
