Warning: Parameter not found: enable_new_segsearch
Warning: Parameter not found: save_raw_choices
D:\devtools\Tesseract-OCR\tessdata
result:--->650 3428

I. Test image: 777.png

[Image: simple use of tess4j: summary of doOCR exceptions]

II. Test code:

package com.gazgeek.helloworld.tess4jTest;

import java.io.File;

import net.sourceforge.tess4j.Tesseract;
import net.sourceforge.tess4j.TesseractException;

public class Testtess {

    public static void main(String[] args) {
        // Image to recognize and the directory that holds the *.traineddata files
        File imageFile = new File("F:\\imgall\\777.png");
        String dataPath = "D:\\devtools\\Tesseract-OCR\\tessdata";

        Tesseract tessInst = new Tesseract();
        tessInst.setDatapath(dataPath);
        tessInst.setLanguage("eng"); // eng.traineddata must be in the tessdata directory

        try {
            String result = tessInst.doOCR(imageFile);
            System.out.println(dataPath);
            System.out.println("result:--->" + result);
        } catch (TesseractException e) {
            System.err.println(e.getMessage());
        }
    }
}

III. Test result:

[Image: simple use of tess4j: summary of doOCR exceptions]

IV. FAQ

1. ERROR net.sourceforge.tess4j.Tesseract - Not a JPEG file: starts with 0x89 0x50

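Solution: the file is not actually a JPEG. The bytes 0x89 0x50 are the start of the PNG signature, so this is a PNG file with a JPEG extension. Give it the correct extension or supply a real JPEG file.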
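To confirm what a file really is before handing it to doOCR, its first bytes can be compared against the well-known PNG and JPEG signatures. The following is a minimal sketch; the class name ImageTypeSniffer and its methods are hypothetical helpers for illustration and are not part of tess4j.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper (not part of tess4j): guesses the real image format
// from the leading bytes of a file instead of trusting its extension.
class ImageTypeSniffer {

    // Returns "png", "jpeg", or "unknown" based on the file signature.
    static String sniff(byte[] head) {
        // PNG files start with 0x89 'P' 'N' 'G'.
        if (head.length >= 4
                && (head[0] & 0xFF) == 0x89 && head[1] == 'P'
                && head[2] == 'N' && head[3] == 'G') {
            return "png";
        }
        // JPEG files start with 0xFF 0xD8 0xFF.
        if (head.length >= 3
                && (head[0] & 0xFF) == 0xFF && (head[1] & 0xFF) == 0xD8
                && (head[2] & 0xFF) == 0xFF) {
            return "jpeg";
        }
        return "unknown";
    }

    // Reads up to the first 8 bytes of the file and classifies them.
    static String sniff(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[8];
            int n = 0;
            while (n < buf.length) {
                int r = in.read(buf, n, buf.length - n);
                if (r < 0) {
                    break; // end of file reached before 8 bytes
                }
                n += r;
            }
            return sniff(Arrays.copyOf(buf, n));
        }
    }

    public static void main(String[] args) throws IOException {
        // The default path is the test image from this post; pass another path as the first argument.
        Path file = Paths.get(args.length > 0 ? args[0] : "F:\\imgall\\777.png");
        System.out.println(file + " looks like: " + sniff(file));
    }
}
```

If the reported type does not match the extension, renaming the file or re-exporting it in the expected format should avoid the error.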

2. WARN Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.

Solution: option A, call tessInst.setDatapath(System.getProperty("user.dir") + "/tessdata");

option B, set TESSDATA_PREFIX in your environment; Tesseract uses it as the default tessdata location. If it is not set, Tesseract tries to open ./*.traineddata files.
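The two options can be combined: prefer TESSDATA_PREFIX when it is set and fall back to a tessdata folder under the working directory otherwise. This is only a sketch under that assumption; TessdataLocator and resolve are hypothetical names, and the returned string is what would be passed to tessInst.setDatapath.

```java
import java.io.File;

// Hypothetical helper: chooses a tessdata directory following options A and B.
class TessdataLocator {

    // envValue is the value of TESSDATA_PREFIX (may be null or blank);
    // workingDir is normally System.getProperty("user.dir").
    static String resolve(String envValue, String workingDir) {
        if (envValue != null && !envValue.trim().isEmpty()) {
            return envValue.trim(); // option B: the environment variable wins
        }
        return workingDir + File.separator + "tessdata"; // option A: fallback
    }

    public static void main(String[] args) {
        String path = resolve(System.getenv("TESSDATA_PREFIX"),
                System.getProperty("user.dir"));
        System.out.println("tessdata directory: " + path);
        // The result would then be passed to tessInst.setDatapath(path).
    }
}
```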

3. "Warning: Parameter not found: enable_new_segsearch"

Solution: Works with this eng.traineddata: https://github.com/tesseract-ocr/tessdata_fast/blob/master/eng.traineddata

Note: for language data, the files from tessdata_best are the preferred choice. To recognize Simplified Chinese, download chi_sim.traineddata and move it into your tessdata directory.

Java's print API basically works on the assumption that everything is done at 72 dpi. This means that you can use this as a basis for converting to and from different measurements.
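As a worked example of that conversion, a length in pixels at a given dpi can be turned into inches and then into 72-per-inch units, and back again. The class and method names below are hypothetical and only illustrate the arithmetic.

```java
// Hypothetical helper illustrating conversion between device pixels
// and the 72-units-per-inch space assumed by Java's print API.
class DpiConverter {

    static final double POINTS_PER_INCH = 72.0;

    // pixels / dpi gives inches; inches * 72 gives points.
    static double pixelsToPoints(double pixels, double dpi) {
        return pixels / dpi * POINTS_PER_INCH;
    }

    // points / 72 gives inches; inches * dpi gives pixels.
    static double pointsToPixels(double points, double dpi) {
        return points / POINTS_PER_INCH * dpi;
    }

    public static void main(String[] args) {
        // A 2550-pixel-wide image at 300 dpi is 8.5 inches wide, i.e. 612 points.
        System.out.println(pixelsToPoints(2550, 300));
        // One inch (72 points) corresponds to 300 pixels at 300 dpi.
        System.out.println(pointsToPixels(72, 300));
    }
}
```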

References:

1. http://www.jbrandsma.com/news/2015/12/07/ocr-with-java-and-tesseract/

2. https://sourceforge.net/projects/tess4j/

3. https://github.com/tesseract-ocr/tessdata_best

4. https://www.b4x.com/android/forum/threads/solved-tesseract-api-a-120-opotunity.101482/

5. https://www.learnopencv.com/deep-learning-based-text-recognition-ocr-using-tesseract-and-opencv/

6. https://stackoverflow.com/questions/18975595/how-to-design-an-image-in-java-to-be-printed-on-a-300-dpi-printer

V. "Installing Tesseract on Mac: the full process, with fixes for all the errors and exceptions. Java open-source OCR"

macOS

After installing tesseract-ocr with

sudo port install tesseract

running the following command:

sudo tesseract 1563327696899809.jpg 1563327696899809.txt -l chi_sim+eng

prints:

Warning: Parameter not found: enable_new_segsearch

Tesseract Open Source OCR Engine v4.0.0 with Leptonica

Warning: Invalid resolution 0 dpi. Using 70 instead.

Estimating resolution as 194

This happens because none of the language packs named in the -l option are installed. Install them with:

sudo port install tesseract-chi-sim

sudo port install tesseract-eng

Summary of exceptions:

①Warning: Parameter not found: enable_new_segsearch

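When this appears on Mac, copy the language data files into the directory configured in your Java code. The cause is that this directory does not contain the Simplified Chinese language pack.

ITesseract iTesseract = new Tesseract();
iTesseract.setDatapath("absolute path to your language data");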
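Before calling doOCR, it can help to check that every language named in a specification such as chi_sim+eng has a matching .traineddata file in the configured directory. The sketch below uses only the Java standard library; LanguageDataCheck and missingLanguages are hypothetical names, and the directory passed in is assumed to be the same path given to setDatapath.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reports which languages lack a .traineddata file.
class LanguageDataCheck {

    // Splits a spec such as "chi_sim+eng" on '+' and returns the language
    // codes whose <code>.traineddata file is not present in tessdataDir.
    static List<String> missingLanguages(File tessdataDir, String languageSpec) {
        List<String> missing = new ArrayList<>();
        for (String part : languageSpec.split("\\+")) {
            String lang = part.trim();
            if (lang.isEmpty()) {
                continue; // ignore empty segments such as a trailing '+'
            }
            if (!new File(tessdataDir, lang + ".traineddata").isFile()) {
                missing.add(lang);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Assumed usage: first argument is the tessdata directory, second the language spec.
        File dir = new File(args.length > 0 ? args[0] : "tessdata");
        String spec = args.length > 1 ? args[1] : "chi_sim+eng";
        System.out.println("Missing language data: " + missingLanguages(dir, spec));
    }
}
```

An empty list means every requested language file was found in that directory.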

②Warning: Invalid resolution 0 dpi. Using 70 instead.

ITesseract iTesseract = new Tesseract();
iTesseract.setDatapath("absolute path to your language data");
iTesseract.setTessVariable("user_defined_dpi", "300");

Setting the dpi explicitly is enough; 300 is the best default value.
