Tesseract Installation

【1】Installing directly

1) On Ubuntu 14.04, you can install the distribution package tesseract-ocr directly:

sudo apt-get install tesseract-ocr

Installed this way, the program lives in /usr/bin and the data files in /usr/share/tesseract-ocr/tessdata (the eng package is already installed).
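To see which languages a tessdata directory actually provides, a short Python sketch can list its data files. It assumes each language is stored as <lang>.traineddata, so the language code is the file name without that suffix. list_languages is a hypothetical helper name and is not part of Tesseract.

```python
from pathlib import Path


def list_languages(tessdata_dir):
    """Return the sorted language codes found in tessdata_dir."""
    directory = Path(tessdata_dir)
    if not directory.is_dir():
        return []
    # eng.traineddata -> "eng": the code is the file name minus the suffix.
    return sorted(p.stem for p in directory.glob("*.traineddata") if p.is_file())


if __name__ == "__main__":
    print(list_languages("/usr/share/tesseract-ocr/tessdata"))
```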

There is also a pytesseract folder under /usr/local/lib/python*.*/dist-packages. Perhaps I installed it without noticing; the GitHub page [https://github.com/madmaze/pytesseract] says to install it with sudo pip install pytesseract.

With this you can use tesseract from Python, for example:

from PIL import Image

import pytesseract

print(pytesseract.image_to_string(Image.open('./Test/Python/t2.png')))

print(pytesseract.image_to_string(Image.open('./Test/Python/t2.png'), ))

Copy my trained digit model, num.traineddata, into the data directory, and then:

print(pytesseract.image_to_string(Image.open('./Test/Python/t2.png'), ))

Recognition of these special digits then becomes very accurate!
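You can also script the step of copying a trained model into the data directory. The helper below (install_model, a hypothetical name) copies a *.traineddata file into a tessdata directory. It then returns the language code Tesseract derives from the file name, so num.traineddata becomes "num". Writing into /usr/share/tesseract-ocr/tessdata normally requires root. Selecting the model afterwards through pytesseract's lang argument (for example lang="num") is an assumption here.

```python
import shutil
from pathlib import Path

SUFFIX = ".traineddata"


def install_model(model_file, tessdata_dir):
    """Copy <lang>.traineddata into tessdata_dir and return <lang>."""
    src = Path(model_file)
    if not src.name.endswith(SUFFIX):
        raise ValueError("expected a *%s file, got %s" % (SUFFIX, src.name))
    dest_dir = Path(tessdata_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    shutil.copy2(str(src), str(dest_dir / src.name))
    # Tesseract names a language after the file, minus the suffix.
    return src.name[: -len(SUFFIX)]
```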

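2) tesseract-ocr installed this way has one problem: the tesseract command cannot process images in the Terminal. It reports the following errors, although it works from Python: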

Tesseract Open Source OCR Engine v3.03 with Leptonica

Error in pixReadStreamPng: function not present

Error in pixReadStream: png: no pix returned

Error in pixRead: pix not read

Error in pixGetInputFormat: pix not defined

Reading ./Test/Python/t2.png as a list of filenames...

Error in fopenReadStream: file not found

Error in pixRead: image file not found: �PNG

Image file �PNG cannot be read!

Error during processing.

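Posts online say this is because Leptonica does not recognize png, tif or jpg. In fact it seems to recognize hardly any format, so I really wonder why Tesseract is built on this library. More precisely, "function not present" is printed by Leptonica's stub readers. These are compiled in place of the real ones when Leptonica is built without libpng, libtiff or libjpeg.

(I have not solved this problem yet.)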
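To tell quickly which formats the installed Leptonica cannot read, the sketch below scans captured tesseract stderr for "function not present" lines. The mapping from reader function to format is an assumption. It is built from the function names in the logs in this post, plus pixReadStreamJpeg, and anything else is reported as unknown.

```python
import re

# Assumed mapping from Leptonica reader functions to the format they handle.
FUNCTION_FORMATS = {
    "pixReadStreamPng": "png",
    "pixReadStreamTiff": "tiff",
    "findTiffCompression": "tiff",
    "pixReadStreamJpeg": "jpeg",
}

MISSING_RE = re.compile(r"Error in (\w+): function not present")


def missing_formats(stderr_text):
    """Return the sorted formats whose Leptonica reader reported 'function not present'."""
    found = set()
    for name in MISSING_RE.findall(stderr_text):
        found.add(FUNCTION_FORMATS.get(name, "unknown (%s)" % name))
    return sorted(found)


if __name__ == "__main__":
    log = ("Error in pixReadStreamPng: function not present\n"
           "Error in pixRead: pix not read\n")
    print(missing_formats(log))  # ['png']
```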

--------------------------------------------------------------------------------------------

【2】Installing from source

1) First you need to install Leptonica. Download it from www.leptonica.org/download.html, for example leptonica-1.68.tar.gz.

Then install it. The basic procedure below is enough; if you are interested in a customized Leptonica build, you can explore that later:

./configure [build the Makefile]

make [builds the library and shared library versions of all the progs]

sudo make install [as root; this puts liblept.a into /usr/local/lib/ and all the progs into /usr/local/bin/ ]
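If you have to repeat the build, you can wrap the steps in a small Python script. Leptonica here and Tesseract later both use the same configure/make flow. This is a minimal sketch using subprocess. build_autotools_project and its dry_run switch are hypothetical names, and with dry_run=True the function only returns the commands it would run.

```python
import subprocess

# The basic Leptonica build from above; each entry is one command.
BUILD_STEPS = (
    ("./configure",),             # build the Makefile
    ("make",),                    # build the library and programs
    ("sudo", "make", "install"),  # install under /usr/local
)


def build_autotools_project(src_dir, steps=BUILD_STEPS, dry_run=True):
    """Run each step inside src_dir and return the commands as strings.

    With dry_run=True nothing is executed; the commands are only listed.
    """
    commands = []
    for step in steps:
        commands.append(" ".join(step))
        if not dry_run:
            # check=True raises CalledProcessError at the first failing step.
            subprocess.run(list(step), cwd=src_dir, check=True)
    return commands
```

For example, build_autotools_project("/tmp/leptonica-1.68", dry_run=False) would run the three steps in that (example) directory and stop at the first failure. For Tesseract you could pass steps that start with ./autogen.sh and end with sudo ldconfig.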

2) Download Tesseract. Tesseract is now hosted on GitHub (https://github.com/tesseract-ocr), so you no longer have to climb over the firewall to download it from Google Code!

Download the code from GitHub and extract it to a directory (for example /tmp/tesseract).

3) Install

./autogen.sh

./configure

make

sudo make install

sudo ldconfig

Note that installed this way, the program is in /usr/local/bin and the data files belong in /usr/local/share/tessdata!

You may run into the following errors along the way:

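[1] If ./autogen.sh complains that a bunch of tools are missing, install the corresponding packages:

Missing aclocal: sudo apt-get install automake

Missing libtoolize: sudo apt-get install libtool

If it reports yet another missing tool, run that tool by name, and Ubuntu will tell you how to install it.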
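Before running ./autogen.sh, you can check for these tools up front. The sketch uses shutil.which and only the two tool-to-package pairs listed above, so extend the table yourself for other tools. missing_tools is a hypothetical name.

```python
import shutil

# Tool -> Ubuntu package that provides it (the two pairs listed above).
TOOL_PACKAGES = {
    "aclocal": "automake",
    "libtoolize": "libtool",
}


def missing_tools(tools=TOOL_PACKAGES, which=shutil.which):
    """Return {tool: package} for every tool that is not found on PATH."""
    return {tool: pkg for tool, pkg in tools.items() if which(tool) is None}


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("sudo apt-get install " + " ".join(sorted(set(missing.values()))))
    else:
        print("all tools found")
```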

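[2] Language data

A system built from source with make has no language data. You must install at least one data package (usually eng) before it can run. To install one:

First download the data package, then extract it into /usr/local/share/tessdata.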
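You can also do the extraction with Python's tarfile module. This sketch assumes the package is a .tar.gz archive containing one or more *.traineddata files. It does not assume where they sit inside the archive, so it matches on file names and flattens the paths. install_data_package is a hypothetical name, and writing to /usr/local/share/tessdata normally requires root.

```python
import os
import shutil
import tarfile


def install_data_package(archive_path, tessdata_dir="/usr/local/share/tessdata"):
    """Copy every *.traineddata file in a .tar.gz archive into tessdata_dir.

    Returns the sorted names of the installed files.
    """
    os.makedirs(tessdata_dir, exist_ok=True)
    installed = []
    with tarfile.open(archive_path, "r:gz") as archive:
        for member in archive.getmembers():
            name = os.path.basename(member.name)
            if not (member.isfile() and name.endswith(".traineddata")):
                continue
            # Only the base name is used, so nothing lands outside tessdata_dir.
            source = archive.extractfile(member)
            target_path = os.path.join(tessdata_dir, name)
            with source, open(target_path, "wb") as target:
                shutil.copyfileobj(source, target)
            installed.append(name)
    return sorted(installed)
```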

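[3] Testing the installation

First test the installation itself by running tesseract. If you see the following output, the installation succeeded!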

/usr/local/share/tessdata$ tesseract

Usage:tesseract imagename outputbase [-l lang] [-psm pagesegmode] [configfile...]

pagesegmode values are:

0 = Orientation and script detection (OSD) only.

1 = Automatic page segmentation with OSD.

2 = Automatic page segmentation, but no OSD, or OCR

3 = Fully automatic page segmentation, but no OSD. (Default)

4 = Assume a single column of text of variable sizes.

5 = Assume a single uniform block of vertically aligned text.

6 = Assume a single uniform block of text.

7 = Treat the image as a single text line.

8 = Treat the image as a single word.

9 = Treat the image as a single word in a circle.

10 = Treat the image as a single character.

-l lang and/or -psm pagesegmode must occur before anyconfigfile.

Single options:

-v --version: version info

--list-langs: list available languages for tesseract engine
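The usage text maps directly onto a small command builder. This sketch follows the 3.0x syntax shown above, including the single-dash -psm flag (newer releases use --psm). As the usage text requires, it keeps -l and -psm ahead of any config file. tesseract_command is a hypothetical name.

```python
# Page segmentation modes 0-10, as listed in the usage output above.
VALID_PSM = range(11)


def tesseract_command(image, outputbase, lang=None, psm=None, configfiles=()):
    """Build an argument list for the tesseract CLI (3.0x syntax)."""
    args = ["tesseract", image, outputbase]
    if lang is not None:
        args += ["-l", lang]
    if psm is not None:
        if psm not in VALID_PSM:
            raise ValueError("psm must be an integer from 0 to 10, got %r" % (psm,))
        args += ["-psm", str(psm)]
    # Config files must come after -l and -psm.
    args += list(configfiles)
    return args
```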

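A common error is missing language data, shown below. In that case, install the language data as described above. eng is the best choice: it is the default language, and you will certainly need it: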

Error opening data file /usr/local/share/tessdata/eng.traineddata

Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory.

Failed loading language 'eng'

Tesseract couldn't load any languages!

Could not initialize tesseract.
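A quick pre-flight check can catch this before Tesseract does. The sketch treats TESSDATA_PREFIX as the parent of the tessdata directory, exactly as the message above describes. It falls back to /usr/local/share/ to match the source install. Both function names are hypothetical, and Tesseract's own compiled-in default path may differ.

```python
import os


def traineddata_path(lang="eng", env=None, default_prefix="/usr/local/share/"):
    """Return where <lang>.traineddata is expected.

    TESSDATA_PREFIX is treated as the parent of the tessdata directory,
    as the error message above says.
    """
    env = os.environ if env is None else env
    prefix = env.get("TESSDATA_PREFIX", default_prefix)
    return os.path.join(prefix, "tessdata", lang + ".traineddata")


def check_language(lang="eng", env=None):
    """Return None if the data file exists, otherwise a short error message."""
    path = traineddata_path(lang, env)
    if os.path.isfile(path):
        return None
    return "missing %s: install the %s data package into %s" % (
        path, lang, os.path.dirname(path))
```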

Then test recognizing a file. The source tree includes a phototest.tif file that you can use for this.

tesseract phototest.tif test1 -l eng
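The result goes into a text file named after outputbase (test1.txt here, assuming the usual .txt suffix). The sketch below runs the command and reads that file back. recognize is a hypothetical name and tesseract is assumed to be on PATH. The runner parameter exists only so the function can be tried without Tesseract installed.

```python
import subprocess


def recognize(image, outputbase, lang="eng", runner=subprocess.run):
    """Run `tesseract image outputbase -l lang` and return the recognized text."""
    runner(["tesseract", image, outputbase, "-l", lang], check=True)
    # Tesseract appends .txt to the output base name.
    with open(outputbase + ".txt", encoding="utf-8") as result:
        return result.read()
```

For example, recognize("phototest.tif", "test1") runs the command above and returns the contents of test1.txt.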

A common error is a Leptonica mismatch, shown below:

Tesseract Open Source OCR Engine v3.02.02 with Leptonica

Error in findTiffCompression: function not present

Error in pixReadStreamTiff: function not present

Error in pixReadStream: tiff: no pix returned

Error in pixRead: pix not read

Unsupported image type.

I have not solved this problem yet either. The fixes suggested online did not work for me; I could not get them to work on Ubuntu 14.04.
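One thing that may be worth checking here (a guess, not a confirmed fix) is whether tesseract loads a different liblept than the one you built. For example, it might pick up a distribution copy under /usr/lib instead of the one installed to /usr/local/lib. ldd shows which file each shared library resolves to, and the sketch below extracts the liblept entries from its output. The function names are hypothetical, and the line format in the comment is the usual glibc ldd format.

```python
import re
import shutil
import subprocess

# Typical glibc ldd line: "\tliblept.so.4 => /usr/lib/liblept.so.4 (0x00007f...)"
LEPT_RE = re.compile(r"^\s*(liblept\S*)\s+=>\s+(not found|\S+)", re.MULTILINE)


def leptonica_links(ldd_output):
    """Return (soname, resolved path) pairs for liblept entries in ldd output."""
    return LEPT_RE.findall(ldd_output)


def tesseract_leptonica():
    """Run ldd on the tesseract found on PATH; return [] if there is none (Linux only)."""
    binary = shutil.which("tesseract")
    if binary is None:
        return []
    result = subprocess.run(["ldd", binary], stdout=subprocess.PIPE,
                            universal_newlines=True, check=True)
    return leptonica_links(result.stdout)
```

If tesseract_leptonica() reports a path other than the library your Leptonica build installed, that mismatch is a plausible cause of the errors above.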

Reposted from: https://www.cnblogs.com/searchware/p/4825138.html
