Installing Tesseract

【1】Installing from the distribution package

1) On Ubuntu 14.04 the distribution package tesseract-ocr can be installed directly:

sudo apt-get install tesseract-ocr

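Installed this way, the binaries are in /usr/bin and the data files are in /usr/share/tesseract-ocr/tessdata (the eng language pack is already installed).

There is also a pytesseract folder under /usr/local/lib/python*.*/dist-packages

(I may have installed it by accident; the GitHub page [https://github.com/madmaze/pytesseract] says to install it with sudo pip install pytesseract),

so Tesseract can be used from Python. For example: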

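from PIL import Image  # Pillow; a bare "import Image" only works with the old PIL package
import pytesseract

# OCR with the default language
print(pytesseract.image_to_string(Image.open('./Test/Python/t2.png')))
# The second argument was lost in the original post; lang='eng' is assumed here
print(pytesseract.image_to_string(Image.open('./Test/Python/t2.png'), lang='eng'))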

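Copy my trained digit data file, num.traineddata, into the data directory, then run:

# The lang argument was lost in the original post; 'num' is inferred from the file name
print(pytesseract.image_to_string(Image.open('./Test/Python/t2.png'), lang='num'))

Recognition of these special digits is then very accurate!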
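
The copy step above can also be scripted. This is only a minimal sketch: install_traineddata is a hypothetical helper (it is not part of pytesseract), and it assumes the language code is simply the file name without the .traineddata suffix. Writing into the system data directory normally requires root.

```python
import os
import shutil


def install_traineddata(src_file, tessdata_dir):
    """Copy a .traineddata file into tessdata_dir and return its language code.

    Hypothetical helper for illustration only. The language code is taken
    from the file name (num.traineddata -> "num").
    """
    name = os.path.basename(src_file)
    lang, ext = os.path.splitext(name)
    if ext != ".traineddata":
        raise ValueError("expected a .traineddata file, got %r" % name)
    shutil.copy(src_file, os.path.join(tessdata_dir, name))
    return lang
```

For example, lang = install_traineddata('num.traineddata', '/usr/share/tesseract-ocr/tessdata') returns the code that can then be passed as lang=lang to pytesseract.image_to_string.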

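2) A tesseract-ocr installed this way has one problem: the tesseract command cannot read the image when run in a terminal and reports the errors below (although it works from Python):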

Tesseract Open Source OCR Engine v3.03 with Leptonica

Error in pixReadStreamPng: function not present

Error in pixReadStream: png: no pix returned

Error in pixRead: pix not read

Error in pixGetInputFormat: pix not defined

Reading ./Test/Python/t2.png as a list of filenames...

Error in fopenReadStream: file not found

Error in pixRead: image file not found: �PNG

Image file �PNG cannot be read!

Error during processing.

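According to posts online, this happens because Leptonica does not recognize the png, tif, or jpg formats (in fact it recognizes almost no format at all; I really do not know why Tesseract is still built on this library).

(I have not solved this problem yet.)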
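
To see at a glance which formats are affected, the log can be scanned for these messages. As far as I understand, "function not present" is printed by the stub that Leptonica compiles in place of a reader when the matching image library (libpng, libtiff and so on) was not available at build time. Treat that as an assumption, not as a confirmed diagnosis of the problem above. The helper name unsupported_formats is made up for this sketch.

```python
import re

# Matches lines such as "Error in pixReadStreamPng: function not present".
# The format name is whatever follows "pixReadStream" in the function name.
_STUB_ERROR = re.compile(r"Error in pixReadStream(\w+): function not present")


def unsupported_formats(log_text):
    """Return the image formats whose readers the log reports as not present."""
    return sorted({m.group(1).lower() for m in _STUB_ERROR.finditer(log_text)})
```

Feeding it the output of a failed tesseract run gives a list such as the formats to check when rebuilding Leptonica.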

--------------------------------------------------------------------------------------------

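【2】Installing from source

1) First install leptonica. Download page: www.leptonica.org/download.html; for example, download leptonica-1.68.tar.gz

Then install it. The basic procedure below is enough (customizing the leptonica build is left to anyone who is interested):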

./configure [build the Makefile]

make [builds the library and shared library versions of all the progs]

sudo make install [as root; this puts liblept.a into /usr/local/lib/ and all the progs into /usr/local/bin/ ]

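2) Download Tesseract. It is now hosted on GitHub (https://github.com/tesseract-ocr), so there is no longer any need to get around the firewall to download it from Google Code.

Download the code from GitHub and unpack it into a directory (for example /tmp/tesseract)

3) Build and install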

./autogen.sh

./configure

make

sudo make install

sudo ldconfig

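Note that a system installed this way puts the binaries in /usr/local/bin and the data files in /usr/local/share/tessdata!

The following errors may come up:

[1] ./autogen.sh reports a number of missing tools; install whichever ones are missing:

aclocal missing: sudo apt-get install automake

libtoolize missing: sudo apt-get install libtool

If another tool is reported missing, type that tool's name at the prompt and Ubuntu will tell you how to install it.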
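
The missing tools can also be checked in one go before running ./autogen.sh. This is a rough sketch: the tool list contains the two names mentioned above plus autoconf and make, which are my own assumptions, and it may not match everything autogen.sh actually looks for.

```python
import shutil

# aclocal and libtoolize come from this post; autoconf and make are assumed.
REQUIRED_TOOLS = ("aclocal", "libtoolize", "autoconf", "make")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the names from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]
```

An empty list from missing_tools() means every listed tool was found; otherwise install the package that provides each reported name.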

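[2] Missing data

A system built from source contains no language data. At least one language pack (usually eng) must be installed before it can run. To install one:

Download the language pack, then unpack it into /usr/local/share/tessdata

[3] Testing the installation

First test the installation itself: run tesseract. The following output means the installation succeeded!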

/usr/local/share/tessdata$ tesseract

Usage:tesseract imagename outputbase [-l lang] [-psm pagesegmode] [configfile...]

pagesegmode values are:

0 = Orientation and script detection (OSD) only.

1 = Automatic page segmentation with OSD.

2 = Automatic page segmentation, but no OSD, or OCR

3 = Fully automatic page segmentation, but no OSD. (Default)

4 = Assume a single column of text of variable sizes.

5 = Assume a single uniform block of vertically aligned text.

6 = Assume a single uniform block of text.

7 = Treat the image as a single text line.

8 = Treat the image as a single word.

9 = Treat the image as a single word in a circle.

10 = Treat the image as a single character.

-l lang and/or -psm pagesegmode must occur before anyconfigfile.

Single options:

-v --version: version info

--list-langs: list available languages for tesseract engine
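
The usage text above maps directly onto an argument list. Here is a small sketch of building one from Python. build_tesseract_command is a hypothetical helper, and it uses the -l and -psm spellings printed by this 3.x build; later Tesseract releases spell the page segmentation option differently, so check the usage output of your own version.

```python
def build_tesseract_command(image, output_base, lang=None, psm=None, configs=()):
    """Build the argument list for the tesseract 3.x command line shown above.

    -l and -psm are placed before any config files, as the usage text requires.
    """
    if psm is not None and not 0 <= psm <= 10:
        raise ValueError("pagesegmode must be between 0 and 10")
    cmd = ["tesseract", image, output_base]
    if lang is not None:
        cmd += ["-l", lang]
    if psm is not None:
        cmd += ["-psm", str(psm)]
    cmd += list(configs)
    return cmd
```

The resulting list can be handed to subprocess.call(), for example subprocess.call(build_tesseract_command('phototest.tif', 'test1', lang='eng')).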

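A common error is missing language data, shown below. In that case install the language data as described earlier (eng is recommended: it is the system default and you will certainly need it):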

Error opening data file /usr/local/share/tessdata/eng.traineddata

Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory.

Failed loading language 'eng'

Tesseract couldn't load any languages!

Could not initialize tesseract.
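
Before running tesseract it can help to check which language files are actually in place. The sketch below follows the error message above in treating TESSDATA_PREFIX as the parent directory of "tessdata"; the function name available_languages is invented for this example and is not a Tesseract API.

```python
import glob
import os

_SUFFIX = ".traineddata"


def available_languages(tessdata_prefix):
    """List the language codes found in <tessdata_prefix>/tessdata.

    tessdata_prefix is the parent directory of "tessdata", for example
    /usr/local/share for the source install described in this post.
    """
    pattern = os.path.join(tessdata_prefix, "tessdata", "*" + _SUFFIX)
    return sorted(os.path.basename(p)[:-len(_SUFFIX)] for p in glob.glob(pattern))
```

If 'eng' is not in available_languages('/usr/local/share'), the error above is to be expected until eng.traineddata is unpacked there.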

Next, test recognition on a file. The source directory contains a file named phototest.tif that can be used for testing.

tesseract phototest.tif test1 -l eng

A common error is a Leptonica mismatch, shown below:

Tesseract Open Source OCR Engine v3.02.02 with Leptonica

Error in findTiffCompression: function not present

Error in pixReadStreamTiff: function not present

Error in pixReadStream: tiff: no pix returned

Error in pixRead: pix not read

Unsupported image type.

I have not solved this problem yet. The fixes suggested online did not work (I could not get them to work on Ubuntu 14.04).

Reposted from: https://www.cnblogs.com/searchware/p/4825138.html
