Tesseract image CAPTCHA recognition: the basics

Save the CAPTCHA as an image file with the .png extension.

Download and install Tesseract.

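Configure the environment variable: in the upper panel click New. In the first field (variable name) enter TESSDATA_PREFIX. For the second field (variable value), locate tesseract.exe, right-click it, choose Properties, open the Security tab, and use the object path shown there.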
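
Before going further, it can help to confirm that Python actually sees the variable. The following is a minimal sketch, not part of the original steps: the helper name describe_tessdata_prefix is hypothetical, and it only reports what it finds. A terminal or IDE that was already open usually has to be restarted before a newly created variable becomes visible.

```python
import os


def describe_tessdata_prefix(env=os.environ):
    """Report whether TESSDATA_PREFIX is set and points to a real directory.

    Hypothetical helper for this sketch; `env` is any mapping of
    environment variables (defaults to the current process environment).
    """
    value = env.get("TESSDATA_PREFIX")
    if value is None:
        return "TESSDATA_PREFIX is not set"
    if not os.path.isdir(value):
        return f"TESSDATA_PREFIX points to a missing directory: {value}"
    return f"TESSDATA_PREFIX = {value}"


print(describe_tessdata_prefix())
```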

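In the IDE, Ctrl+left-click pytesseract to open its source, then change the executable path there to C:\Tesseract-OCR\tesseract.exe, written with double backslashes (C:\\Tesseract-OCR\\tesseract.exe).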
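
The reason for the double backslashes is that a single "\t" inside an ordinary Python string literal is read as a tab character, which silently corrupts the path. The sketch below illustrates this. It also shows an alternative to editing the library source, namely assigning the path to pytesseract.pytesseract.tesseract_cmd at runtime. The install path is the one used in this post and is an assumption about your machine. The import is wrapped in try/except only so the sketch still runs where pytesseract is not installed.

```python
# Two equivalent ways to write the Windows path safely.
escaped = "C:\\Tesseract-OCR\\tesseract.exe"  # doubled backslashes
raw = r"C:\Tesseract-OCR\tesseract.exe"       # raw string literal
print(escaped == raw)

# With a single backslash, "\t" becomes a tab and the path is wrong.
broken = "C:\\Tesseract-OCR" + "\tesseract.exe"
print(repr(broken))

# Alternative to editing the pytesseract source: set the path at runtime.
# Assumes Tesseract is installed at the location used in this post.
try:
    import pytesseract
    pytesseract.pytesseract.tesseract_cmd = raw
except ImportError:
    pass  # pytesseract is not installed; the string demo above still runs
```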

Code:

import pytesseract
from PIL import Image

# Load the CAPTCHA image (swap in 'code.png' to try another sample)
image = Image.open('code4.png')
# image.show()

# Extra Tesseract option: point it at the tessdata directory
tesseract_data = '--tessdata-dir "C:\\Tesseract-OCR\\tessdata"'

# Convert the color image to grayscale
image = image.convert('L')
# image.show()

# Remove interference lines by binarizing: gray levels below the
# threshold become 0 (black), everything else becomes 1 (white)
# threshold = 170
threshold = 125
table = [0 if i < threshold else 1 for i in range(256)]
image = image.point(table, '1')
# image.show()

# Run OCR on the cleaned image and print the recognized text
image_str = pytesseract.image_to_string(image, config=tesseract_data)
print(image_str)

Run result

E:\project\python.exe C:/Users/Administrator/Desktop/四階xpat爬蟲系列/Requests/Requests01/Requests01/day14/demo_tesseract.py

KVGi

Process finished with exit code 0
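
To see what the threshold step in the code does, here is a small sketch of the lookup table on its own, without Pillow. The helper name build_threshold_table and the sample gray levels are made up for illustration and are not taken from code4.png. Each gray level from 0 to 255 is used as an index into the table, so anything darker than the threshold maps to 0 and everything else maps to 1.

```python
def build_threshold_table(threshold):
    """Return a 256-entry lookup table: 0 below the threshold, 1 otherwise.

    Hypothetical helper for this sketch.
    """
    return [0 if level < threshold else 1 for level in range(256)]


table = build_threshold_table(125)

# Illustrative gray levels (made up for this sketch).
sample_row = [12, 124, 125, 200, 255]
binarized = [table[level] for level in sample_row]
print(binarized)  # prints [0, 0, 1, 1, 1]
```

If the recognized text is wrong, trying a different threshold (the code keeps 170 as a commented-out alternative) and checking the result with image.show() is a reasonable first step.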