
Python Programming: Extracting Table Data with Baidu OCR

百度文字識别文檔: https://ai.baidu.com/docs#/OCR-Python-SDK/top 安裝sdk

pip install baidu-aip      

First create an application on the Baidu AI platform to obtain its App ID, API Key and Secret Key.
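The code example hardcodes these three credentials. As an optional variation that is not part of the original walkthrough, they can be read from environment variables so they stay out of source control. This is a minimal sketch. The helper name load_credentials and the variable names BAIDU_OCR_APP_ID, BAIDU_OCR_API_KEY and BAIDU_OCR_SECRET_KEY are hypothetical, so pick whatever suits your setup:

```python
import os


def load_credentials(env=None):
    """Return (app_id, api_key, secret_key) read from environment variables.

    The helper and the variable names are hypothetical; adjust them to your setup.
    """
    env = os.environ if env is None else env
    names = ("BAIDU_OCR_APP_ID", "BAIDU_OCR_API_KEY", "BAIDU_OCR_SECRET_KEY")
    # Report every missing variable at once instead of failing on the first
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return tuple(env[name] for name in names)
```

With the variables exported, APP_ID, API_KEY, SECRET_KEY = load_credentials() can replace the three hardcoded assignments before AipOcr(APP_ID, API_KEY, SECRET_KEY) is constructed.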

The table image to be recognized:

[Image: screenshot of the tables to be recognized]

Code example:

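from aip import AipOcr

# Your App ID, API Key (AK) and Secret Key (SK)
APP_ID = 'your App ID'
API_KEY = 'your Api Key'
SECRET_KEY = 'your Secret Key'

client = AipOcr(APP_ID, API_KEY, SECRET_KEY)

# Read the table screenshot as raw bytes
with open("names.png", "rb") as f:
    image = f.read()

# General text recognition: returns the recognized text line by line
result = client.basicGeneral(image)
print(result)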

Recognition result:

{
    "log_id":3213553909522465362,
    "words_result_num":20,
    "words_result":[
        {
            "words":"表格1:"
        },
        {
            "words":"姓名"
        },
        {
            "words":"年齡"
        },
        {
            "words":"性别"
        },
        {
            "words":"李雷"
        },
        {
            "words":"20男"
        },
        {
            "words":"韓梅梅"
        },
        {
            "words":"23女"
        },
        {
            "words":"趙小三"
        },
        {
            "words":"25女"
        },
        {
            "words":"Table2."
        },
        {
            "words":"Name"
        },
        {
            "words":"ge"
        },
        {
            "words":"Gender"
        },
        {
            "words":"Tom"
        },
        {
            "words":"30 Male"
        },
        {
            "words":"Jack"
        },
        {
            "words":"33 Male"
        },
        {
            "words":"one"
        },
        {
            "words":"31Female"
        }
    ]
}      
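The response is a plain dict. words_result is a flat list of recognized text lines, each stored under the key words, with no row or column information. Below is a minimal sketch for pulling those lines out and grouping them by table title. The helpers extract_lines and split_tables are hypothetical names. The title pattern (表格N / TableN) is a heuristic fitted to this particular image. The error check assumes that a failed call returns error_code and error_msg instead of words_result:

```python
import re

# Heuristic fitted to this image (an assumption): lines such as "表格1:" or
# "Table2." start a new table.
TITLE_PATTERN = re.compile(r"^(表格|Table)\s*\d+")


def extract_lines(result):
    """Return the recognized text lines from a basicGeneral() response dict."""
    if "error_code" in result:
        # Assumed error shape: error_code / error_msg instead of words_result
        raise RuntimeError(f"OCR failed: {result['error_code']} {result.get('error_msg', '')}")
    return [item["words"] for item in result.get("words_result", [])]


def split_tables(lines):
    """Group lines under the most recent title line; lines before any title are dropped."""
    tables = []
    for line in lines:
        if TITLE_PATTERN.match(line):
            tables.append({"title": line, "lines": []})
        elif tables:
            tables[-1]["lines"].append(line)
    return tables


# A shortened copy of the response shown above
sample = {
    "words_result_num": 8,
    "words_result": [
        {"words": "表格1:"}, {"words": "姓名"}, {"words": "李雷"}, {"words": "20男"},
        {"words": "Table2."}, {"words": "Name"}, {"words": "Tom"}, {"words": "30 Male"},
    ],
}
for table in split_tables(extract_lines(sample)):
    print(table["title"], table["lines"])
```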

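The recognition result is not ideal: in every data row the Age and Gender cells were merged into a single line (for example "20男" and "31Female"). Because basicGeneral returns only a flat list of text lines, the table's row and column structure is lost as well.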
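Every merged cell here is an integer followed by text ("20男", "30 Male", "31Female"). One possible workaround is therefore to split those cells with a regular expression and pair them back up with the names. This is a post-processing heuristic sketched under one assumption: below the three header lines, the OCR lines strictly alternate between a name and a merged age+gender cell, as they do in this output. It is not a general table parser, and it does not correct misread text such as "ge" in the English header. The helpers split_merged_cell, rebuild_rows and to_csv are hypothetical names:

```python
import csv
import io
import re

# An integer (the age) followed by the rest of the cell (the gender),
# with or without whitespace between them: "20男", "30 Male", "31Female"
MERGED_CELL = re.compile(r"^\s*(\d+)\s*(\S.*?)\s*$")


def split_merged_cell(text):
    """Split '20男' into ('20', '男'); return None if the cell does not match."""
    match = MERGED_CELL.match(text)
    return None if match is None else (match.group(1), match.group(2))


def rebuild_rows(body_lines):
    """Pair [name, merged, name, merged, ...] into [[name, age, gender], ...]."""
    rows = []
    for name, merged in zip(body_lines[0::2], body_lines[1::2]):
        parts = split_merged_cell(merged)
        # A cell that does not look like "<digits><text>" is kept unsplit
        rows.append([name, *parts] if parts else [name, merged, ""])
    return rows


def to_csv(header, rows):
    """Serialize the header and rows as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


# Lines of table 1 from the output above: 3 header lines, then the body
table1 = ["姓名", "年齡", "性别", "李雷", "20男", "韓梅梅", "23女", "趙小三", "25女"]
print(to_csv(table1[:3], rebuild_rows(table1[3:])))
```

For table 1 this prints the header line 姓名,年齡,性别 followed by the rows 李雷,20,男 / 韓梅梅,23,女 / 趙小三,25,女. The same functions handle the English table, because the regex also accepts an optional space between the age and the gender.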
